RD Controls
EPICURE Software Release Note 64.5
DISNEY Cluster Power Recovery Reboot Procedure

Deb Baddorf

February 8, 1999

The DISNEY cluster and the front-end nodes should all boot themselves automatically after a power failure.

When Power Is Back

MICKEY, MINNIE, and the monitor node HYDRA begin to boot automatically when power is restored. The five front-end nodes, HUEY, DEWEY, LOUIE, WEBBY, and DONALD, also start booting automatically.

Once MICKEY is partway up, GEPETO, JIMNEY, BALIN, BOMBUR, and BOFUR boot from the boot node MICKEY.

To check on MICKEY, HUEY, DEWEY, LOUIE, WEBBY, or DONALD, use finger or ping; either one shows whether the node is reachable.
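As an illustration that is not part of the original procedure, the reachability check can be scripted from a Unix-like host. This is a minimal sketch assuming Python 3.9+, a system `ping` that accepts `-c 1`, and that the short node names resolve over TCP/IP as written; the helper names `ping_once` and `unreachable_nodes` are hypothetical.

```python
import subprocess
from typing import Callable, Iterable

# Nodes that the note says can be checked with finger or ping.
# Assumption: these short names resolve over TCP/IP at your site.
NODES_TO_CHECK = ["MICKEY", "HUEY", "DEWEY", "LOUIE", "WEBBY", "DONALD"]


def ping_once(host: str) -> bool:
    """Send one echo request with the system ping (Unix-style -c flag).

    Returns True when ping exits with status 0, i.e. the host answered.
    """
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable_nodes(nodes: Iterable[str],
                      probe: Callable[[str], bool] = ping_once) -> list[str]:
    """Return, in order, the nodes for which the probe reports no answer."""
    return [node for node in nodes if not probe(node)]
```

Calling `unreachable_nodes(NODES_TO_CHECK)` returns the names that did not answer. The probe is a parameter so that a finger-based check, or a fake for testing, can be swapped in without changing the loop.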

Possible Problems

MICKEY or MINNIE won't boot

Sometimes MICKEY and MINNIE manage to get their timing exactly wrong, and each causes the other to get stuck. This can be related to the SCSI devices.

If MICKEY is unreachable, power cycle him, but leave the SCSI crates powered on. If MICKEY comes up and MINNIE is still unreachable, then power cycle her; always try MICKEY first. Both nodes need to be running, but MICKEY must start booting before MINNIE. If this still doesn't work, it's probably time to call for help.
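The ordering rules above can be written down as a small decision function. This is a sketch of one reading of the procedure, using hypothetical names (`Action`, `next_core_action`); the reachability flags would come from finger or ping, and the power cycling itself is still done by hand.

```python
from enum import Enum


class Action(Enum):
    NONE = "Both nodes are up; nothing to do."
    CYCLE_MICKEY = "Power cycle MICKEY (leave the SCSI crates powered on)."
    CYCLE_MINNIE = "Power cycle MINNIE (MICKEY is already up)."
    CALL_FOR_HELP = "Power cycling has not worked; call for help."


def next_core_action(mickey_up: bool, minnie_up: bool,
                     mickey_cycled: bool, minnie_cycled: bool) -> Action:
    """Pick the next step for a stuck MICKEY/MINNIE pair.

    MICKEY is always handled first; MINNIE is power cycled only once
    MICKEY is up. If the node that needs attention has already been
    power cycled without success, the answer is to call for help.
    """
    if mickey_up and minnie_up:
        return Action.NONE
    if not mickey_up:
        return Action.CALL_FOR_HELP if mickey_cycled else Action.CYCLE_MICKEY
    # MICKEY is up but MINNIE is not.
    return Action.CALL_FOR_HELP if minnie_cycled else Action.CYCLE_MINNIE
```

For example, `next_core_action(False, False, False, False)` yields `Action.CYCLE_MICKEY`, reflecting the rule that MICKEY must start booting before MINNIE is touched.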

A front-end node won't boot

Front-end nodes can also have SCSI problems if the VAX comes up before its SCSI devices are ready. The fix is a power cycle, BUT we're no longer allowed to power cycle just the VAX! Instead, find the node's interlock chassis and flip its toggle switch to OFF. This turns off power to both the VAX and the VME crate. Count to 15, then flip the toggle switch back to ON. The node should begin rebooting.

The Monitor Node

HYDRA's workstation monitor sits on top of HYDRA's computer rack. When the monitor shows a blank, gray workstation screen instead of boot messages, HYDRA has finished rebooting. HYDRA runs the old VWS windowing system. Remotely, you can finger or SET HOST to HYDRA, or ping or telnet to HYDRA0 (that's a zero). Be careful: the node name for DECnet (HYDRA) is not the same as the name for TCP (HYDRA0)!

HYDRA/HYDRA0 is slower than every node it monitors, so the monitored nodes will be up before HYDRA is.

Crew chiefs and assistants can type SET HOST HYDRA from their personal accounts on any DISNEY node that is already up, then log in with the account name VCSMONITOR. This account does not accept telnet logins, because it needs to recognize that you are connecting from a trusted source.

Once you've logged into HYDRA, the cursor will be at the bottom of the screen (not inside the windowed area), and the prompt will be the word "Command:". Type

Command: VIEW nodename
if you wish to watch a node boot or check its progress. Replace nodename with the name of the node, and press carriage return at the end of the line.

You can also type OUTPUT nodename BOOT; or SELECT nodename to get a cursor inside the window. Control-G reverses a SELECT.

This should be all you need to know about VCS for rebooting, but more information about using the VCS monitor program is available in EPICURE Software Release Note 14.

If HYDRA is not available and the core nodes do not boot on their own, try power cycling them so that they make another attempt to auto-boot.

Keywords: EPICURE, DISNEY, front end, computer, reboot, power fail
Distribution: normal
RD Site Operations

baddorf@fnal.gov
