Fermilab ALARM Protocol

Rich Neswold

July 16, 1999

This document describes the ALARM protocol. Most of this information was obtained from ACNET Design Note No. 39.3. Some of it was found in ACNET Design Note 22.28.

1. General Comments

2. As a Front-End Boots

3. Alarm Receiving Task

4. Alarm Reporting Task

5. Alarm Properties

1. General Comments

The ALARM protocol is transmitted as the data of an ACNET packet. This protocol uses the ACNET data representation.
Analog alarms compare the value returned by the READING property with alarm limits (either nominal with tolerance or maximum and minimum.)
Digital alarms compare the value returned by the BASIC STATUS property with alarm limits (a nominal value with a possible mask.)
The front-end process, which handles alarm downloads and reporting, connects to ACNET with the Rad-50 handles ALARMR and SLAM.
There are three types of messages generated by the alarm system:
1. An event message which indicates an event of interest has occurred.
2. An exception message for devices going into the alarming state.
3. An exception message for devices leaving the alarm state.
The ALARMR task communicates with AEOLUS, a task that resides on OPER.

2. As a Front-End Boots

When a front-end boots (or reboots), the list of alarms associated with it needs to be cleared from the consoles and AEOLUS. As it initializes its alarm tasks, it sends a reboot notification request (typecode FEBT) to AEOLUS. The reply to this message is delayed until AEOLUS has notified all the consoles and has cleared its own references.

In the current implementation of MOOC, the front-end sends the FEBT message whenever it creates the connection to AEOLUS. This occurs when the front-end boots and each time communication with AEOLUS returns an error. The reply to the FEBT message doesn't appear to have an important format; MOOC only checks the status.

A front-end may be known under several logical nodes. This happens when AEOLUS groups a subset of a front-end's devices under a new node. If the front-end needs to support multiple logical nodes, it must send a FEBT request for each logical node it supports.

Table 1: Format of Front-end Boot Message (found in alarmr.h)

Field Name	Field Type	Field Description
typecod	unsigned char	FEBT typecode (currently equal to 9.)
unused	unsigned char
mibsn	unsigned char	The minimum basic subsystem number to send BIGCs to.
mabsn	unsigned char	The maximum basic subsystem number to send BIGCs to.
node	unsigned char	Node address of the front-end.
trunk	unsigned char	Trunk of the front-end.

The minimum and maximum basic subsystem numbers refer to the range of subsystems that should receive a BIGC (i.e. ``big clear'' -- these are described in Alarm Receiving Task.) These numbers are front-end specific. Only 8 subsystems are supported (0-7).
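The layout in Table 1 can be sketched as a C structure. This is only an illustration: the names `febt_msg` and `febt_init` are hypothetical, the authoritative definition lives in alarmr.h, and rejecting a range where the minimum exceeds the maximum is an assumption rather than something the protocol states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FEBT_TYPECODE 9   /* value given in Table 1 */
#define MAX_SUBSYSTEM 7   /* only subsystems 0-7 are supported */

/* Hypothetical mirror of the Table 1 layout; alarmr.h holds the real one.
   Every field is a single byte, so no padding or byte-order issue arises. */
struct febt_msg {
    unsigned char typecod;  /* FEBT typecode */
    unsigned char unused;
    unsigned char mibsn;    /* minimum basic subsystem number */
    unsigned char mabsn;    /* maximum basic subsystem number */
    unsigned char node;     /* node address of the front-end */
    unsigned char trunk;    /* trunk of the front-end */
};

/* Fill in a FEBT message. Returns 0 on success, or -1 if the subsystem
   range is rejected (ASSUMPTION: min > max is treated as an error). */
int febt_init(struct febt_msg *m, unsigned char mibsn, unsigned char mabsn,
              unsigned char node, unsigned char trunk)
{
    assert(m != NULL);
    if (mibsn > mabsn || mabsn > MAX_SUBSYSTEM)
        return -1;
    memset(m, 0, sizeof *m);
    m->typecod = FEBT_TYPECODE;
    m->mibsn = mibsn;
    m->mabsn = mabsn;
    m->node = node;
    m->trunk = trunk;
    return 0;
}
```

A front-end known under several logical nodes would build one such message per logical node.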

The last thing a front-end must do when it boots is to download the alarm blocks for its devices. The alarm blocks are sent to the front-end through the SETDAT protocol (see Alarm Properties.) The task responsible for downloading the alarm blocks varies. These tasks append messages to a log file to indicate success or failure. These log files can be found in OP$USR1:[VTEVATRON.LOG]. Since this is not part of the official ALARM protocol and, in fact, is done a little differently on each front-end, we won't go into this any further.

3. Alarm Receiving Task

A front-end needs to create a task called ALARMR to receive requests from AEOLUS. Currently, the only request that is sent to this task is a ``big clear'' (BIGC) message. A BIGC request contains a field which indicates which subsystem should have all its alarms marked ``good''. Intelligent modules will re-report their alarm status. Dumb modules will be put back into alarm when the front-end gets around to checking their value. In either case, the alarm status will eventually be reported to AEOLUS. Subsystems are front-end specific so the subsystem indicator is front-end specific.

Table 2: Format of Big Clear Message (found in alarmr.h)

Field Name	Field Type	Field Description
mtype	unsigned char	BIGC typecode (currently equal to 2.)
unused	unsigned char
unused	unsigned char
subs	unsigned char	The subsystem that needs to be cleared.

The BIGC messages can arrive as requests or USMs, so the ALARMR task should be prepared to handle both.
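As a rough sketch of what the ALARMR task does with a BIGC, the following marks every alarm in the named subsystem ``good''. The structure `device_alarm` and the function `handle_bigc` are hypothetical names invented for this example; how a real front-end stores its alarm state is not specified here. The same handler could be called whether the message arrived as a request or as a USM.

```c
#include <assert.h>
#include <stddef.h>

#define BIGC_TYPECODE 2    /* value given in Table 2 */
#define NUM_SUBSYSTEMS 8   /* subsystems 0-7 */

/* Mirror of the Table 2 layout (the real one is in alarmr.h). */
struct bigc_msg {
    unsigned char mtype;
    unsigned char unused1;
    unsigned char unused2;
    unsigned char subs;    /* subsystem to clear */
};

/* Hypothetical per-device alarm state kept by the front-end. */
struct device_alarm {
    unsigned char subsystem;
    int in_alarm;          /* nonzero while the device is alarming */
};

/* Mark all alarms in the requested subsystem good. Returns the number of
   alarms cleared, or -1 if the message is not a usable BIGC. */
int handle_bigc(const struct bigc_msg *msg, struct device_alarm *devs,
                size_t ndevs)
{
    size_t i;
    int cleared = 0;

    assert(msg != NULL && devs != NULL);
    if (msg->mtype != BIGC_TYPECODE || msg->subs >= NUM_SUBSYSTEMS)
        return -1;
    for (i = 0; i < ndevs; i++) {
        if (devs[i].subsystem == msg->subs && devs[i].in_alarm) {
            devs[i].in_alarm = 0;   /* re-reported on the next scan if still bad */
            cleared++;
        }
    }
    return cleared;
}
```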

4. Alarm Reporting Task

The front-ends are responsible for reporting event and exception messages to AEOLUS. A task is set up to do this forwarding. This task periodically scans the devices to see if any have entered an alarm state.

Note:

Charlie has requested that we allow the alarm scan rate to be modifiable by the user. He also thinks that, if it's possible, it would be useful to have a variable scan rate on each device, instead of a single, global rate.

As messages are generated, they should be queued up and sent to AEOLUS no faster than once a second. Sending a ``queue overflow'' exception is encouraged to indicate that messages may have been lost due to queue size limitations.
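One way to meet the queuing and rate requirements is a small ring buffer with an overflow flag. The sketch below is an assumption-laden illustration, not the MOOC implementation: the names (`alarm_queue`, `queue_push`, `queue_drain`), the capacity of 4, the use of plain `int` message identifiers, and time measured in whole seconds are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4        /* hypothetical; real queues would be larger */
#define MIN_SEND_INTERVAL 1L    /* seconds between sends to AEOLUS */

struct alarm_queue {
    int items[QUEUE_CAPACITY];  /* placeholder message identifiers */
    size_t head;                /* index of the oldest queued item */
    size_t count;               /* number of queued items */
    int overflowed;             /* set when a message had to be dropped */
    long last_send;             /* time of the last send, in seconds */
};

void queue_init(struct alarm_queue *q, long now)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    q->overflowed = 0;
    q->last_send = now - MIN_SEND_INTERVAL;  /* allow an immediate first send */
}

/* Queue a message. Returns 0, or -1 (and records the overflow) if full. */
int queue_push(struct alarm_queue *q, int item)
{
    if (q->count == QUEUE_CAPACITY) {
        q->overflowed = 1;
        return -1;
    }
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = item;
    q->count++;
    return 0;
}

/* Move up to max queued messages into out, but only if at least one
   second has passed since the last send. Returns the number moved. */
size_t queue_drain(struct alarm_queue *q, long now, int *out, size_t max)
{
    size_t n = 0;

    if (now - q->last_send < MIN_SEND_INTERVAL)
        return 0;
    while (n < max && q->count > 0) {
        out[n++] = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
    }
    if (n > 0)
        q->last_send = now;
    return n;
}
```

In this sketch the caller is expected to check `overflowed` after draining, report the ``queue overflow'' exception, and clear the flag.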

The alarm task should constantly monitor its connection to AEOLUS. If the connection is lost and then restored, the task should send any new messages that may have queued up during the disconnection.

The connection to AEOLUS can be done through network requests (single requests) or USMs. Network requests are preferred since they provide an acknowledgement from the receiver. As with the FEBT reply, the format of the reply isn't important; what matters is whether a reply was received.

The alarm task sends an ``event report message'' (ERM) to AEOLUS. The format of this message is shown in ERM Format (found in alarmr.h).

Table 3: ERM Format (found in alarmr.h)

Field Name	Field Type	Field Description
typecod	char	ERM typecode (either 1 or 14).
nofp	char	Number of ERPs included in the message.
data	short[267]	Holds the ``event report packets'' (ERPs). Since ERPs are variable-length, this unstructured region is used to hold them.

The task fills the message with event report packets (ERPs) to allow multiple events to be reported in one network transaction. The ERM typecode indicates the type of ERPs being sent. For typecode 1, the ERPs identify the device with an EMC. For typecode 14, the ERPs identify the device by a DIEMC. Devices that report alarms by DIEMC do not need an EMC in the database because a DIEMC contains the device's di. ERPs have the following format:

Table 4: ERP Format (found in alarmr.h)

Field Name	Field Type	Field Description
length	unsigned char	The length of the ERP.
sos	unsigned char	If set to 1, it indicates ``status-of-status'' in the reading field.
esw	ALARM_FLAGS
emc/diemc	EMC/DIEMC	EMC for typecode 1, DIEMC for typecode 14
rdg	union	This field holds the current reading. It is a union of signed and unsigned integers and a float.
par	unsigned char[16]	Extra parameters. These are parameters that get displayed in the alarm text string. When AEOLUS displays an alarm message, it gets the text from the device's alarm text property. This text string can contain formatting specifiers (similar, but not compatible with, `printf`'s formatting string). The values in this field are used by the formatting string.
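Packing several variable-length ERPs into one ERM might look like the sketch below. Several things here are assumptions that should be checked against alarmr.h: that the ERP length field counts bytes and includes the length byte itself, and that ERPs are packed back to back with no alignment padding. The names `erm_builder`, `erm_begin`, and `erm_append` are hypothetical. Each ERP is treated as an opaque byte string because the ALARM_FLAGS and EMC types are not described in this document.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define ERM_DATA_WORDS 267   /* size of the data region from Table 3 */

/* Mirror of the Table 3 layout (the real one is in alarmr.h). */
struct erm {
    char typecod;                 /* 1 (EMC) or 14 (DIEMC) */
    char nofp;                    /* number of ERPs in data */
    short data[ERM_DATA_WORDS];   /* unstructured region holding the ERPs */
};

/* Hypothetical helper that tracks how much of data is in use. */
struct erm_builder {
    struct erm msg;
    size_t used;                  /* bytes of msg.data already filled */
};

void erm_begin(struct erm_builder *b, char typecod)
{
    assert(b != NULL);
    memset(b, 0, sizeof *b);
    b->msg.typecod = typecod;
}

/* Append one ERP. ASSUMPTION: erp[0] is the total ERP length in bytes,
   including the length byte. Returns 0, or -1 if it does not fit. */
int erm_append(struct erm_builder *b, const unsigned char *erp)
{
    size_t len;

    assert(b != NULL && erp != NULL);
    len = erp[0];
    if (len == 0 || b->used + len > sizeof b->msg.data ||
        b->msg.nofp == CHAR_MAX)
        return -1;
    memcpy((unsigned char *)b->msg.data + b->used, erp, len);
    b->used += len;
    b->msg.nofp++;
    return 0;
}
```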

DIEMCs have the following format:

Table 5: DIEMC Format (found in alarmr.h)

Field Name	Field Type	Field Description
trunk	unsigned char	The trunk the device's FE is on.
node	unsigned char	The node the device's FE is on.
unused	unsigned short	Contains a subsystem mask; unused by MOOC FEs.
di	unsigned int	Device's di (device index).
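For illustration, the DIEMC layout can be written with fixed-width types so that the field sizes do not depend on the compiler. The names `diemc` and `diemc_make` are hypothetical, and mapping unsigned short to 16 bits and unsigned int to 32 bits is an assumption about the target platform. Byte order on the wire follows the ACNET data representation, which this sketch does not address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width mirror of the Table 5 layout. */
struct diemc {
    uint8_t  trunk;    /* trunk the device's front-end is on */
    uint8_t  node;     /* node the device's front-end is on */
    uint16_t unused;   /* subsystem mask, unused by MOOC front-ends */
    uint32_t di;       /* device index */
};

/* Build a DIEMC for a device, leaving the unused field zeroed. */
struct diemc diemc_make(uint8_t trunk, uint8_t node, uint32_t di)
{
    struct diemc d;

    /* With natural alignment these fields occupy 8 bytes with no padding. */
    assert(sizeof(struct diemc) == 8);
    memset(&d, 0, sizeof d);
    d.trunk = trunk;
    d.node = node;
    d.di = di;
    return d;
}
```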

5. Alarm Properties

The final piece of the puzzle is that the front-end needs to be able to support the setting and reading of alarm parameters. It does this via alarm properties. The front-end will receive requests to set or read the alarm block through SETDAT or RETDAT, respectively.

The data packet describing alarm parameters is 20 bytes for both analog and digital alarms. These packets have the following formats:

Table 6: Analog Alarm Block Format

Field Name	Field Type	Field Description
flags	unsigned short	This field holds various flags. The assignment of each bit is shown in Alarm Block Flag Bits.
minval	unsigned long	The minimum value the analog device can reach without triggering an alarm.
maxval	unsigned long	The maximum value the analog device can reach without triggering an alarm.
tneeded	unsigned char	This field acts like a simple filter. This represents the number of consecutive samples that have to exceed the limits before the alarm is generated.
tnow	unsigned char	This contains the current number of times that the device exceeded the limits.
ev1	unsigned char	MSbyte of an FTD if not using the default 3 second rate. Can be event or frequency.
ev2	unsigned char	LSbyte of the FTD.
ssinfo	unsigned short	16-bit offset into an array device.
(unused)	unsigned char	Subsystem-specific information.
alarm type	unsigned char	This field indicates the data type used for alarm comparisons.
(unused)	unsigned char[2]	Subsystem-specific information.
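The interplay of minval, maxval, tneeded, and tnow can be sketched as follows. This is a simplified illustration that assumes the max/min limit type and a signed 32-bit comparison; in a real front-end the comparison type depends on the flags and on the alarm type field. The names `analog_limits` and `analog_sample` are hypothetical, and resetting tnow on a good sample is inferred from the word ``consecutive'' rather than stated outright.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the analog alarm block used for the comparison. */
struct analog_limits {
    int32_t minval;    /* lowest value that does not alarm */
    int32_t maxval;    /* highest value that does not alarm */
    uint8_t tneeded;   /* consecutive bad samples required */
    uint8_t tnow;      /* consecutive bad samples seen so far */
};

/* Process one reading. Returns 1 if the device should be in alarm. */
int analog_sample(struct analog_limits *a, int32_t reading)
{
    assert(a != NULL);
    if (reading >= a->minval && reading <= a->maxval) {
        a->tnow = 0;               /* a good sample restarts the count */
        return 0;
    }
    if (a->tnow < UINT8_MAX)       /* saturate instead of wrapping */
        a->tnow++;
    return a->tnow >= a->tneeded;
}
```

With tneeded set to 3, two out-of-limit readings in a row produce no alarm; the third one does.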

Table 7: Digital Alarm Block Format

Field Name	Field Type	Field Description
flags	unsigned short	This field holds various flags. The assignment of each bit is shown in Alarm Block Flag Bits.
nominal	unsigned long	The expected value of the digital device. If its value differs, an alarm is generated.
mask	unsigned long	The value of the device is logically ANDed with this mask. The result is compared to the nominal field.
tneeded	unsigned char	This field acts like a simple filter. This represents the number of consecutive samples that have to differ from the nominal before an alarm is generated.
tnow	unsigned char	This contains the current number of times that the device differed from the nominal.
ev1	unsigned char	MSbyte of an FTD if not using the default 3 second rate. Can be event or frequency.
ev2	unsigned char	LSbyte of the FTD.
ssinfo	unsigned short	16-bit offset into an array device.
(unused)	unsigned char[4]	Subsystem-specific information.
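The digital comparison described in Table 7 amounts to masking the status and comparing it with the nominal value, filtered by the same tneeded/tnow counter. The sketch below uses hypothetical names (`digital_limits`, `digital_sample`) and assumes 32-bit status words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the digital alarm block used for the comparison. */
struct digital_limits {
    uint32_t nominal;  /* expected value after masking */
    uint32_t mask;     /* bits of the status that are checked */
    uint8_t  tneeded;  /* consecutive bad samples required */
    uint8_t  tnow;     /* consecutive bad samples seen so far */
};

/* Process one status word. Returns 1 if the device should be in alarm. */
int digital_sample(struct digital_limits *d, uint32_t status)
{
    assert(d != NULL);
    if ((status & d->mask) == d->nominal) {
        d->tnow = 0;               /* matching the nominal restarts the count */
        return 0;
    }
    if (d->tnow < UINT8_MAX)       /* saturate instead of wrapping */
        d->tnow++;
    return d->tnow >= d->tneeded;
}
```

Bits outside the mask never affect the result, so a status of 0xF1 with mask 0x0F and nominal 0x01 is considered good.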

Both alarm block structures have a common flags field. The bits in this field are defined as:

Table 8: Alarm Block Flag Bits

Bit Name	Bit Description
DE	Display Event. This alarm block represents an event that should be displayed by the consoles.
LE	Log Event. This alarm block represents an event that should be logged.
EV	Event Bit. If this bit is set, then the alarm block describes an event. Otherwise the alarm block describes an exception condition.
HI	Too High. Set if the reading exceeded the maximum limit.
LO	Too Low. Set if the reading fell below the minimum limit.
K0,K1,K2	Limit Type. These three bits determine the way the alarm limit fields are to be interpreted. 0 means a nominal/tolerance configuration is used. 1 means nominal/percentage and 2 means max/min. Other values are undefined.
AD	Analog/Digital. This indicates whether the alarm is analog or digital.
Q0,Q1	Limit Length. These two bits indicate the length of the values used for nominal and tolerance limits. A value of 3 is undefined.
AI	Abort Inhibit. Inhibits the effect of the abort bit (AB).
AB	Abort Bit. If set, then the beam will be aborted when an alarm occurs.
GB	Good/Bad. If the alarm indicates a ``going good'' state, then this bit is 0. A ``going bad'' state has this bit set to 1.
BP	Alarm Bypass. If 0 (?), the alarm will not generate alarms or beam inhibits.

In the analog alarm blocks, a field is defined which indicates the data type that is used for comparisons. This is a recent addition to the protocol, so not all front-ends support it. For MOOC systems, this field moves the data-typing responsibility from the programmer to the database maintainer. To support automated populating of the database, the following algorithm is used.

If the alarm_type field is valid, it overrides any type specified by the programmer.
If the alarm_type is invalid, the type specified by the programmer is used. The type is also forwarded as a setting, so that the next time the alarm block is downloaded, it has the current type.
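The two rules above can be sketched as a small decision function. The set of valid alarm_type values is not listed in this document, so the range 1 to 3 used here is purely a placeholder, and the names `type_choice` and `resolve_alarm_type` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: placeholder range of valid type codes; the real values
   are not given in this document. */
#define ALARM_TYPE_MIN 1
#define ALARM_TYPE_MAX 3

struct type_choice {
    unsigned char type;      /* data type to use for comparisons */
    int forward_as_setting;  /* nonzero if the type should be sent back
                                as a setting for the next download */
};

static int alarm_type_is_valid(unsigned char t)
{
    return t >= ALARM_TYPE_MIN && t <= ALARM_TYPE_MAX;
}

/* Choose between the type in the downloaded alarm block and the type
   the programmer specified in the front-end code. */
struct type_choice resolve_alarm_type(unsigned char block_type,
                                      unsigned char programmer_type)
{
    struct type_choice c;

    assert(alarm_type_is_valid(programmer_type));
    if (alarm_type_is_valid(block_type)) {
        c.type = block_type;          /* the database value wins */
        c.forward_as_setting = 0;
    } else {
        c.type = programmer_type;     /* fall back to the programmer's type */
        c.forward_as_setting = 1;     /* so the database learns it */
    }
    return c;
}
```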