EPICURE Design Note 131.7
Alarm Monitor Process Upgrades
David M. Kline
Introduction
This paper outlines the proposed upgrades for the Alarm Monitor Process
(AMP). The upgrades can be implemented in two phases: the first phase would
deal with modifications internal to AMP; the second covers enhancements
which relate to AMP and the rest of the alarm system. The following sections
describe the items for each of the phases.
Phase-I
Phase-I upgrades are intended to modify AMP internals and to leave hooks for
Phase-II upgrades. Listed below are the proposed items:
[a)]
Set requests would be separated by device class and dispatched to
lower level Alarm Monitoring Tasks (AMT). Separate AMTs would be implemented
to provide data acquisition for normal and synthetic devices. Normal devices
would use the QVI or TVI common memory services, whereas synthetic devices
would use the EPICURE da_ services. Since the AMT used to service normal
devices is using the common memory services, AMP would be required to implement
some DAR and DAS functionality to build and manage DALs. This upgrade has the
additional benefits of distributing the load of data acquisition requests from
other layers of EPICURE subsystems and of allowing AMP to continue operating if
DAS terminates abnormally. Intelligent devices, such as ones which can communicate over
ARCnet, could be identified using the class property and submitted to another
AMT for processing (VMEbus or VAX/AXP).
[b)]
An image rundown handler (IRH) would be declared to expunge DAL lists and
terminate da_ services in the event that AMP terminates abnormally. Additional
functionality can be included such as flushing file buffers.
[c)]
AMP would declare itself as a network object allowing another path to query
alarms. This also provides a hook into AMP for Phase-II upgrades and other
possible functionality.
[d)]
Currently, AMP checks for the existence of DAS every second and terminates when
it finds DAS missing. This can be accomplished more efficiently by having DAS
define an image rundown handler which sends a ``DAS down message'' to the AMP
process. Another possibility is for the AMP UTI to provide an automatic image
rundown handler (IRH) that would be invoked upon process termination. The routine
would automatically send a message indicating the termination of the process. In
addition, the UTI could provide an optional parameter that allows a
user-specified routine to be invoked preceding the automatic IRH defined by
the AMP UTI. This would avoid conflict between the AMP UTI and applications.
This would conclude Phase-I. I would estimate that it would take 5-6
weeks to implement the above items and debug them.
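The class-based dispatch proposed in item (a) can be pictured with a short C sketch. Everything named here is hypothetical: the note does not define how the class property is encoded, how AMTs are addressed, or what a set request looks like, so the dev_class_t and amt_id_t enumerations, the set_request_t layout, and both functions are assumptions made only for illustration.

```c
#include <stddef.h>

/* Hypothetical encoding of the device class property. */
typedef enum {
    DEV_NORMAL,       /* serviced through QVI/TVI common memory */
    DEV_SYNTHETIC,    /* serviced through the da_ services      */
    DEV_INTELLIGENT   /* e.g. a device reachable over ARCnet    */
} dev_class_t;

/* Hypothetical identifiers for the lower level AMTs. */
typedef enum {
    AMT_COMMON_MEMORY,
    AMT_DA_SERVICES,
    AMT_INTELLIGENT,
    AMT_COUNT         /* number of AMTs; also means "no AMT" */
} amt_id_t;

/* Hypothetical, reduced view of one alarm set request. */
typedef struct {
    unsigned long dipi;      /* device and property index    */
    dev_class_t   dev_class; /* class property of the device */
} set_request_t;

/* Choose the AMT that should service a request. */
amt_id_t amt_for_request(const set_request_t *req)
{
    if (req == NULL)
        return AMT_COUNT;
    switch (req->dev_class) {
    case DEV_NORMAL:      return AMT_COMMON_MEMORY;
    case DEV_SYNTHETIC:   return AMT_DA_SERVICES;
    case DEV_INTELLIGENT: return AMT_INTELLIGENT;
    }
    return AMT_COUNT;        /* unknown class value */
}

/* Tally a batch of requests per AMT; counts[] must hold AMT_COUNT
   entries. Returns the number of requests that matched no AMT. */
size_t count_by_amt(const set_request_t *reqs, size_t n, size_t counts[])
{
    size_t i, unrouted = 0;

    for (i = 0; i < (size_t)AMT_COUNT; i++)
        counts[i] = 0;
    for (i = 0; i < n; i++) {
        amt_id_t amt = amt_for_request(&reqs[i]);
        if (amt == AMT_COUNT)
            unrouted++;
        else
            counts[amt]++;
    }
    return unrouted;
}
```

Keeping the class-to-AMT decision in one function would let a further AMT (for example one on the VMEbus) be added without touching the code that walks the request list.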
Phase-II
Phase-II upgrades would make use of hooks in place from Phase-I and would
encompass other components of the alarm system. This section lists the items
which comprise the Phase-II upgrade and ideas which have been presented:
[a)]
A UTI can be written to request alarm reads, communicate DAE
messages from the DEMServer process, and implement a consistent interface
to set alarms from DAS or other requesters. In addition, since the alarm
information can be read from AMP, the ASB report facility no longer needs
to access AMP's database directly.
[b)]
Repeat reads have been determined as necessary for alarms on page.
[c)]
Multiple alarms for a given device and property (READING,SETTING,STATUS)
can be implemented by defining Alarm State Descriptors (ASD) for individual
alarm limits. The ASDs follow the ASB header and represent multiple alarms
for the given device/property pair. Refer to the appendix for the data
structure definitions.
[d)]
The user can be notified specifically about device alarms by using the
User-Identification-Code (UIC). These codes are consistent between
VAX clusters and remote nodes which are managed by the EPICURE system
management staff. The user UIC is contained in the ASD (see appendix)
and is used by the ARD process to route the alarm message. In addition,
wildcards or a UIC value of zero can be used to indicate the alarm is
to be sent globally. Alarms which have been set remotely from nodes
which are not managed by the EPICURE system management staff are
identified by a bit (foreign) located in the ASD. This bit instructs
the ARD process to use the information in the ASB extension to determine
where to send the alarm.
[e)]
Events which occur on the DAE can be communicated to the alarm system by the
DEMServer sending the status message codes to the AMP. DEMServer would use the
AMP UTI to connect and communicate the status message. AMP would construct a
device name from the front end that sent the message and retrieve the device
index property. The status code contained in the message would be mapped into
a particular bit of the device's status property. AMP would set the bit which
represents the message and send the appropriate information to the master ARD.
ARD would treat the message as a standard alarm and pass the information to the
remote ARD and alarm displays.
[f)]
DAS could monitor the response of lists from the DAE and drop the links when
lists have not been received within a predetermined amount of time. Dropping
the links would notify applications that the front end has crashed. The DEM
UTI can be used to communicate this to the AMP or DAS can use the AMP UTI
and directly communicate the exception.
[g)]
The alarm severity residing in the ARB header is passed to the alarm
display for interpretation. The user of the alarm display decides which
color represents each severity. The severity
for a given alarm is defined using ALCON.
[h)]
Since the meaning of events is unclear at this point, a full implementation
cannot be accomplished. Some ideas are for events to indicate when a device
exceeds a particular limit, when a quench occurs, or status messages. The
``eventmo'' bit located in the ARB header indicates whether ARD is to hold on
to the event until it expires. AMP will be responsible for determining an event
and holding on to it for a timeout specified by the user (from ALCON). AMP
will set the bit if a timeout was specified, indicating that ARD should hold on
to the event until AMP sends a clear message. When the event expires, AMP will
signal ARD and both dismiss the alarm. If no timeout was specified, AMP
immediately signals the event to ARD and both dismiss the event. The event
timeout can be a default, or setup in ALCON, and can be identified in the
ASD structure as part of the ASB extension (``time'' member).
[i)]
The alarm display and local ARD will communicate and execute an application
when particular criteria are met.
[j)]
An attempt will be made to support tiered alarms. The user will define the
tiered alarm from ALCON which includes a root device and others to be disabled
or enabled (as determined by the user) when it goes into alarm. If AMP
determines that the root device is in alarm, it signals ARD and disables
or enables monitoring the devices associated with it until the root device
goes good. The ASD data structure contains two members which support tiered
alarms. The ``tiered'' bit is used as an indicator that the particular device
is a member of a tiered alarm. The ``override_dipi'' longword contains the
device and property index of the root device. Additionally, this is useful
for tiered alarms which contain several levels.
[k)]
Activating an application (program) given particular alarm conditions can
be implemented by a separate application which interfaces to the ARD, or
specified as part of the alarm setup in ALCON. Another possibility is for
users to write EQL command procedures and run them in a batch environment.
At this point, what the users really want is unclear and further discussion
about the definition is needed.
[l)]
A lower level AMP is placed on the VMEbus using a Cyclone i960, or resides as
a software entity (thread) in the new Alpha/PCI based front end.
This would conclude Phase-II upgrades to the AMP and the alarm system. I would
estimate that items which affect AMP would take about another 6-8 weeks to
implement. However, testing time may be staggered since integration with other
components is necessary.
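The tiered alarm lookup described in item (j) can be sketched in C as a walk up the chain of root devices. The structure below is a hypothetical, reduced stand-in for the ASD: only the ``tiered'' and ``override_dipi'' members come from the note, while the remaining members, the table layout, and the assumption that the ``tiered'' bit is set only on entries that have a root above them are inventions for the example. The function only reports whether a root is in alarm; whether monitoring is then disabled or enabled remains the user's choice in ALCON.

```c
#include <stddef.h>

/* Hypothetical, reduced ASD; only tiered and override_dipi are
   taken from the note. */
typedef struct {
    unsigned long dipi;          /* device and property index of entry */
    unsigned long override_dipi; /* DIPI of the root device, if tiered */
    unsigned int  tiered   : 1;  /* entry has a root device above it   */
    unsigned int  in_alarm : 1;  /* entry is currently in alarm        */
} asd_sketch_t;

/* Linear search of the table for a given DIPI. */
static const asd_sketch_t *find_by_dipi(const asd_sketch_t *tab, size_t n,
                                        unsigned long dipi)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tab[i].dipi == dipi)
            return &tab[i];
    return NULL;
}

/* Return 1 if any root above tab[idx] is in alarm, else 0.
   Following override_dipi repeatedly handles several levels. */
int root_in_alarm(const asd_sketch_t *tab, size_t n, size_t idx)
{
    const asd_sketch_t *cur;
    size_t hops = 0;

    if (tab == NULL || idx >= n)
        return 0;
    cur = &tab[idx];
    while (cur->tiered && hops++ < n) {  /* hop limit guards against cycles */
        const asd_sketch_t *root = find_by_dipi(tab, n, cur->override_dipi);
        if (root == NULL || root == cur)
            break;
        if (root->in_alarm)
            return 1;
        cur = root;
    }
    return 0;
}
```

With a three-level chain in which only the top device is in alarm, both lower entries report a root in alarm, which is the multi-level behavior item (j) mentions.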
Appendix A: Alarm Header File
This page was intentionally left blank.
Appendix B: Proposed AMP UTI
The proposed AMP UTI is meant for direct communication with the AMP process from
remote or local applications. The primary clients of the UTI would be the
DAS process, ASB report application, and other diagnostic applications. This
appendix describes the proposed AMP UTI routines and their calling sequences.
amp_connect
(int)status = amp_connect( irh, irhp, ccount )
This routine establishes a connection with the alarm server network
objects and initializes internal data structures. An image rundown
handler (IRH) is declared which notifies the alarm server that the
caller process has terminated. The caller can specify another routine
to be part of the termination procedure by specifying the ``irh''
parameter. The alarm server IRH executes the user specified ``irh''
before it sends the termination notification to the alarm server.
Furthermore, the IRH is executed regardless of the method used to
terminate the process.
irh address of the image rundown handler that is
executed as part of process termination.
Passed by reference.
irhp value to be passed to the image rundown handler.
Passed by value.
ccount number of connections made to remote alarm servers.
Passed by reference.
status returns a condition value:
SS$_NORMAL success
SS$_NOPRIV fatal, user does not possess authorization
SS$_NOSUCHOBJ fatal, network object is unknown at remote node
AMP__INIT warning, interface already initialized
others as returned by EPICURE and VMS system services
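Because the UTI routines return VMS condition values, a caller would normally test the low-order bits rather than compare against SS$_NORMAL alone. The C sketch below shows that test and one consequence for amp_connect: a warning such as AMP__INIT has the success bit clear and so fails a plain success check. The severity encoding is the standard VMS one; the numeric value given to AMP__INIT is hypothetical (the note assigns no values to AMP__ codes), and treating AMP__INIT as "still usable" is an assumption about the intended behavior, not something the note states.

```c
/* Severity field of a VMS condition value (low three bits). */
enum {
    SEV_WARNING = 0,
    SEV_SUCCESS = 1,
    SEV_ERROR   = 2,
    SEV_INFO    = 3,
    SEV_SEVERE  = 4
};

/* SS$_NORMAL has the value 1 on VMS; it is spelled without '$' here
   so that the sketch compiles with any C compiler. */
#define SKETCH_SS_NORMAL 1u

/* Hypothetical value chosen only so that the severity bits read
   "warning"; the real AMP__INIT value is not given in the note. */
#define SKETCH_AMP__INIT 0x08A18008u

/* Extract the severity field. */
unsigned int cond_severity(unsigned int status)
{
    return status & 7u;
}

/* Low bit set means success (severity SUCCESS or INFO). */
int cond_success(unsigned int status)
{
    return (status & 1u) != 0;
}

/* Decide whether the caller may go on using the UTI after a call
   to amp_connect returned this status (assumed policy). */
int connect_usable(unsigned int status)
{
    if (cond_success(status))
        return 1;
    /* "Interface already initialized" is only a warning: an earlier
       call is assumed to have left a usable connection. */
    return status == SKETCH_AMP__INIT;
}
```

The same cond_success test would accept AMP__QUEUED from the request routines, since the note lists it as a success status.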
amp_disconnect
(int)status = amp_disconnect( )
This routine cancels pending IOs, disconnects from the AMP network object,
restores the previous image rundown handler routine, and releases dynamic
memory allocated by previous calls to AMP UTI routines. No termination
notification is sent to the alarm server. Applications and processes that
do not want to notify the alarm server of their termination MUST call this
procedure before exiting.
status returns a condition value:
SS$_NORMAL success
SS$_NOSUCHOBJ fatal, no connection established
others as returned by EPICURE and VMS system services
amp_flush_queued
(void)amp_flush_queued( )
This routine is used to flush messages which are queued to be
sent to the alarm server without performing a disconnect.
amp_get_status
(int)status = amp_get_status( ast [,astp] )
This routine requests the current status block that is maintained
by the alarm server. Notification of the return data is accomplished
through a user-provided AST routine (see user_ast) specified by the
``ast'' parameter. The parameters passed to the AST routine include:
completion status, node name string (source of data), return data,
return data length, and the user-specified parameter (``astp''). If
an error occurs on the logical link, the completion status, node name,
and user-provided parameters are valid. Furthermore, the AST routine
is called for each logical link that exists with an alarm server.
ast address of a routine that is called when the
status block has returned (see user_ast).
Passed by reference.
astp optional parameter that is passed to the routine
specified by the ``ast'' parameter (see user_ast).
Passed by value.
status returns a condition value:
SS$_NORMAL success
AMP__QUEUED success, message queued but not sent
SS$_BADPARAM fatal, bad parameter value
SS$_INSFMEM fatal, insufficient virtual memory
SS$_NOSUCHOBJ fatal, no connection established
AMP__OVRFLOW warning, too many outstanding messages
others as returned by EPICURE and VMS system services
amp_get_users
(int)status = amp_get_users( ast [,astp] )
This routine requests the current user list that is maintained
by the alarm server. Notification of the return data is accomplished
through a user-provided AST routine (see user_ast) specified by the
``ast'' parameter. The parameters passed to the AST routine include:
completion status, node name string (source of data), return data,
return data length, and the user-specified parameter (``astp''). If
an error occurs on the logical link, the completion status, node name,
and user-provided parameters are valid. Furthermore, the AST routine
is called for each logical link that exists with an alarm server.
ast address of a routine that is called when the
user list has returned (see user_ast).
Passed by reference.
astp optional parameter that is passed to the routine
specified by the ``ast'' parameter (see user_ast).
Passed by value.
status returns a condition value:
SS$_NORMAL success
AMP__QUEUED success, message queued but not sent
SS$_BADPARAM fatal, bad parameter value
SS$_INSFMEM fatal, insufficient virtual memory
SS$_NOSUCHOBJ fatal, no connection established
AMP__OVRFLOW warning, too many outstanding messages
others as returned by EPICURE and VMS system services
amp_read_alarm
(int)status = amp_read_alarm( dar, ast [,astp] )
This routine requests alarm reads for the list specified by the ``dar''
parameter. Notification of the return data is accomplished through a
user-provided AST routine (see user_ast) specified by the ``ast''
parameter. The parameters passed to the AST routine include: completion
status, node name string (source of data), return data, return data
length, and the user-provided parameter (``astp''). If an error occurs
on the logical link, the completion status, node name, and user-provided
parameters are valid.
dar address of the request message (alarm read).
Passed by reference.
ast address of the routine called when a list returns.
(see user_ast). Passed by reference.
astp parameter passed to the AST routine (see user_ast).
Passed by value.
status returns a condition value:
SS$_NORMAL success
AMP__QUEUED success, message queued but not sent
SS$_BADPARAM fatal, bad parameter value
SS$_NOPRIV fatal, no privilege for attempted operation
SS$_NOSUCHOBJ fatal, no connection established
AMP__TOOMANY fatal, too many connections
AMP__INVFTD warning, invalid FTD specified
AMP__INVREQ warning, invalid request type
AMP__OVRFLOW warning, too many outstanding messages
others as returned by EPICURE and VMS system services
amp_read_device_records
(int)status = amp_read_device_records( ast [,astp] )
This routine requests the current list of Alarm Status Blocks that
are maintained by each alarm server. Notification of the return data
is accomplished through a user-provided AST routine (see user_ast)
specified by the ``ast'' parameter. The parameters passed to the AST
routine include: completion status, node name string (source of data),
return data, return data length, and the user-provided parameter
(``astp''). If an error occurs on the logical link, the completion
status, node name, and user-provided parameters are valid. Furthermore,
the AST routine is called for each logical link that exists with an
alarm server. The status code of AMP__NOMORE will be passed to the AST
routine when the end of the Alarm Status Block list is reached.
ast address of a routine that is called when an Alarm
Status Block returns (see user_ast).
Passed by reference.
astp optional parameter that is passed to the routine
specified by the ``ast'' parameter (see user_ast).
Passed by value.
status returns a condition value:
SS$_NORMAL success
AMP__NOMORE success, no more data
AMP__QUEUED success, message queued but not sent
SS$_BADPARAM fatal, bad parameter value
SS$_NOSUCHOBJ fatal, no connection established
AMP__OVRFLOW warning, too many outstanding messages
others as returned by EPICURE and VMS system services
amp_repeat_read_alarm
(int)status = amp_repeat_read_alarm( dar, ast [,astp] )
This routine requests repeat read alarm for the list specified by the
``dar'' parameter. Notification of the return data is accomplished through
a user-provided AST routine (see user_ast) specified by the ``ast''
parameter. The parameters passed to the AST routine include: completion
status, node name string (source of data), return data, return data
length, and the user-provided parameter (``astp''). If an error occurs
on the logical link, the completion status, node name, and user-provided
parameters are valid.
dar address of the request message (alarm repeat read).
Passed by reference.
ast address of the routine called when a list returns.
(see user_ast). Passed by reference.
astp parameter passed to the AST routine (see user_ast).
Passed by value.
status returns a condition value:
SS$_NORMAL success
AMP__QUEUED success, message queued but not sent
SS$_BADPARAM fatal, bad parameter value
SS$_NOPRIV fatal, no privilege for attempted operation
SS$_NOSUCHOBJ fatal, no connection established
AMP__INVRQDATA fatal, invalid request data
AMP__TOOMANY fatal, too many connections
AMP__INVFTD warning, invalid FTD specified
AMP__INVREQ warning, invalid request type
AMP__OVRFLOW warning, too many outstanding messages
others as returned by EPICURE and VMS system services
amp_request_alarm
(int)status = amp_request_alarm( darm, ast [,astp] )
This routine requests alarm sets or reads given the list specified by
the ``darm'' parameter. Notification of the return data is accomplished
through a user-provided AST routine (see user_ast) specified by the ``ast''
parameter. The AST routine is executed for every list contained in the
request list. The parameters passed to the AST routine include: completion
status, node name string (source of data), return data, return data
length, and the user-provided parameter (``astp''). If an error occurs
on the logical link, the completion status, node name, and user-provided
parameters are valid.
darm address of the alarm request message.
Passed by reference.
ast address of the routine called when a list returns.
(see user_ast). Passed by reference.
astp parameter passed to the AST routine (see user_ast).
Passed by value.
status returns a condition value:
SS$_NORMAL success
AMP__QUEUED success, message queued but not sent
SS$_NOPRIV fatal, no privilege for attempted operation
SS$_NOSUCHOBJ fatal, no connection established
AMP__TOOMANY fatal, too many connections
AMP__OVRFLOW warning, too many outstanding messages
others as returned by EPICURE and VMS system services
amp_send_dae_status
(int)status = amp_send_dae_status( code )
This routine is specifically used by the DEMServer process and others which
monitor exceptions received from the DAE. The code retrieved from the
HERMES queue is hashed to a particular bit number and communicated to the
AMP. The source node name and message facility codes are used to derive a
device name. The bit number of the device represents the specific message
code. If the bit number exceeds the maximum number of allowable bits, an
overflow message is communicated.
code value of the status code received from the DAE.
Passed by value.
status returns a condition value:
SS$_NORMAL success
AMP__QUEUED success, message queued but not sent
SS$_NOSUCHOBJ fatal, no connection established
SS$_NOPRIV fatal, no privilege for attempted operation
AMP__OVRFLOW warning, too many outstanding messages
others as returned by EPICURE and VMS system services
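The mapping from a DAE status code to a device name and bit number might look like the following C sketch. The field positions used (facility number in bits 16-27, message number in bits 3-15) are the standard VMS condition value layout. Everything else is an assumption: the note does not specify the hash, so the message number is simply used as the bit number; the 32-bit limit, the device name format, and all function names are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_STATUS_BITS 32u   /* assumed width of a status property */

/* Standard VMS condition value fields. */
static unsigned int cond_facility(unsigned int code)
{
    return (code >> 16) & 0xFFFu;   /* bits 16-27 */
}

static unsigned int cond_msgno(unsigned int code)
{
    return (code >> 3) & 0x1FFFu;   /* bits 3-15 */
}

/* Assumed "hash": the message number, with the facility-specific
   flag (bit 15 of the code) dropped, is taken as the bit number.
   Returns the bit number, or -1 when it does not fit (overflow). */
int dae_status_bit(unsigned int code)
{
    unsigned int bit = cond_msgno(code) & 0xFFFu;

    return bit < MAX_STATUS_BITS ? (int)bit : -1;
}

/* Hypothetical device name format "<node>_F<facility in hex>".
   Returns 0 on success, -1 if the name does not fit in out[]. */
int dae_device_name(const char *node, unsigned int code,
                    char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s_F%03X", node, cond_facility(code));

    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

A return of -1 from dae_status_bit corresponds to the case in which the routine would communicate an overflow message instead of setting a bit.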
amp_set_alarm
(int)status = amp_set_alarm( dar, ast [,astp] )
This routine requests alarm sets for the list specified by the ``dar''
parameter. Notification of the return data is accomplished through a
user-provided AST routine (see user_ast) specified by the ``ast''
parameter. The parameters passed to the AST routine include: completion
status, node name string (source of data), return data, return data
length, and the user-provided parameter (``astp''). If an error occurs
on the logical link, the completion status, node name, and user-provided
parameters are valid.
dar address of the request message (alarm set).
Passed by reference.
ast address of the routine called when a list returns.
(see user_ast). Passed by reference.
astp parameter passed to the AST routine (see user_ast).
Passed by value.
status returns a condition value:
SS$_NORMAL success
AMP__QUEUED success, message queued but not sent
SS$_BADPARAM fatal, bad parameter value
SS$_NOPRIV fatal, no privilege for attempted operation
SS$_NOSUCHOBJ fatal, no connection established
AMP__TOOMANY fatal, too many connections
AMP__INVFTD warning, invalid FTD specified
AMP__INVREQ warning, invalid request type
AMP__OVRFLOW warning, too many outstanding messages
others as returned by EPICURE and VMS system services
user_ast
(void)user_ast(status, node, data, datalen, usrprm )
This user-provided routine is called at AST level (IPL 2) on behalf
of the user and indicates that return data has arrived for the
corresponding UTI routine. The routine is called with the parameters
shown above, which are described below.
status completion status of the operation.
Passed by value.
node a character string of the source node name.
Passed by reference.
data address of the return data.
Passed by reference.
datalen size of the return data.
Passed by value.
usrprm user parameter.
Passed by value.
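A user-provided AST routine could be structured as in the C sketch below. It follows two rules stated in this appendix: when an error occurs on the logical link only the status, node name, and user parameter are valid, and AMP__NOMORE marks the end of a list of Alarm Status Blocks. The context structure, the numeric value given to AMP__NOMORE, and the routine name are hypothetical. The user parameter is typed as a pointer so the sketch stays portable, whereas the proposed interface passes it by value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical value; the note assigns no numbers to AMP__ codes.
   The low bit is set to match its description as a success status. */
#define SKETCH_AMP__NOMORE 0x08A18011u

/* Caller-owned context handed to the AST through the user parameter. */
typedef struct {
    unsigned int replies;       /* successful deliveries with data    */
    unsigned int errors;        /* deliveries with a failure status   */
    size_t       bytes;         /* total bytes of return data seen    */
    int          done;          /* set when AMP__NOMORE arrives       */
    char         last_node[16]; /* node name of the latest delivery   */
} ast_ctx_t;

/* Runs at AST level, so it only records results and returns. */
void example_user_ast(unsigned int status, const char *node,
                      const void *data, size_t datalen, void *usrprm)
{
    ast_ctx_t *ctx = (ast_ctx_t *)usrprm;

    if (ctx == NULL)
        return;
    if (node != NULL) {
        strncpy(ctx->last_node, node, sizeof ctx->last_node - 1);
        ctx->last_node[sizeof ctx->last_node - 1] = '\0';
    }
    if ((status & 1u) == 0) {   /* failure: data and datalen not valid */
        ctx->errors++;
        return;
    }
    if (status == SKETCH_AMP__NOMORE) {
        ctx->done = 1;          /* end of the Alarm Status Block list */
        return;
    }
    if (data != NULL) {
        ctx->replies++;
        ctx->bytes += datalen;
    }
}
```

Because the AST is called once for each logical link, a caller that talks to several alarm servers would need one context (or one "done" flag) per node rather than the single flag shown here.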
Keywords: ALCON, AMP, ARD, alarms, EPICURE
Distribution:
Normal