Next Generation Accelerator Controls at Fermilab
PO Box 500, Batavia, IL 60510, USA
Abstract
Aged platforms, operating systems, programming languages, and software paradigms will be replaced with a reusable software component architecture based upon the Java language.
The last major upgrade of the Fermilab Accelerator Control System occurred ten years ago. As technology improves, the mission of Controls includes abandoning custom software wherever industry offers a better-implemented, better-supported product with adequate performance for controls applications. That upgrade saw a significant movement from custom controls to industry or vendor standards in the following areas:
Just as a significant upgrade halts custom work in some areas, it provides opportunities for new custom work in others. The move to a 32-bit architecture promoted the growth of library routines from a few hundred to a few thousand, providing rich, expansive support for application programmers, including:
The Accelerator Control System is continually assessed for weakness and opportunity. Small changes may be embraced and implemented immediately, but large changes occur at infrequent intervals because they are often costly in personnel and budget. Each significant upgrade should therefore address multiple issues and provide a foundation offering growth and stability for several years.
Several indicators point to the need for a control system overhaul. Any self-assessment includes monitoring technology for an opportunity that matches our personnel, budget, and schedules while providing a path for growth with stability. Accumulated self-assessment has identified the following items that need to be addressed in the next controls upgrade.
2.1 Custom software
The X-Windows protocol is complex and costly to implement. Consequently, graphics and alphanumeric managers shield application and library programmers from the details of this protocol. Searches for value-added services yielded other complex protocols, e.g. MOTIF, and proprietary products. This system is heavily invested in custom, proprietary windowing support. It needs to stop authoring custom windowing software and embrace the drag-and-drop technology supported by integrated development environments (IDEs), while retaining all the portability of the X-Windows system.
The ACNET (Accelerator Controls NETwork) protocol is proprietary and non-portable. Modern communication protocols have improved in service and performance and should be utilized. Furthermore, ACNET has multiple implementations on several platforms and operating systems.
2.2 Obsolescence
The operating system of the console and central processors (VMS) is obsolete. The platforms of the console and central processors cannot be upgraded. New users do not know or care to learn the details of obsolete systems.
Though the standard front-end architecture uses VME or VXI and the VxWorks operating system, several front-ends use Multibus II and the MTOS and PSOS operating systems. Likewise, many embedded systems have been declared obsolete and await replacement.
2.3 Portability
The console and central processors have VAX and VMS dependencies and are not portable.
Front-ends have platform and operating system dependencies and are difficult to port.
2.4 Paradigms and languages
Object-oriented languages promote reusability and are preferred by programmers and commercial software vendors.
2.5 Maintenance and extensibility
Custom software implemented on multiple, diverse systems impedes the implementation and expansion of services. Data acquisition protocols have changed little in twenty years, though service improvements have often been discussed. The burden of maintaining multiple front-end platforms has made expansion difficult.
2.6 Productivity
The software development cycle of edit, compile, link, and execute limits productivity. IDEs and Computer Aided Software Engineering (CASE) tools should be utilized to increase productivity.
The features of the Java language are best described elsewhere. Suffice it to say that the Java language addresses the shortcomings identified by the self-assessment, and its industry momentum is likely to stifle alternatives for the next several years. The remainder of this paper describes the step-wise progression of a major control system upgrade based upon Java. A brief summary of the strategy, expanded upon later, follows:
3.1 Consoles and applications tier
A console is any platform, capable of running a JVM (Java virtual machine), through which a user interacts with the control system. X-Windows will be retired. Users will run Java applications on-site and Java applets off-site. All FORTRAN, C, and C++ applications will be replaced. Increased security concerns and user profiling will require sensing devices that identify users of the system.
3.2 Development environment
All Java programmers will be provided with, and expected to master, a preferred IDE. Old habits are hard to break, and an IDE requires an investment in time and effort before productivity gains are realized. Likewise, the development platform and operating system will be specified to simplify development and promote consistency.
3.3 Central tier
The Data Acquisition Engine (DAE), together with the commercial databases, provides the central tier of the control system. The DAE supports secure access to the control system. It shares Java objects with applications while initially accessing front-ends with ACNET. Further, the DAE consolidates all front-end traffic and provides services that not all front-ends implement. The DAE supports thin clients by providing redundant services and retry and recovery mechanisms, generally relieving applications of scheduling and resource complexities.
3.4 Front-end tier
The front-ends undergo stepwise change to reduce the number of operating systems, buses, software architectures, and ACNET implementations. A first milestone might target all front-ends to share VxWorks, VME or VXI, MOOC (Minimal Object Oriented Communication), and a single ACNET implementation on UDP. Open Access front-ends would skip this milestone, and instrumentation front-ends would be front-ended. Another milestone might include data acquisition protocol changes to support complex return events (Tevatron Clock, State Transition, MDAT (Machine DATa), and absolute time) and universal, time-stamped data returns. Another milestone might include a JVM on each front-end for downloading and data acquisition accessibility, and an ACNET implementation with JNI (Java Native Interface) hooks as low into the front-end as possible. A final milestone would retire ACNET in favor of Java's RMI (Remote Method Invocation) or CORBA (Common Object Request Broker Architecture).
3.5 Networks
The DAEs will dramatically lower traffic to front-ends through system-wide consolidation, at the cost of heavy DAE-to-DAE consolidation traffic. The DAE nodes will be confined to the computer room for access to high-bandwidth links.
The Data Acquisition Engine has been under development for over a year. Ironically, ACNET, though targeted for retirement, was the first service implemented to provide accessibility to existing front-ends. The following are services that are essentially complete:
4.1 ACNET services:
ACNET is the current communication protocol to front-end tasks. It is also the transport for consolidation traffic between DAEs. The integrated time-out service, which covers both single and multiple replies, provides the notification needed to recover from front-end problems.
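As an illustration of a time-out that covers both single and multiple replies, the following minimal Java sketch assumes one fixed time-out per request and passes the current time in explicitly so the logic can be exercised without threads. The class ReplyTimeout and its methods are hypothetical, not the DAE's actual API. A single-reply request completes on its first reply, while each reply to a repetitive request re-arms the deadline.

```java
/**
 * Hypothetical sketch of a reply time-out for single and repetitive requests.
 * Time (milliseconds) is supplied by the caller rather than read from a clock.
 */
class ReplyTimeout {
    private final boolean repetitive;   // true for multiple-reply requests
    private final long timeoutMillis;   // assumed fixed per-request time-out
    private long deadline;              // time by which the next reply must arrive
    private boolean finished;           // completed, or already reported as timed out

    ReplyTimeout(boolean repetitive, long timeoutMillis, long nowMillis) {
        this.repetitive = repetitive;
        this.timeoutMillis = timeoutMillis;
        this.deadline = nowMillis + timeoutMillis;
    }

    /** A reply arrived: a single-reply request is done; a repetitive one is re-armed. */
    void onReply(long nowMillis) {
        if (finished) return;
        if (repetitive) {
            deadline = nowMillis + timeoutMillis;
        } else {
            finished = true;
        }
    }

    /** Returns true exactly once, when the request first misses its deadline. */
    boolean checkExpired(long nowMillis) {
        if (finished || nowMillis < deadline) return false;
        finished = true; // notify once; the caller decides how to recover
        return true;
    }
}
```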
4.2 Front-end the front-ends by supporting:
Since only scaled values are returned to clients, the DAE provides the translation from binary data to objects.
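A minimal sketch of that translation, assuming a 2-byte, signed, little-endian raw reading and a simple linear transform (value = counts * factor + offset). The actual byte order and scaling transforms come from the front-ends and the database and may well differ; ScaledReading and RawScaler are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Scaled value handed to clients in place of raw front-end bytes (hypothetical type). */
record ScaledReading(String device, double value, String units) {}

/** Hypothetical linear scaler: value = counts * factor + offset. */
class RawScaler {
    private final double factor;
    private final double offset;
    private final String units;

    RawScaler(double factor, double offset, String units) {
        this.factor = factor;
        this.offset = offset;
        this.units = units;
    }

    /** Assumes a 2-byte, signed, little-endian raw reading. */
    ScaledReading scale(String device, byte[] raw) {
        if (raw.length != 2) throw new IllegalArgumentException("expected 2 raw bytes");
        short counts = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getShort();
        return new ScaledReading(device, counts * factor + offset, units);
    }
}
```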
Many of our front-ends cannot return data on a clock event because they do not have a clock event decoder. The data acquisition protocol provides neither for a delay from a clock event nor for the specification of a state transition event. The DAE's role is to service all user requests by polling the front-ends for their capabilities and providing the timing services necessary to fulfill requests the front-ends cannot handle. Likewise, many front-ends do not support the fast time plot and snapshot plot protocol. The DAE will collect, time stamp, and return plot data for these front-ends.
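The routing decision described above can be sketched as follows, assuming the DAE has already polled a front-end and holds its capabilities as a set. Delayed clock events and state transitions always fall to the DAE, since the protocol cannot express them; any other request goes to the front-end only if the front-end advertises support. The enums and the class EventRouter are hypothetical.

```java
import java.util.Set;

/** Hypothetical request kinds drawn from the text. */
enum Need { CLOCK_EVENT, DELAYED_CLOCK_EVENT, STATE_TRANSITION, FAST_TIME_PLOT, SNAPSHOT_PLOT }

/** Who services a request. */
enum ServicedBy { FRONT_END, DAE }

class EventRouter {
    /**
     * The capability set is assumed to come from polling the front-end.
     * Requests the protocol cannot express always go to the DAE.
     */
    static ServicedBy route(Set<Need> frontEndCapabilities, Need need) {
        switch (need) {
            case DELAYED_CLOCK_EVENT:
            case STATE_TRANSITION:
                return ServicedBy.DAE;
            default:
                return frontEndCapabilities.contains(need) ? ServicedBy.FRONT_END : ServicedBy.DAE;
        }
    }
}
```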
To correlate data it is necessary to have all front-ends and DAEs synchronized to the same standard clock and to time stamp all data returns. Since time stamps are not in the current data acquisition protocol, the DAE will provide the time stamps until the protocol is changed.
The ACNET protocol is simple and efficient. However, it is not a portable protocol. Self-describing data is portable; however, it is less efficient, since the wire must carry descriptions of the message components and a processor must build those descriptions. Consequently, when moving towards shipping portable objects, it is necessary to reduce or remove redundant and inefficient messages. The front-end with the highest bandwidth today is TLG, the time line generator, because G:SCTIME, the time in the super cycle, is displayed on most alarm screens and many other application pages. Often, more than 50 clients are monitoring G:SCTIME. The current data pools consolidate front-end requests on a single node, but there are over 50 nodes in the control system today. The DAEs consolidate front-end traffic across the whole system. In the TLG's case, one consolidated request would replace the many per-node requests, reducing traffic to a few percent of the present rate. Other methods, such as "monitor change", can be employed to reduce traffic even further. This positions the front-ends to consider implementing portable object protocols such as CORBA or RMI.
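A minimal sketch of system-wide consolidation, assuming clients subscribe by a device/property string and receive scaled doubles. However many clients subscribe to G:SCTIME, only the first subscription produces a front-end request, and each reply is fanned out to all of them. Consolidator and its methods are hypothetical names, and the ACNET send is left as a comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleConsumer;

/** Hypothetical consolidator: many clients, one front-end request per device/property. */
class Consolidator {
    private final Map<String, List<DoubleConsumer>> subscribers = new HashMap<>();
    private int frontEndRequests; // requests that would actually be sent to front-ends

    /** Only the first subscriber for a device/property causes a front-end request. */
    void subscribe(String deviceProperty, DoubleConsumer client) {
        List<DoubleConsumer> clients = subscribers.get(deviceProperty);
        if (clients == null) {
            clients = new ArrayList<>();
            subscribers.put(deviceProperty, clients);
            frontEndRequests++; // a real DAE would send one ACNET request here
        }
        clients.add(client);
    }

    /** Fan one front-end reply out to every subscribed client. */
    void onReply(String deviceProperty, double value) {
        for (DoubleConsumer client : subscribers.getOrDefault(deviceProperty, List.of())) {
            client.accept(value);
        }
    }

    int frontEndRequests() { return frontEndRequests; }
}
```

With 50 subscribers the front-end sees one request instead of 50, which is the "few percent" reduction in miniature.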
The consolidation role of the DAE moves the high-bandwidth traffic out of the front-ends and into the DAEs. Though the DAEs are positioned to share high-bandwidth links and can easily be upgraded to faster processors, multicast data pools are used to reduce overall bandwidth and processor time. A database table describes device/property pairs and their inclusion in up to 31 broadcast pools (varying frequencies or events). Each DAE collects and multicasts these pool properties for the front-ends it serves as consolidator. Presently these pools are statically defined, but dynamic multicast pool assignment may be an attractive enhancement.
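One way a DAE might turn such a table into per-pool lists is sketched here. The sketch assumes each row stores pool membership as a bitmask in a Java int, with bit i meaning pool i; bits 0 through 30 give exactly 31 pools and keep the sign bit clear. The row layout and the names PoolTable and Row are assumptions, not the actual database schema, and the multicast send itself is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical view of the pool table; the real schema is not given in the text. */
class PoolTable {
    static final int MAX_POOLS = 31; // limit stated in the text

    /** A device/property pair and its pool mask (bit i set = member of pool i). */
    record Row(String deviceProperty, int poolMask) {}

    /** Group pairs by pool so each pool can be collected and multicast together. */
    static Map<Integer, List<String>> groupByPool(List<Row> rows) {
        Map<Integer, List<String>> pools = new TreeMap<>();
        for (Row row : rows) {
            if ((row.poolMask() >>> MAX_POOLS) != 0) { // bit 31 is not a valid pool
                throw new IllegalArgumentException("pool bit out of range: " + row);
            }
            for (int pool = 0; pool < MAX_POOLS; pool++) {
                if ((row.poolMask() & (1 << pool)) != 0) {
                    pools.computeIfAbsent(pool, k -> new ArrayList<>()).add(row.deviceProperty());
                }
            }
        }
        return pools;
    }
}
```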
To gain the benefits of consolidation while VAX consoles remain, VAX data pool traffic could be redirected to DAEs. In some cases, ACNET traffic would then traverse a tortuous path. For example, suppose CNS1 wanted a large reading from TEV. An ACNET message from CNS1's data pool manager would be sent on UDP to the consolidating DAE. Since TEV is on token ring, the message would be passed to JPASS on a VAX, which resends it over Ethernet, and an Ethernet/token ring bridge passes it to TEV. Because token ring's maximum message size is longer than Ethernet's and token ring front-ends do not packet their messages, TEV's reply goes first to PKTR (the Ethernet/token ring packetter). PKTR sends the reply in multiple packets to the ACNET serving JPASS, which passes it back to the consolidating DAE using UDP, and the DAE finally passes it back to CNS1.
The ACNET maximum user message length of 3982 bytes makes a simple task, such as acquiring a 4 Kbyte table, difficult. The DAE supports message assembly, allowing the collection of buffers up to 32 Kbytes in length. The DAE assumes the front-end supports linear addressing, i.e. that an offset of 100 is measured in bytes. This is not always true, since the data acquisition protocol uses 16-bit addresses and some device buffers cannot be addressed in 16 bits. The data acquisition protocol should be modified to use 32-bit addresses and to specify linear addressing.
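The message assembly arithmetic is straightforward: a 4096-byte table needs two reads (3982 bytes at offset 0 and 114 bytes at offset 3982), and a full 32 Kbyte (32768-byte) buffer needs nine, the last carrying 912 bytes. The sketch assumes linear byte addressing, as the DAE does; MessageAssembly, Chunk, plan, and assemble are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading buffers longer than one ACNET user message. */
class MessageAssembly {
    static final int MAX_MESSAGE_BYTES = 3982;        // ACNET user message limit (from the text)
    static final int MAX_ASSEMBLED_BYTES = 32 * 1024; // DAE assembly limit (from the text)

    /** One read request: a byte offset (linear addressing assumed) and a length. */
    record Chunk(int offset, int length) {}

    /** Split a buffer into consecutive reads no longer than one message. */
    static List<Chunk> plan(int totalBytes) {
        if (totalBytes <= 0 || totalBytes > MAX_ASSEMBLED_BYTES) {
            throw new IllegalArgumentException("unsupported buffer size: " + totalBytes);
        }
        List<Chunk> chunks = new ArrayList<>();
        for (int offset = 0; offset < totalBytes; offset += MAX_MESSAGE_BYTES) {
            chunks.add(new Chunk(offset, Math.min(MAX_MESSAGE_BYTES, totalBytes - offset)));
        }
        return chunks;
    }

    /** Copy the replies, in plan order, into one contiguous buffer. */
    static byte[] assemble(List<Chunk> plan, List<byte[]> replies) {
        if (plan.size() != replies.size()) throw new IllegalArgumentException("reply count mismatch");
        Chunk last = plan.get(plan.size() - 1);
        byte[] buffer = new byte[last.offset() + last.length()];
        for (int i = 0; i < plan.size(); i++) {
            Chunk chunk = plan.get(i);
            byte[] reply = replies.get(i);
            if (reply.length != chunk.length()) throw new IllegalArgumentException("bad length for chunk " + i);
            System.arraycopy(reply, 0, buffer, chunk.offset(), chunk.length());
        }
        return buffer;
    }
}
```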
Each front-end is matched with a DAE that serves as its consolidator. That DAE also heartbeats the front-end. Users perceive the long, indeterminate time-out a receiver needs to decide that a message is not going to arrive, and consequently that the front-end is off line or down, as a sluggish control system. The DAEs therefore maintain the state of each front-end as up or down. A front-end that is down is continually monitored to discover when it comes back up, but incoming user requests are immediately returned with an error describing the front-end as down.
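A minimal sketch of the up/down bookkeeping, assuming heartbeats arrive as simple answered/not-answered results and treating a node with no history as up. A request for a down node is answered immediately rather than after a time-out. The text also states that all repetitive requests are restarted when a front-end up event is discovered, so the heartbeat method reports the down-to-up transition. All names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-DAE record of front-end state. */
class FrontEndStates {
    enum State { UP, DOWN }

    /** Result of submitting a request (hypothetical type). */
    record Reply(boolean accepted, String message) {}

    private final Map<String, State> states = new HashMap<>();

    /** Nodes with no heartbeat history are assumed up (an assumption of this sketch). */
    State state(String node) { return states.getOrDefault(node, State.UP); }

    /** Record a heartbeat result; returns true on a down-to-up transition. */
    boolean onHeartbeat(String node, boolean answered) {
        State before = state(node);
        State after = answered ? State.UP : State.DOWN;
        states.put(node, after);
        return before == State.DOWN && after == State.UP;
    }

    /** Requests for a down node fail at once instead of waiting out a time-out. */
    Reply submit(String node, String request) {
        if (state(node) == State.DOWN) {
            return new Reply(false, "front-end " + node + " is down");
        }
        return new Reply(true, "forwarded " + request); // a real DAE would send the request here
    }
}
```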
4.3 Redundancy and reliability:
The DAEs make considerable use of state devices served by STATES, the Open Access Front-end Client. STATES multicasts state transitions to interested observers. State transition devices serving the DAE have been defined for DAE startup, shutdown, and "is up", as well as for front-end startup and shutdown.
The DAEs are circularly linked and each DAE pings near neighbors until discovering a neighbor that is up. Using state transition devices, transitions up and down are communicated to all DAEs. Each DAE's front-end assignments also have a randomly assigned DAE backup, so when a DAE goes down, its consolidation responsibilities are distributed to many DAEs. All the state transitions are logged by the STATES data logger.
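Both mechanisms can be sketched in a few lines, assuming DAEs are numbered 0 to n-1 around the ring and that a ping is a simple yes/no test. The backup choice is uniform over every DAE except the primary, so a failed DAE's front-ends spread across many DAEs rather than landing on one. The names DaeRing, nextUpNeighbor, and assignBackups are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.IntPredicate;

/** Hypothetical helpers for a ring of DAEs numbered 0..ringSize-1. */
class DaeRing {
    /** Ping successive neighbors around the ring; return the first that is up, or -1. */
    static int nextUpNeighbor(int self, int ringSize, IntPredicate isUp) {
        for (int step = 1; step < ringSize; step++) {
            int candidate = (self + step) % ringSize;
            if (isUp.test(candidate)) return candidate;
        }
        return -1;
    }

    /** Give each front-end a random backup DAE, never its own primary. */
    static Map<String, Integer> assignBackups(Map<String, Integer> primaryByFrontEnd, int ringSize, Random rng) {
        if (ringSize < 2) throw new IllegalArgumentException("need at least two DAEs");
        Map<String, Integer> backups = new HashMap<>();
        for (Map.Entry<String, Integer> entry : primaryByFrontEnd.entrySet()) {
            int backup = rng.nextInt(ringSize - 1); // one of the other ringSize - 1 DAEs
            if (backup >= entry.getValue()) backup++; // skip over the primary
            backups.put(entry.getKey(), backup);
        }
        return backups;
    }
}
```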
When a DAE is to be shut down for a new software load, it announces its shutdown and waits until it has been unloaded of its front-end traffic, so users see no effect. Likewise, as it comes back up, the backup DAEs cease consolidating for the front-ends they were backing up.
If a data acquisition job was attached to a DAE that was restarted, the job is resubmitted to that DAE when the job's controls so specify (the default).
The DAEs attempt to maintain continuous, reliable communication with the front-ends. If a repetitive reply request ceases to receive replies, the request is restarted. If a front-end up event is discovered, all repetitive requests will be restarted.
The single shot data pools consist of three priority based pools. Highest priority is for collection after a state transition or Tevatron clock event. Medium priority is for typical user requests. Low priority is for large collections (BigSave for example).
The single shot data pools examine error returns and will retry some requests (request not queued, destination task busy, for example). They will also back off if the error return indicates a resource limiting error (smart module out of memory, for example). When in error recovery, the lower priority pools may be suspended. No user application should need to implement error recovery.
All of the single shot pools are paced, i.e. when many requests are pending, a reasonable number are collected from the front-end before more are requested. This frees applications from implementing a scheduling algorithm. BigSave, for example, will insert reading requests for all of the Tevatron's device/property pairs into the low priority pool.
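The priority, pacing, and suspension rules can be sketched with three queues. The sketch assumes requests are plain strings, a fixed hypothetical batch size per collection cycle, and that both the medium and low pools are suspended during error recovery (the text says only that lower priority pools may be suspended). Retryable errors return a request to the head of its own pool. SingleShotPools and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class SingleShotPools {
    /** HIGH: after a state transition or clock event; MEDIUM: typical users; LOW: bulk (e.g. BigSave). */
    enum Priority { HIGH, MEDIUM, LOW }

    private final Map<Priority, Deque<String>> pools = new EnumMap<>(Priority.class);
    private final int batchSize;      // pacing limit per collection cycle (hypothetical value)
    private boolean errorRecovery;    // when true, MEDIUM and LOW are suspended

    SingleShotPools(int batchSize) {
        this.batchSize = batchSize;
        for (Priority p : Priority.values()) pools.put(p, new ArrayDeque<>());
    }

    void submit(Priority p, String request) { pools.get(p).addLast(request); }

    /** Retryable errors (e.g. "destination task busy") go back to the head of their pool. */
    void retry(Priority p, String request) { pools.get(p).addFirst(request); }

    /** Set after a resource-limiting error (back off); cleared when recovery ends. */
    void setErrorRecovery(boolean on) { errorRecovery = on; }

    /** Release at most batchSize requests, highest priority first. */
    List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        for (Priority p : Priority.values()) {
            if (errorRecovery && p != Priority.HIGH) break; // lower pools suspended
            Deque<String> pool = pools.get(p);
            while (batch.size() < batchSize && !pool.isEmpty()) batch.add(pool.pollFirst());
        }
        return batch;
    }
}
```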
4.4 Data Acquisition Jobs:
The larger role of the DAE is supporting data acquisition jobs. The DAE provides the user with a single RMI interface to all data acquisition services: the job. The Open Access Front-end and Model architecture demonstrated how an application can be configured to display data from many data sources. The API (Application Program Interface) for the job specifies the from, to, what, and when of data acquisition; the corresponding Java classes are DataSource, DataDisposition, DataItem, and DataEvent. A parameter page type display would default to a DataSource of AcceleratorSource, i.e. from the front-ends in real time. A parameter page might have a DataDisposition of MonitorChange callback, i.e. call methods in the parameter page code only when values change. A parameter page might have a DataItem that points to a database table of device names. Finally, a parameter page would specify a DataEvent of DefaultDataEvent, i.e. use the default update rate specified in the database. By simply changing the DataSource to a SavedDataSource, the parameter page becomes a Save/Restore display application. Likewise, changing a simple data acquisition job into a persistent job with a DataLoggerDisposition creates the basics of a data logger.
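To make the from/to/what/when decomposition concrete, here is a minimal sketch that borrows the class names from the text. The interfaces, method signatures, and map-backed sources are assumptions for illustration only; the actual DAE API is documented in the project's HTML files and is not reproduced here. Swapping AcceleratorSource for SavedDataSource changes nothing else in the job.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical shapes for the four job components; the real DAE classes differ.
interface DataSource { double read(String device); }                      // from
interface DataDisposition { void deliver(String device, double value); }  // to
record DataItem(List<String> devices) {}                                  // what
interface DataEvent { long periodMillis(); }                              // when

/** Real-time source; a map stands in for the front-ends in this sketch. */
record AcceleratorSource(Map<String, Double> live) implements DataSource {
    public double read(String device) { return live.get(device); }
}

/** Saved-data source with the same interface, so a page can switch sources. */
record SavedDataSource(Map<String, Double> saved) implements DataSource {
    public double read(String device) { return saved.get(device); }
}

/** Fixed default period (the text says the default rate comes from the database). */
record DefaultDataEvent(long periodMillis) implements DataEvent {}

/** Calls the client only when a device's value differs from the last one delivered. */
class MonitorChange implements DataDisposition {
    private final Map<String, Double> last = new HashMap<>();
    private final BiConsumer<String, Double> callback;

    MonitorChange(BiConsumer<String, Double> callback) { this.callback = callback; }

    public void deliver(String device, double value) {
        Double previous = last.put(device, value);
        if (previous == null || Double.compare(previous, value) != 0) {
            callback.accept(device, value);
        }
    }
}

/** A job ties together from, to, what, and when. */
class Job {
    private final DataSource from;
    private final DataDisposition to;
    private final DataItem what;
    private final DataEvent when;

    Job(DataSource from, DataDisposition to, DataItem what, DataEvent when) {
        this.from = from;
        this.to = to;
        this.what = what;
        this.when = when;
    }

    long periodMillis() { return when.periodMillis(); }

    /** One collection cycle; a DAE would repeat this on each DataEvent. */
    void runOnce() {
        for (String device : what.devices()) to.deliver(device, from.read(device));
    }
}
```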
The DataSource components:
The DataDisposition components:
The DataItem components include:
The DataEvent components include:
This paper is not intended to be a programmers' guide to Java data acquisition. See the HTML files for details. The DAE's primary mission is to support data acquisition. The esoteric complexities of mapping accelerator data to objects and of creating save files and data loggers are implemented in the DAE and deliberately shielded from thin clients.
Although much has been implemented in the DAE, many services remain:
Data loggers, background applications, and comfort displays are examples of persistent jobs that require database table support and DAE assignment.
Likewise, a state transition job could be defined to download front-ends, bypass alarm blocks on entering a maintenance state, ensure that machine conditions are appropriate for an upcoming operation, etc.
Complex accelerator objects such as beam position monitors, multiwires, and finite state machines will require many classes to map structured binary data into objects.
Offering persistence to complex objects is another task. Save/Restore should migrate from saving binary data in the database to saving objects, perhaps in an object database.
The front-ends' diversity, community, and development environment present challenges to an aggressive overhaul. The front-end's role is to present a consistent data acquisition interface to devices connected to front-ends on a variety of field buses. A critical self-assessment shows several shortcomings in that role. A survey of front-end capabilities shows a lack of support for some or all of the following:
Also, some front-end architectures do not support statistics and debugging facilities present in the ACNET protocol. Controls should define the requirements of front-end systems and implement those requirements on every front-end.
Previous discussion has pointed out the desire to move towards:
The estimated manpower cost of implementation is prohibitively high because of the multitude of front-end implementations. Any progress towards a dynamic front-end improvement posture must therefore begin with a consolidation effort.
Controls should establish an aggressive timeline to accomplish the following:
This would retire token ring as a controls communication protocol. It also acknowledges that instrumentation needs front-end service. CAMAC front-end replacement should coincide with the retirement of GAS-speaking devices. That implies a BPM replacement project should be underway. Upgrading Linac nodes implies VxWorks, the VxWorks ACNET implementation, and as much of MOOC as achievable, but certainly including the breakout of the basic data acquisition protocols. Every front-end should be purged of redundant implementations of services provided by MOOC. The process should include adding services to MOOC so that the core of each front-end is confined to its unique service.
In parallel, Controls should develop the ability to release and reboot every front-end node, in a controlled fashion, within a 20-minute period to install an ACNET or MOOC upgrade.
Controls should follow with a schedule to accomplish the following:
Longer-term directions should presume JVM capability in each front-end. The Open Access Front-end architecture should be developed as a cooperative venture between the DAE and front-end programmers. Message breakout, self download, event dispatching, SQL query capability, data acquisition from other nodes, data pools, persistence, alarm block monitors and observers, fast time plot, and snapshot plot are just examples of services common to MOOC and a Java front-end infrastructure. Long-term goals should offer identical services to the application and front-end programmers. This team should explore the boundary between Java and JNI for a front-end with a field bus.
The applications suite of this control system represents hundreds of man-years of effort. Any replacement and improvement of these services must exploit a high degree of reusability.
Several features of the DAE were designed to promote thin clients, that is, to relieve the application of the scheduling, resource, and recovery code associated with data acquisition problems. As early applications emerge, each developer must help identify potentially redundant implementations and insist that reusable solutions are provided for everyone.
For example, this control system needs only one plot grid class. That class will have to adapt to future users' requirements, but a second plot grid class should never emerge.
JavaBeans, wizards, shareware, and commercial applications should all be important contributors to the application suite.
Classes should emerge for a coherent approach to sequenced applications. New tools should provide new approaches to automation and complex operating scenarios.
The strategies for the development of applications to operate this accelerator and beamline complex should be developed in another paper.
The security and accountability policy of the control system must expand to include all the resources of the control system, i.e. database, file, printing, device control, bandwidth, etc. The goals are to provide a safe environment to minimize mistakes, a controllable environment to ensure the operators may control the complex, a flexible environment to provide varying service by user, location, and machine state, and an environment whose accountability minimizes mystery.
Each user of the control system should be known to the system, preferably through an active device, to provide authentication and accountability as well as convenience profiling for that user.
Applications written by authorized users must be submitted to a formal process to gain access to control system resources.
The outlines of this project were the subject of a departmental talk on February 2, 1998. The intervening 17 months have seen much progress in the central tier, but little in the application and front-end tiers. Application commitments in support of Collider Run II are not being fulfilled. The project has generated only modest support and enthusiasm from department management. Very few departmental programmers have written a line of production Java. The six-month 'dead time' spent learning object-oriented programming techniques awaits most programmers.
The scope of the work ahead can be daunting. It would be less so if every programmer were assigned a task, even for an hour a week. The supervisors, leaders, and senior members of the department should set an example, and department management should insist upon it.
Every project should have a timetable. When projections indicate convergence with a goal in years when months or weeks are needed, a variety of talents within the department should review the problem and publicly present their suggestions. Follow-up should continue until the project converges with a new goal.
This project may require contract labor. The expertise of local staff is required to assign and supervise work we contract. We should be acquiring that expertise.
With every major upgrade, the new architecture spawns unanticipated improvements. We should look forward to significant positive change.
Everyone should be involved. If they are, we are okay.
The Collider Run II commitments should be reviewed.
10 References
[1] https://www-bd/controls/java/meetings/Talks_&_Papers/java_data_acquisition/