DCD Release Note 9.1

CENTURION - An Idle Process Terminator

JACK C. SCHMIDT AND FRANK J. NAGY

CENTURION watches for idle processes by waking up every n minutes and looking for processes that have accumulated negligible CPU time and negligible buffered and direct I/O counts since the previous scan. Any such interactive process is sent one or more warning messages (via broadcasts) before being sent one last message and killed. A message is sent to OPCOM whenever a process is killed.

CENTURION was created with flexibility in mind. The system manager can monitor and delete idle processes continuously or limit the monitoring to only a few hours of the day. Lists are provided so that the system manager can specify which idle processes to delete, which ones not to delete during normal operations, and which ones to NEVER delete. The manager can also select the way in which an idle process is removed.

CENTURION runs as a detached process with WORLD, OPER, NETMBX and TMPMBX privileges and gets its parameters from logical names. A system logical name table (LNM$CENTURION_LOGICALS) is defined to contain all CENTURION logical names. Conceptually, the table is divided into three sections: general, normal operations, and special operations.

Logicals and Their Meaning

Numerous logicals are provided to allow the system manager to tune the program to local needs. CENTURION requires the definition of several boolean-valued logicals. Valid settings for these logicals are TRUE/FALSE, "True"/"False", T/F, 1/0, and ON/OFF. General purpose logicals affect the overall functionality of CENTURION. Normal operations logicals, if enabled (ENABLE_NORKILL), are applied to idle process deletion on a continuous basis. Special operations logicals can be enabled to allow the system manager to create a "time window" in which processes are removed according to a second algorithm. CENTURION applies the special operations logicals before the normal operations logicals. See the Program Flow section of this document for more information.
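The accepted spellings for a boolean-valued logical can be illustrated with a short Python sketch. This is not CENTURION's own code: the function name is hypothetical, and treating the value case-insensitively and stripping surrounding quotation marks are assumptions made for illustration.

```python
# Spellings taken from the list above; case-insensitive matching is an assumption.
TRUE_SETTINGS = {"TRUE", "T", "1", "ON"}
FALSE_SETTINGS = {"FALSE", "F", "0", "OFF"}


def parse_boolean_logical(value: str) -> bool:
    """Translate the text of a boolean-valued logical into True or False."""
    token = value.strip().strip('"').upper()
    if token in TRUE_SETTINGS:
        return True
    if token in FALSE_SETTINGS:
        return False
    # How CENTURION reacts to an unrecognized setting is not documented here;
    # this sketch simply rejects it.
    raise ValueError(f"not a boolean setting: {value!r}")
```

For example, `parse_boolean_logical("ON")` and `parse_boolean_logical('"True"')` both yield `True`, while `parse_boolean_logical("0")` yields `False`.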

General Logicals

CENTURION requires certain logicals to be defined regardless of the time of day. These logicals control when the program wakes up and checks for idle processes, when idle time warning messages are displayed, whether a manager-supplied message is appended to the program-supplied warning, how CENTURION deletes an idle process, and whether the program frees pages (purges its working set) between cycles.

[SCAN_FREQUENCY] Defines how long CENTURION waits between scans for idle processes. The range is 3-60 minutes. The default is 10 minutes.

[IDLE_WAIT_TIME] Defines the time between warnings to idle processes. This setting MUST be greater than SCAN_FREQUENCY. The range is 15-300 minutes. The default is 60 minutes.

[ENABLE_MESSAGE] If enabled, CENTURION appends an extra text message to the standard "User has been idle for n minutes" warning. The default for this boolean-valued logical is OFF.

[MESSAGE] Extra message to be appended to the standard warning. The text should be enclosed in quotation marks.

[ENABLE_FORCEX] If true, CENTURION calls SYS$FORCEX to remove the idle process before using SYS$DELPRC. Enabling this logical causes the target process to exit using normal procedures. This should be set if a manager suspects a process to be using RMS or Rdb. The default for this boolean-valued logical is ON.

[FORCEX_WAIT_TIME] Used in conjunction with ENABLE_FORCEX; defines the time between calling SYS$FORCEX and SYS$DELPRC. The range is 5-60 seconds. The default is 5 seconds.

[ENABLE_PURGEWS] If enabled, CENTURION purges its working set after scanning processes. This should be used on systems with heavy memory loads. The default for this boolean-valued logical is OFF.

[IMMORTAL_LIST] A list of at most 32 names. If a list is found in the logical table, CENTURION will ALWAYS ignore idle interactive processes belonging to these users. The list can consist of usernames and/or port:usernames separated by commas. This logical applies 24 hours a day, seven days a week, 365 days a year.
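The list format (usernames and/or port:usernames separated by commas, at most 32 names) can be sketched in Python as follows. The function names are hypothetical, and two matching rules are assumptions rather than documented behavior: a bare username matches on any port, and a port:username entry matches only when both parts agree.

```python
def parse_user_list(text, limit=32):
    """Split a comma-separated list into (port, username) pairs.

    port is None for a bare username entry.
    """
    entries = [item.strip().upper() for item in text.split(",") if item.strip()]
    if len(entries) > limit:
        raise ValueError(f"list holds {len(entries)} names; the maximum is {limit}")
    parsed = []
    for entry in entries:
        port, separator, user = entry.rpartition(":")
        parsed.append((port if separator else None, user))
    return parsed


def on_list(parsed, username, port):
    """Return True if the process owner (and, where given, its port) is listed."""
    username = username.upper()
    port = port.rstrip(":").upper()
    for listed_port, listed_user in parsed:
        if listed_user == username and listed_port in (None, port):
            return True
    return False
```

With the default `IMMORTAL_LIST` value `SYSMANAGER,NETMANAGER,OPA0:SYSTEM`, this sketch would match SYSTEM only when logged in on OPA0, and NETMANAGER on any terminal.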

Normal Operations Logicals

These logicals control how long CENTURION waits before deleting idle processes and provide lists that decide whether an idle process is specifically deleted or protected from deletion. If the system manager creates both a protect list and a hit list and a user appears on both, the hit list is interpreted first; hence, the user's process will be removed.

[ENABLE_NORKILL] If enabled, CENTURION checks for list definitions. If no lists are defined, CENTURION will kill all idle processes. If NORPROTECTED_LIST or IMMORTAL_LIST is defined, processes belonging to users on those lists will not be removed. If NORHIT_LIST is defined, only idle processes belonging to users in this list will be removed. The default for this boolean-valued logical is OFF.

[NORKILL_WAIT_TIME] Defines how long a process must be idle before it is killed. The range is 45-600 minutes. This must be set to a value greater than IDLE_WAIT_TIME.

[NORHIT_LIST] A list of at most 32 names. If a list is found in the logical table, CENTURION will only kill idle interactive processes belonging to these users. The list can consist of usernames and/or port:usernames separated by commas. If this list and the NORPROTECTED_LIST are both defined, CENTURION searches the hit list first, so a user named on both lists is still removed.

[NORPROTECTED_LIST] A list of at most 32 names. If a list is found in the logical table, CENTURION will ignore idle interactive processes belonging to these users. The list can consist of usernames and/or port:usernames separated by commas. If this list and the NORHIT_LIST are both defined, CENTURION searches the hit list first, so a user named on both lists is still removed.
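The interaction of the three lists during normal operations can be summarized in a small Python sketch. It follows the rule stated at the start of this section (a user on both the hit list and the protect list is removed) and the order given in the Program Flow section. The function name is hypothetical, plain sets of usernames stand in for the logical name lists, and an empty set stands for an undefined list.

```python
def eligible_for_normal_kill(user, immortal, hit, protected):
    """Decide whether an idle interactive process may be removed in normal operations.

    immortal, hit, and protected are sets of usernames; an empty set
    means the corresponding list is not defined.
    """
    if user in immortal:
        return False          # IMMORTAL_LIST always wins
    if hit:
        return user in hit    # a hit list restricts removal to its members
    if user in protected:
        return False          # protected users are skipped
    return True               # no lists apply: every idle process qualifies
```

Because the hit list is consulted before the protect list, `eligible_for_normal_kill("B", set(), {"B"}, {"B"})` is `True`.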

Special Operations Logicals

These logicals give the system manager the ability to define a "time window" in which idle processes are deleted, rather than deleting them continuously. The time window logicals require that the hours be entered in 24-hour (military) time and that the window span at least two hours. Because the start and stop logicals are given in 24-hour time, the system manager can define a window that runs from late at night until early the next morning. An example of this would be to set SPCSTART to 23 (11 p.m.) and SPCSTOP to 3 (3 a.m.).
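A window that runs past midnight needs a slightly different test from one that starts and stops on the same day. The Python sketch below shows one way to express it. The function name is hypothetical, and treating the stop hour as exclusive is an assumption, since the text does not say whether the window includes it.

```python
def in_special_window(hour, start, stop):
    """Return True if hour (0-23) falls inside the special operations window."""
    if start <= stop:
        # Window within a single day, e.g. 18 to 22.
        return start <= hour < stop
    # Window wraps past midnight, e.g. 23 to 3.
    return hour >= start or hour < stop
```

With SPCSTART set to 23 and SPCSTOP set to 3, the hours 23, 0, 1, and 2 fall inside the window, and 3 through 22 fall outside it.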

Special operations mode can run in conjunction with normal operations mode. CENTURION interprets the special operations information first and then handles the normal operations definitions. Note that SPCKILL_WAIT_TIME has a range of 120-1440 minutes, while NORKILL_WAIT_TIME has a range of 45-600 minutes; it is therefore possible, even inside a defined time window, for an idle process to be logged out by the normal operations algorithm before the special operations algorithm would act on it.

[SPCSTART] Defines the start of special operations. The range is 0000-2400 hours, entered as an hour value from 0 to 24. The default value is 18 (6 p.m.).

[SPCSTOP] Defines the end of special operations. The range is 0000-2400 hours, entered as an hour value from 0 to 24. The default value is 22 (10 p.m.).

[ENABLE_SPCKILL] If enabled, CENTURION will kill processes during "special operations". The default for this boolean-valued logical is ON.

[SPCKILL_WAIT_TIME] Defines how long a process must be idle during special operations before it is killed. The range is 120-1440 minutes. The default is 120 minutes.
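The ranges and ordering constraints listed for the numeric logicals can be collected into a single check. The Python sketch below is only an illustration: the function name is hypothetical, and how CENTURION itself reacts to an out-of-range value is not described here, so the sketch merely reports what it finds.

```python
# Ranges quoted in the logical descriptions (minutes, seconds, or hours).
RANGES = {
    "SCAN_FREQUENCY": (3, 60),
    "IDLE_WAIT_TIME": (15, 300),
    "FORCEX_WAIT_TIME": (5, 60),
    "NORKILL_WAIT_TIME": (45, 600),
    "SPCSTART": (0, 24),
    "SPCSTOP": (0, 24),
    "SPCKILL_WAIT_TIME": (120, 1440),
}


def has_all(settings, *names):
    return all(name in settings for name in names)


def check_settings(settings):
    """Return a list of problems found; an empty list means none."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in settings and not low <= settings[name] <= high:
            problems.append(f"{name}={settings[name]} is outside {low}-{high}")
    if has_all(settings, "IDLE_WAIT_TIME", "SCAN_FREQUENCY"):
        if settings["IDLE_WAIT_TIME"] <= settings["SCAN_FREQUENCY"]:
            problems.append("IDLE_WAIT_TIME must be greater than SCAN_FREQUENCY")
    if has_all(settings, "NORKILL_WAIT_TIME", "IDLE_WAIT_TIME"):
        if settings["NORKILL_WAIT_TIME"] <= settings["IDLE_WAIT_TIME"]:
            problems.append("NORKILL_WAIT_TIME must be greater than IDLE_WAIT_TIME")
    if has_all(settings, "SPCSTART", "SPCSTOP"):
        start, stop = settings["SPCSTART"], settings["SPCSTOP"]
        span = (stop - start) % 24
        if span == 0 and start != stop:
            span = 24  # e.g. 0 to 24 covers the whole day
        if span < 2:
            problems.append("the special operations window must span at least two hours")
    return problems
```

A window of 23 to 3 passes the check (four hours), while 23 to 0 is reported because it spans only one hour.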

Program Flow

CENTURION is run as a detached process on any node other than DECwindows-based workstations. First, the logicals are translated and SYS$SCHDWK is called to schedule a wakeup every SCAN_FREQUENCY minutes. The program then goes into a loop. The system time is translated and a flag is set if the current time falls within the special operations limits. SYS$GETJPIW is called to build a list of all processes. The list contains the process ID, CPU time used, buffered I/O count, direct I/O count, and process type. If SYS$GETJPIW returns SS$_SUSPENDED status, no information is stored for that process and the program skips to the next one.

CENTURION then determines whether a process has been idle by subtracting the old CPU time, buffered I/O, and direct I/O counts from the new ones. If the CPU time difference is less than 4 milliseconds, or the buffered I/O count difference is less than 2, or the direct I/O count difference is less than 1, CENTURION assumes the process has been idle, increments the warning count, and verifies that the process type is interactive. If the process type isn't interactive, the program skips to the next process. If the process type is interactive and IMMORTAL_LIST is defined, CENTURION checks whether the process belongs to a user on that list. If so, it skips to the next process.
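The idle test described above compares two successive samples of the same process. A minimal Python sketch, with hypothetical names, follows. It joins the three comparisons with "or" exactly as the paragraph does; an implementation that required all three counters to be quiet would use "and" instead. Holding CPU time in milliseconds is also an assumption made for readability.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cpu_ms: int        # accumulated CPU time, in milliseconds (assumed unit)
    buffered_io: int   # accumulated buffered I/O count
    direct_io: int     # accumulated direct I/O count


def is_idle(old: Sample, new: Sample) -> bool:
    """Apply the thresholds from the text to the change between two scans."""
    return (
        new.cpu_ms - old.cpu_ms < 4
        or new.buffered_io - old.buffered_io < 2
        or new.direct_io - old.direct_io < 1
    )
```

A process whose counters have not moved at all between scans is reported idle; one that has used 400 ms of CPU, 40 buffered I/Os, and 4 direct I/Os is not.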

If the idle process doesn't belong to a user on the IMMORTAL_LIST, CENTURION checks whether ENABLE_SPCKILL has been turned on and the current time falls in the special operations window. If so, CENTURION decides whether to notify the process of its idleness or to remove it. If the process is to be deleted, CENTURION checks whether ENABLE_FORCEX is enabled, removes the process accordingly, and skips to the next process. If the process hasn't been deleted, CENTURION checks whether it belongs to a user identified in the NORHIT_LIST. If a match is made, the program steps to the "normal operations" deletion section of the code. If not, it searches the NORPROTECTED_LIST for the owner of the process. If the process belongs to a user on the NORPROTECTED_LIST, the program skips to the next process.

The process has now reached the "normal operations" killing section. The program decides whether just to warn the process of its idleness or to remove it. If the process is to be deleted, CENTURION checks whether ENABLE_FORCEX is enabled and removes the process accordingly. It then skips to the next process.
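The warn-or-remove decision can be sketched in Python under two assumptions that the text does not spell out: idle time is taken to be the number of consecutive idle scans multiplied by SCAN_FREQUENCY, and a warning is issued on the scan at which the idle time crosses a multiple of IDLE_WAIT_TIME (matching the "Warnings at 60,120,... minutes idle" comment in the startup file). The function name and return values are hypothetical.

```python
def decide(idle_scans, scan_frequency, idle_wait_time, kill_wait_time):
    """Return "kill", "warn", or "none" for a process idle for idle_scans scans."""
    if idle_scans <= 0:
        return "none"
    idle_minutes = idle_scans * scan_frequency
    if idle_minutes >= kill_wait_time:
        return "kill"
    # Warn only on the scan where a new multiple of idle_wait_time is reached.
    warnings_due = idle_minutes // idle_wait_time
    warnings_before = (idle_minutes - scan_frequency) // idle_wait_time
    if warnings_due > warnings_before:
        return "warn"
    return "none"
```

With a 10-minute scan, a 30-minute warning interval, and a 60-minute kill time, the third idle scan produces a warning and the sixth removes the process.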

The loop is exited after the process list has been exhausted. If ENABLE_PURGEWS is true, the working set is purged. The program then hibernates until the timer goes off.

Installation

A startup file is distributed with the executable. The startup file creates the logical table, defines the logicals to suitable defaults, sets up a logging file for any problems, and runs CENTURION. The default startup file looks like:

$! CENTURION_STARTUP.COM   Start Centurion idle process killer process
$!
$!	Create logical name table for Centurion parameters
$!
$ CREATE /NAME_TABLE	CENTURION_LOGICALS	-
    /PARENT_TABLE=LNM$SYSTEM_DIRECTORY /PROTECTION=(S:RWED,O:RWED,W:RE)
$ DEFLOG :== DEFINE /NOLOG /TABLE=CENTURION_LOGICALS
$!
$! 	Define Centurion parameters via logical names
$!
$ DEFLOG     SCAN_FREQUENCY      10    !Minutes until Centurion wakes up again.
$ DEFLOG     IDLE_WAIT_TIME      60    !Warnings at 60,120,... minutes idle
$ DEFLOG     ENABLE_MESSAGE     OFF
$ DEFLOG     MESSAGE	 "Idle thoughts make for idle hands.."
$ DEFLOG     FORCEX_WAIT_TIME     5    !seconds between forcex and delprc 
$ DEFLOG     ENABLE_FORCEX	 ON    !Call forcex to remove idle processes ON/OFF
$ DEFLOG     ENABLE_PURGEWS	OFF    !Purge working set between cycles
$ DEFLOG     IMMORTAL_LIST  SYSMANAGER,NETMANAGER,OPA0:SYSTEM  ! Never kill idle processes
$!
$ DEFLOG     ENABLE_NORKILL     OFF    !Enable normal operation mode.
$ DEFLOG     NORKILL_WAIT_TIME   60    !Kill at 60 minutes idle
$!
$ DEFLOG     SPCSTART            18    !Defines start of special operations (6p.m.)
$ DEFLOG     SPCSTOP             22    !Defines end of special operations (10p.m.) 
$ DEFLOG     ENABLE_SPCKILL      ON    !Enable special operation mode
$ DEFLOG     SPCKILL_WAIT_TIME  120    !Kill any idle user (2 hrs) in special hours
$!
$!	See if there are any local definitions, if so suck them up
$! 
$ IF F$SEARCH("SYS$STARTUP:CENTURION_LOCAL_LOGICALS.COM") .NES. "" THEN -
     @SYS$STARTUP:CENTURION_LOCAL_LOGICALS
$!
$!	Start Centurion detached process.
$!	First delete any CENTURION.LOG files from previous executions.
$!
$ IF F$SEARCH("SYS$MANAGER:CENTURION.LOG") .NES. "" THEN -
	DELETE SYS$MANAGER:CENTURION.LOG;*
$ RUN   /PROCESS="Centurion" /PRIORITY=12 -
        /DUMP -				!In case of bugs
        /UIC=[SYSTEM] /OUTPUT=SYS$MANAGER:CENTURION.LOG -
        /MAXIMUM_WORKING_SET=300 /WORKING_SET=100 -
        /PRIVILEGES=(NOSAME,WORLD,SYSPRV,OPER,NETMBX,TMPMBX) -
        CENTURION
$ EXIT
$!=============================================================================
$! START_CENTURION.COM
$!
$! Spawn the Centurion process to watch for idle interactive processes
$! and kill them after a pre-selected interval of idle-ness.
$!
$! Centurion can be parametrized via logical names in the CENTURION_LOGICALS
$! logical name table:
$! 
$!   Logical name       Default      Purpose                                  Range
$!   ------------       -------      ---------------------------------        -----
$!   SCAN_FREQUENCY     10 mins      between scans for idle processes         (3-60)
$!   IDLE_WAIT_TIME     60 mins      between warnings to idle processes       (15-300)
$!   ENABLE_MESSAGE     OFF          extra message line on warnings           ON/OFF
$!   MESSAGE            "text"       extra message line
$!   ENABLE_FORCEX      ON           Call forcex to remove idle processes     ON/OFF
$!   FORCEX_WAIT_TIME   5 secs       between forcex and delprc                (5-60)
$!   ENABLE_PURGEWS     OFF          purge working set after scan?            ON/OFF
$!   IMMORTAL_LIST      user,...     list of usernames for CENTURION to       32 max.
$!                                            ignore no matter what!
$!
$!   ENABLE_NORKILL     OFF          normal operations enable                 ON/OFF
$!   NORKILL_WAIT_TIME  180 mins     until idle process killed                (45-600)
$!   NORHIT_LIST        user,...     list of usernames for CENTURION to       32 max.
$!                                            scan (exclusive of other users)
$!                                            during normal operations.
$!   NORPROTECTED_LIST  user,...     list of usernames for CENTURION to       32 max.
$!                                            ignore during normal operations.
$!
$!   SPCSTART           18           Defines start of special operations      (0-24)
$!   SPCSTOP            22           Defines end of special operations        (0-24)
$!   ENABLE_SPCKILL     ON           Special operations enable                ON/OFF
$!   SPCKILL_WAIT_TIME  120 mins     Kill users during special operations if  (120-1440)
$!                                     idle for this length of time or more
$!=============================================================================

The default logicals are defined so that only processes that have been idle for more than 2 hours between 6 p.m. and 10 p.m. will be removed from the system. Idle processes belonging to SYSMANAGER, NETMANAGER and OPA0:SYSTEM will be spared from deletion. The ENABLE_FORCEX logical is enabled, so the program will call SYS$FORCEX to remove the idle process. After 5 seconds (FORCEX_WAIT_TIME) the program checks whether the process still exists. If it does, SYS$DELPRC is used to remove the process. Since ENABLE_NORKILL is turned off, the logicals dealing with normal operations have no effect. ENABLE_MESSAGE is turned off, so the message "Idle thoughts make for idle hands.." will not appear.
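The two-stage removal described above (SYS$FORCEX, a pause of FORCEX_WAIT_TIME seconds, then SYS$DELPRC only if the process is still present) can be outlined in Python. The sketch does not call any system service: `force_exit`, `process_exists`, and `delete_process` are hypothetical callables supplied by the caller as stand-ins for the real services, and the return strings are for illustration only.

```python
import time


def remove_process(pid, force_exit, process_exists, delete_process,
                   enable_forcex=True, forcex_wait_seconds=5, sleep=time.sleep):
    """Remove an idle process, asking it to exit first when ENABLE_FORCEX is on."""
    if enable_forcex:
        force_exit(pid)                 # stand-in for SYS$FORCEX
        sleep(forcex_wait_seconds)      # FORCEX_WAIT_TIME
        if not process_exists(pid):
            return "exited"             # the process left on its own
    delete_process(pid)                 # stand-in for SYS$DELPRC
    return "deleted"
```

Passing the three operations in as arguments keeps the sketch runnable anywhere and makes the ordering easy to exercise with simple fakes.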

Custom Tailoring

The startup file provides defaults for running CENTURION, but it also allows the system manager to fine-tune CENTURION to specific needs. This is done by creating the file CENTURION_LOCAL_LOGICALS.COM in the system startup area. Let's examine a sample local logical file and possible modifications to it.
$! CENTURION_LOCAL_LOGICALS.COM  Fine tune CENTURION default logicals
$!
$!
$! 	Define Centurion parameters via logical names
$!
$!
$ DEFLOG     IDLE_WAIT_TIME       30    !Warnings at 30,60,... minutes idle
$ DEFLOG     ENABLE_NORKILL       ON    !Enable normal process deletion..
$ DEFLOG     NORKILL_WAIT_TIME    60    !Kill at 60 minutes idle
$ DEFLOG     ENABLE_SPCKILL      OFF    !Disable process deletion during special operations
$!
$ EXIT

Because DEFLOG is defined as a global symbol (with :==) and CENTURION_LOCAL_LOGICALS.COM is called from START_CENTURION.COM, the DEFLOG definition is available in the local file.

This example will notify an idle process of its idleness every thirty minutes and delete the process after an hour of idleness. Since ENABLE_SPCKILL is disabled, the program will apply these settings constantly.

Adding the following definition to the above list:

$ DEFLOG     NORHIT_LIST   THOMPSON,RITCHIE   	! Log off idle processes

restricts deletion to idle processes belonging to THOMPSON and RITCHIE. Expanding the local file with special operations definitions:

$ DEFLOG     ENABLE_SPCKILL      ON     !Enable kill all users in special operations
$ DEFLOG     SPCKILL_WAIT_TIME  120     !Kill any idle user (2 hrs) in special operations
$ DEFLOG     SPCSTART             0     !Defines start of special operations (midnight)
$ DEFLOG     SPCSTOP              6     !Defines end of special operations (6a.m.) 

Since special operations limits have now been imposed, from 6 a.m. to midnight only processes owned by THOMPSON and RITCHIE that have been idle for an hour will be deleted. From midnight to 6 a.m., idle processes owned by THOMPSON and RITCHIE will still be deleted after an hour, while all other idle processes will be deleted using the special operations algorithm (after two hours of idleness).
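The combined behavior of this example configuration can be checked with a short Python sketch that returns the idle time, in minutes, after which a given user's process would be removed at a given hour. The function name is hypothetical, KERNIGHAN is a made-up username standing for "any other user", the stop hour is assumed to be exclusive, and IMMORTAL_LIST is ignored for brevity.

```python
HIT_LIST = {"THOMPSON", "RITCHIE"}   # NORHIT_LIST from the example
NORKILL_WAIT_TIME = 60               # minutes
SPCKILL_WAIT_TIME = 120              # minutes
SPCSTART, SPCSTOP = 0, 6             # midnight to 6 a.m.


def kill_threshold_minutes(user, hour):
    """Return the idle minutes after which the process is removed, or None."""
    thresholds = []
    if SPCSTART <= hour < SPCSTOP:
        thresholds.append(SPCKILL_WAIT_TIME)   # special operations apply to everyone
    if user.upper() in HIT_LIST:
        thresholds.append(NORKILL_WAIT_TIME)   # normal operations apply to the hit list
    return min(thresholds) if thresholds else None
```

At 2 p.m. the sketch gives 60 minutes for THOMPSON and no removal for KERNIGHAN; at 3 a.m. it gives 60 minutes for RITCHIE and 120 minutes for KERNIGHAN.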

Log Files

CENTURION creates a log file (SYS$MANAGER:CENTURION.LOG) to catch all system error messages generated by the program. The log file records the program's problems with logical translation, timing errors, and failures to delete a process. If the system manager suspects the program is malfunctioning, the log file is the first place to check.
