RD Controls Software Release Note 2.1

Epicurean Documentation Extraction System V2.0

Debra Baddorf
January 6, 1987

Purpose

One of the least liked facets of software production is creating the
documentation necessary to explain the code and its usage to others.
Upkeep of documentation as the software is modified is equally time
consuming. Frequently the documentation suffers.

Incorporating a brief descriptive paragraph in the code itself is easy
for the author to do, and can also provide an internal reminder of a
routine's format. Such descriptive text can be easily modified as the
code changes, so that documentation need never be out of date. To be
useful as external documentation, such paragraphs must be pulled out of
the code, formatted in a consistent manner, and merged with other such
items into a printable document and/or a help library. This is the
function of the source text extraction service described here.

Implementation

The documentation EXTRACTOR is itself a program, whose task is to scan
a text file and copy out parts which are specially flagged. The program
needs to know which parts of the text file pertain to the 'comment'
syntax of that particular language, and which parts are the text to be
extracted.

The intention is for the extractor to handle code in several different
programming languages, depending on the source file extension. This
will facilitate running the extractor on all the code in any given
directory, or on a list of names, even if the files are not all written
in the same programming language. A table of recognized comment formats
is included.

The EXTRACTOR must scan a text file searching for a particular flag.
This flag indicates that a block of descriptive comment begins here.
The flag is thus a sequence which may not appear any other place in any
programmer's code. A similar flag phrase with slight modification will
signal the end of a comment block.
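The scan described above can be illustrated with a minimal sketch. It is not the EXTRACTOR's actual code, which this note does not show. It assumes only the FORTRAN-style delimiters defined in the following paragraphs ("C+nNAME" opens a block, "C-" closes it), and the names extract_blocks, BEGIN, and END are hypothetical.

```python
import re

# Assumption: only the FORTRAN-style form is handled here.
# "C+nNAME" opens a block (n is a blank or a digit 1-9); "C-" closes it.
BEGIN = re.compile(r"^C\+([ 1-9])\s*(\S+)")
END = re.compile(r"^C-")


def extract_blocks(lines):
    """Return a list of (level, name, text_lines) for each flagged block."""
    blocks = []
    current = None  # the block being collected, or None when outside a block
    for line in lines:
        if current is None:
            m = BEGIN.match(line)
            if m:
                # A blank level parameter is interpreted as level 1.
                level = 1 if m.group(1) == " " else int(m.group(1))
                current = (level, m.group(2), [])
        elif END.match(line):
            blocks.append(current)
            current = None
        else:
            # Drop the leading comment character, keep the rest of the line.
            text = line[1:] if line.startswith("C") else line
            current[2].append(text.rstrip())
    return blocks
```

Lines outside a flagged block are ignored, so ordinary code and ordinary comments pass through the scan untouched.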
A sample block of extractable text is delimited as follows:

    C+ ROUTINE_NAME
    C  text......
    C-

or

    /*+ ROUTINE_NAME
     * text.....
     -*/

The beginning delimiter flag consists of

    X+nNAME

where X is the language's comment symbol(s), + is required and is the
part keyed on by the extractor, "n" is either a blank or a number from
1 to 9 indicating the depth of the text block, and NAME is the routine
name. The ending delimiter flag is

    X-

where X is the language's comment delimiter and - is required syntax
for the extractor. The exceptions to this are languages such as C or
Pascal which require the comment delimiter to follow the end of the
comment. These languages are treated as -X (as in the second example
above). The + sign is required to be in the 2nd or 3rd column depending
on the comment character(s), and the - sign is required to be in the
1st, 2nd, or 3rd column, again depending on the language.

The "space" or 1-9 parameter is added to this initial flag to declare
the level of the extractable block. This can be used for documentation
of a whole program, with main routines having a lower level number than
subordinate routines. A space is interpreted as a 1, and should be the
preferred parameter for routines submitted to a system library. In this
way, all library routines can be extracted at the same level number,
and the numbers then shifted as necessary to insert the documentation
for the whole library into the system HELP files.

Every file to be processed must have at least one block of extractable
text within it. More than one block is acceptable, and indeed desirable
if there are multiple entry points to the module.

Within each block of delimited code there should be embedded formatting
commands for the extractor to process. These all start with a backslash
character, and serve to identify the parts of the documentation block.
A further requirement is that the formatting commands be the first
non-blank word on the line, with the exception of the comment
character(s). Embedded formatting commands are described below.

Embedded Formatting Flags

\subnam
    Signals that the next single word is the subroutine or function
    name.

\subcal
    The text which follows is the normal calling sequence for the
    function. See example included.

\subcalend
    Required syntax to signify the end of the calling sequence phrase.

\subtxt
    Signals the start of the descriptive text about the subroutine or
    function.

\arglist
    Begins the list of argument descriptions. Must be included before
    any '\argn' flag.

\argn
    Signals that the next single word is the argument name. Following
    the argument name is the descriptive text about that argument.

\intlist
    Signals a list 'internal' to the argument list. Suggested usage is
    for listing possible error returns. Internal lists may not be
    nested as currently defined.

\intn
    The next single word is the keyword for the internal list item.
    Text for the list item follows the keyword.

\intlend
    Required syntax to signify the end of an internal list of items.

\arglend
    Required syntax to signify the end of the list of arguments.

\literal
    Prints following lines exactly as typed. Intended for diagramming.
    No text may be included on this line. Diagrams should start in the
    first column after any comment delimiters. (Asterisks in Pascal or
    C are assumed to be comments in column 1 or 2.) Diagrams should be
    no more than 70 columns wide, not counting comment characters.
    Within an arglist width may be 60, and within an intlist, 45.

\endliteral
    The end of the literal section. No text may be included on this
    line either.

\newpage
    Optional. Starts a new page. Useful in keeping a large literal
    section on the same page.

Command Line Arguments

The program is being designed to run as a foreign DCL command.
One defines a symbol for the name, for example:

    $ EXTRACT_IT :== $[wherever]:EXTRACTOR_V20

and then it can be run by typing the symbol and appending any desired
qualifiers, as in:

    $ EXTRACT_IT/LEVEL=(1-5)/PROCESSING=ALL myfile.c

The current location ('wherever') of the EXTRACTOR image is:

    BSNDBG::RDCS$ROOT:[EXE]

or

    BSNDBG::RDCS$EXE

if that symbol is defined for you. Parameter and qualifiers are listed
in the following table.

The EXTRACTOR will produce up to four types of output. One contains
LaTeX formatting commands and can be processed and then printed on a
laser printer. Another contains DSR commands with level numbers in
position so that it can be included in a help library after processing.
A third will also contain DSR commands, but will not list the level
numbers. It is designed for printout on a generic or "vanilla" printer.
The fourth output option is designed for backward compatibility with
existing programs which contain the +/- block extracting flags, but no
embedded formatting flags. In this case, the extractor will extract the
delimited block, optionally remove the comment delimiters at the
beginning of each line, and just print the text as is to a file,
without any formatting.

Please note that, unless the option is selected to produce only the
PLAIN file (the last described above), it is considered an error if the
file does not have embedded formatting flags.

Command Line Parameter and Qualifiers

filename
    Specifies what file(s) the EXTRACTOR is to work on. If a file
    extension is omitted, or is .LIS, the file is assumed to contain a
    list of filenames on which the EXTRACTOR is to operate. Otherwise,
    the extension must be one of the recognized set, currently
    including:

        .C      .H      .Z80
        .COM    .MAC    .Z8K
        .FOR    .MAR    .68K
        .FTN    .PAS

    If filename is omitted entirely the program will create a
    temporary file containing the directory listing of the current
    default directory, and process any files with recognized
    extensions.

/ALPHABETIZE (D)
/NOALPHABETIZE
    Specifies whether separate extractable blocks which the extractor
    finds are to be ordered alphabetically or left in the order they
    were found. Default is to alphabetize.

/EMBEDDED (D)
/NOEMBEDDED
    If /EMBEDDED is selected, the file is assumed to have embedded
    formatting commands. It will be considered an error if none are
    found. /NOEMBEDDED may only be selected with PLAIN, HELP, EPIC, or
    CLIB processing options; it may not be selected with LATEX,
    RUNOFF, or any combination which includes these. /EMBEDDED is the
    default; however, selecting /NOREMOVE forces /NOEMBEDDED.

/LOG (D)
/NOLOG
    This option selects whether information about what headers were
    processed, what errors were found, etc. should be saved to the
    file EXTRACTED.LOG. Default is to LOG everything.

/QUIET
/NOQUIET (D)
    This option selects whether information about what headers were
    processed and what errors were found should be printed to the
    terminal. The default is to print them.

/REMOVE (D)
/NOREMOVE
    If /REMOVE is selected, the comment markers at the start of each
    line of the header (where appropriate) are removed from the output
    text. /NOREMOVE may only be selected with PLAIN or EPIC processing
    options; it may not be selected with LATEX, RUNOFF, HELP, or any
    combination which includes these. /REMOVE is the default if
    nothing is selected.

/NEWPAGE
/NONEWPAGE (D)
    If /NEWPAGE is specified, a form feed will be inserted before each
    new documentation block (each +nNAME found). The default is to fit
    several blocks on each page. Note that there is also a \newpage
    command which can be embedded in the documentation text to produce
    a form feed at precisely the place you need it.

/HELP_LEVEL=n
    Specifies the level number to be output in the help text file
    only. Valid only if HELP is one of the processing outputs
    selected.

/LEVEL=n
    Specifies level of documentation blocks to be extracted.
    This refers to the number or space which comes after the + sign in
    the block delimiters which flag the beginning of a new block.
    Acceptable choices for n:

        n         where n is between 1 and 9. Space is interpreted
                  the same as a 1.
        n-m       n,m both between 1 and 9, inclusive. Example 3-5.
        nTOm      n,m both between 1 and 9, inclusive. Example 3TO5.
        (n,m..)   List of integers enclosed in parentheses.
        ALL       All blocks, which means 1 through 9 and includes
                  "space".

    Default is LEVEL=1.

/OUTPUT=filename
    Specifies the file to be written if only one type of output file
    is selected. If more than one type of output is to be produced
    (see PROCESSING) the file extension specified is ignored, and
    default file extensions are used. They are:

        filename.HLP      help file output, if NOEMBEDDED
        filename.HLPRNO   help file output which must be RUNOFFed
                          first.
        filename.RNO      RUNOFF-able output
        filename.TEX      LaTeX-able output
        filename.TXT      plain output

/PROCESSING
    Specifies what type of output files are to be produced, and what
    type of processing must be done on embedded commands. Valid
    options:

    /PROCESSING=
        PLAIN     Documentation block is written as is to a file,
                  once extracted from code.
        RUNOFF    Substitutes RUNOFF commands for embedded formatting
                  commands. Output (file.RNO) must be RUNOFFed before
                  printing.
        HELP      Substitutes RUNOFF commands for any embedded
                  commands present. Inserts help level numbers where
                  needed. Output (file.HLP or file.HLPRNO) may need
                  to be RUNOFFed if embedded commands were present.
                  (Embedded commands should be present in all newly
                  written code.)
        LATEX     Substitutes LaTeX commands for embedded formatting
                  commands. Output must be LATEXed before printing
                  (to a laser printer).
        YES (D)   Same as (RUNOFF,HELP,LATEX). This is the default.
        ALL       Same as (RUNOFF,HELP,LATEX,PLAIN).
        CLIB      Same as (PLAIN,HELP).
        EPIC      Same as PLAIN.

    A list of options may also be specified within parentheses, as
    /PROCESSING=(LATEX,RUNOFF).

Examples

Sample source and output files are attached.
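
As a further illustration, the accepted forms of the /LEVEL value listed earlier can be expanded into explicit level numbers with a short sketch. This is not the EXTRACTOR's own parser. The function name parse_level is hypothetical, and two behaviors are assumptions: an empty or blank value is taken as level 1 (matching the stated default), and a range inside parentheses is accepted because the sample command line uses /LEVEL=(1-5).

```python
import re

# One item: a single level "n", or a range "n-m" / "nTOm", digits 1-9 only.
_ITEM = re.compile(r"([1-9])(?:(?:-|TO)([1-9]))?")


def parse_level(spec="1"):
    """Expand a /LEVEL value into a sorted list of level numbers (1..9)."""
    s = spec.strip().upper()
    if s == "":
        return [1]  # assumption: blank means the default, LEVEL=1
    if s == "ALL":
        return list(range(1, 10))
    if s.startswith("(") and s.endswith(")"):
        s = s[1:-1]  # parenthesized list: drop the parentheses
    levels = set()
    for item in s.split(","):
        m = _ITEM.fullmatch(item.strip())
        if m is None:
            raise ValueError(f"unrecognized level: {item!r}")
        low = int(m.group(1))
        high = int(m.group(2)) if m.group(2) else low
        if low > high:
            raise ValueError(f"descending range: {item!r}")
        levels.update(range(low, high + 1))
    return sorted(levels)
```

A block would then be extracted only if its level (the "n" in +nNAME, with a blank counted as 1) appears in the returned list.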

Keywords

Controls, EPICURE, documentation, program

Distribution: normal