VxWorks Error Reporting

Rich Neswold

$Date: 2000/11/06 16:05:02 $

This document describes a VxWorks Error Reporting facility that is more capable than the one provided by Wind River Systems.

Contents

1.  Using This Module
2.  Function Reference
3.  The Implementation Details

1.   Using This Module

1.1.  Debugging and Diagnostic Messages
1.2.  Using it on VxWorks

This section describes how the logging facilities are used on both a VxWorks machine and a remote machine trying to monitor the messages. This VxWorks API was designed to be easy to use; the function names and parameters follow a logical progression so they should be easy to remember.

1.1.   Debugging and Diagnostic Messages

1.1.1.  Justification of the New Module
1.1.2.  Message Classifications
1.1.3.  Message Format
1.1.4.  Writing a client

This section describes the intents of the logging facility and how to effectively use it to debug or provide useful diagnostics for your application. This section also describes the view of the logger from outside of the VxWorks node (i.e. what is the message format, how do you connect to the logger, etc.)

1.1.1.   Justification of the New Module

1.1.1.1.  Destination of Messages
1.1.1.2.  Message Location in Source Code
1.1.1.3.  Shutting Off Debug Messages
1.1.1.4.  Filtering Messages
1.1.1.5.  Messages in Interrupt Routines

One primitive, yet still useful, way of debugging an application is to insert printable messages at strategic points in your application. As your program flows through its logic, a sequence of messages indicates whether the program is functioning correctly. Although useful, this form of debugging has some drawbacks:

Wind Rivers has provided a limited message logging facility. This system fixes some problems, but still has many limitations. The item that is completely supported is the last; log messages can be generated by an interrupt routine. Wind River's solution also partially solves the first item. Up to five file handles can be added to the logger so the messages are sent to other destination than just the console.

This new message logger solves all the aforementioned drawbacks.

1.1.1.1.   Destination of Messages

The new logger will send the messages to the console, like the other solutions. In addition, it can send the messages to a client across the network via datagrams. When the logger receives a datagram from a client, it adds the client to a list. As messages are generated, they are sent to the clients in the list. A client must periodically re-register itself because after five minutes it will be removed from the list.

If there are many messages being generated by a task, it can be nearly impossible to use the command shell. The function logOff will stop the messages from being sent to the console (although they continue to be sent to network clients.)

1.1.1.2.   Message Location in Source Code

The message header, which is unconditionally prepended to the message by the logger, contains information which narrows down its origination in the source code. This is described later in the document. Debug messages also indicate the file name and line number in which the message was specified.

1.1.1.3.   Shutting Off Debug Messages

When a module has been sufficiently debugged, it's nice to be able to remove the messages that aren't necessary for day to day operation. With this logger, the presence or absence of debug messages is controlled by the NDEBUG symbol. The inclusion logic follows that of the assert() macro; if NDEBUG is present, the debug messages will not be compiled into the object code.

1.1.1.4.   Filtering Messages

Messages are given one of four classifications. The clients can specify which class of messages they want to receive. This prevents a client from receiving messages it doesn't want.

1.1.1.5.   Messages in Interrupt Routines

Because the logger uses a secondary task to do the actual reporting, this logging facility is safe to use from within an interrupt routine. You should remember that the logging routines take a little time to perform, so using them in an interrupt will slow the handler down. Also, a constantly occurring interrupt can easily fill the queue and cause the logger to start skipping messages (since an interrupt can't be blocked, the logger simply discards the message and updates a counter of skipped messages.)

1.1.2.   Message Classifications

1.1.2.1.  Debug Messages
1.1.2.2.  Inform Messages
1.1.2.3.  Warn Messages
1.1.2.4.  Alarm Messages
1.1.2.5.  Error Codes

Log messages are used in various ways: to aid in debugging, to report a result was successful, or to indicate a severe error has occurred -- to name a few. This message logger organizes these uses by defining classifications1. The classifications are debug, inform, warn, and alarm.

1.1.2.1.   Debug Messages

These messages are temporary and are generated with the logDebug#() functions. When the application is compiled with NDEBUG defined, the debug messages will not be generated. These types of messages are good for verifying program logic. Once the program has been validated, removing these messages will speed up the application, as well as reduce its footprint.

Debug messages include the file name and line number in the message.

1.1.2.2.   Inform Messages

These messages are generated with the logInform#() functions. Messages of this classification should be used for "feel good" messages. They are feedback indicating the continual health of the front-end. It should be safe to always ignore these type of messages.

1.1.2.3.   Warn Messages

These messages are generated by the logWarn#() functions. This classification is used for messages that indicate a non-fatal error occurred. For instance, if a remote client presents an invalid length or offset to an ACNET device, the appropriate ACNET error code is returned to the client and, possibly, a warning message is sent to the log. Since this is an external error (most likely a database error), it only warrants a warning classification in the front-end.

Although warning messages don't indicate a front-end failure, these messages shouldn't be ignored because they signify that something is misconfigured.

1.1.2.4.   Alarm Messages

Alarm messages are generated with the logAlarm#() functions. Messages of this classification indicate a serious condition has occurred. These messages should never be ignored because they signify the impending failure of the front-end. They can be caused by software bugs, hardware failures, or resource limits.

1.1.2.5.   Error Codes

When designing this facility, I made the assumption that an error can be traced back to a single point of failure. For instance, if there are five steps necessary to complete a process and the process cannot be completed, it is because one of the five steps returned an error. In this case, five different error codes can be used to pinpoint the five possible failure points.

As a real-world example, a front-end needs to validate the length and offset parameters. It would be possible to return an ``invalid argument'' status to signify that the request cannot be completed with the current length and offset. But it is better to have several return codes indicating that the length is bad, or the offset is bad, or combining the length and offset exceeds a buffer limit.

To support this style of programming, the logWarn#() and logAlarm#() functions require an additional argument -- an error code. The hope is that the error code, along with the label (described in ``Message Format''), points to single line of code that explains the reason for the error.

These error codes do not replace the error codes that the rest of the control system understands. These are error codes that are internal to task writers to help them diagnose problems.

1.1.3.   Message Format

The message that is sent consists of a header that is generated by the logger followed by the text of the message.

[SLD,W,6,tShell] Kick starting the SLD -- MEF = 0x0000f000

The header contains four fields. The first field is a tag that is specified in logRegister(). The second field contains the priority, or class, of the message. The third field is a numeric error code specific to the tag and class. The final field is the name of the task that generated the message.

1.1.4.   Writing a client

Like the logging task that ships with VxWorks, this logging task will write the messages to the console. This version, however, will also send the messages to a client across a network. Since it uses IP datagrams as the message carrier, errors can be reported to clients anywhere in the world2.

The client registers itself with the server by sending a datagram. The datagram contains a single byte; the lowest 4 bits representing which messages it wants to receive. The client must periodically send this datagram to remain registered. If the server doesn't receive a datagram within five minutes of the previous one, it removes the client from the notification list.

The intent is to allow these messages to be made available to the system maintainers without having to be constantly logged into the front-end. Another benefit is that the ``header'' of the messages is consistently laid out. By keeping a standardized format, the client programs can be built with some intelligence by reading the machine-generated header. The client can then either filter out uninteresting messages or notify someone for serious problems.

The following program is a sample client. It registers itself every fifteen seconds, so it continues to receive messages. All it does with the message is echo it to its standard output. This simple application is more useful than it appears; the output can be redirected to a file, or it can be piped into grep to filter out uninteresting messages.


Figure 1: Sample client application3

#include <sys/types.h>
#include <sys/time.h>
#include <sys/param.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define	SEC_DELAY	15

#define DOT_TO_IP(a,b,c,d) ((((a) << 24) & 0xff000000) | \
							(((b) << 16) & 0xff0000) | \
							(((c) << 8) & 0xff00) | \
							((d) & 0xff))

/*
  registerMask

  This function sends a datagram to the specified front-end. The
  datagram contains a single byte. The data is used as a bit mask to
  indicate which log message priorities we want reported.
*/
static void registerMask(int s, int addr, int mask)
{
	struct sockaddr_in to;

	memset(&to, sizeof(to), 0);
	to.sin_len = sizeof(to);
	to.sin_family = AF_INET;
	to.sin_addr.s_addr = htonl(addr);
	to.sin_port = htons(7000);

	sendto(s, &mask, sizeof(mask), 0, (struct sockaddr*) &to, sizeof(to));
}

static void logClient(int const* addr, int mask, int n)
{
	time_t base = time(0);
	struct sockaddr_in from;
	unsigned short sendPort = 0;
	int fromSize;
	int s, ii;

	memset(&from, sizeof(from), 0);
	from.sin_len = sizeof(from);
	from.sin_family = AF_INET;
	from.sin_addr.s_addr = htonl(INADDR_ANY);
	from.sin_port = htons(0);

	if (-1 == (s = socket(AF_INET, SOCK_DGRAM, 0))) {
		printf("error creating socket.\n");
		return;
	}
	if (-1 == bind(s, (struct sockaddr*) &from, sizeof(from))) {
		printf("couldn't bind to port.\n");
		return;
	}
	fromSize = sizeof(from);
	getsockname(s, (struct sockaddr*) &from, &fromSize);
	printf("Using port %d.\n", sendPort = ntohs(from.sin_port));

	for (ii = 0; ii < n; ++ii)
		registerMask(s, addr[ii], mask);

	while (1) {
		struct timeval tv;
		int const delta = time(0) - base;
		fd_set fdset;
		int fromSize = sizeof(from), ii;
		unsigned long remoteAddr;
		static char buf[4096];

		FD_ZERO(&fdset);
		FD_SET(s, &fdset);

		tv.tv_sec = delta >= SEC_DELAY ? 0 : SEC_DELAY - delta;
		tv.tv_usec = 0;
		switch (select(s + 1, &fdset, 0, 0, &tv)) {
		 case 0:
		 	base = time(0);
			for (ii = 0; ii < n; ++ii)
				registerMask(s, addr[ii], mask);
		 	break;

		 case 1:
			recvfrom(s, &buf, sizeof(buf), 0, (struct sockaddr*) &from,
					 &fromSize);
			remoteAddr = ntohl(from.sin_addr.s_addr);
			printf("(%03d.%03d.%03d.%03d)%s\n", (remoteAddr >> 24) & 0xff,
				   (remoteAddr >> 16) & 0xff, (remoteAddr >> 8) & 0xff,
				   remoteAddr & 0xff, buf);
			break;

		 default:
			printf("select() returned an error -- exitting.\n");
		 	return;
		}
	}
}

int main(int argc, char* argv[])
{
	if (argc > 1) {
		int const mask = atoi(argv[1]);
		static int const addr[] = {
			DOT_TO_IP(131,225,123,178),
			DOT_TO_IP(131,225,125,58),
			DOT_TO_IP(192,41,211,165),
			DOT_TO_IP(131,225,124,165),
			DOT_TO_IP(131,225,124,166),
		};

		setvbuf(stdout, 0, _IOLBF, 0);
		logClient(addr, mask, sizeof(addr) / sizeof(*addr));
	} else
		printf("usage: %s mask\n", argv[0]);
	return 0;
}

1.2.   Using it on VxWorks

1.2.1.  Initializing the module
1.2.2.  Registering with the logger

This program is provided as a loadable module. Simply load it from the standard directory in which modules are stored. Then, start up the module. As an example, in my VxWorks start up script, I include the following:

ld < vxworks_boot/module/PPC603/errlog.o
errlogInit();

68K users should change PPC603 to MC68020 or MC68040 in order to load the 68K version. This module is intended to be a replacement to the logging facility that is included with VxWorks. You can slightly reduce your kernel size by building a VxWorks kernel without logging support.

Before the module can be used, it must be initialized. This is done by calling errlogInit. Analogously, before the module can be unloaded, the logTerm function should be called. The function prototypes are in the header file errlogLib.h. The header file is found in the directory /home/ad_projects/local/include/.See the Function Reference section for all of the functions that are available to an application.

1.2.1.   Initializing the module

Before the logging facilities can be used, resources need to be allocated and a secondary task needs to be started. These details are handled by errlogInit, hence, this function should be called before any messages are generated. I call this function immediately after loading the module (in the start-up script.)

1.2.2.   Registering with the logger

To send messages to the logger, you need to specify a "handle". You get a handle by calling logRegister and passing it a label. The label is simply a string that identifies the software subsystem that will be generating the messages.

2.   Function Reference

2.1.  errlogInit
2.2.  logAlarm
2.3.  logClientAdd
2.4.  logDebug
2.5.  logInform
2.6.  logOff
2.7.  logOn
2.8.  logRegister
2.9.  logStats
2.10.  logTerm
2.11.  logUnregister
2.12.  logWarn

This section contains descriptions of all the available functions in the module. Since the C language doesn't support function overloading or default arguments, functions that would have had multiple mappings (for varying numbers of arguments) have been given names that differ slightly. In this reference, these functions have been grouped together under one heading.

2.1.   errlogInit

2.1.1.  Synopsis
2.1.2.  Description

2.1.1.   Synopsis

void errlogInit(void);

2.1.2.   Description

This function should be called before any other routines are used. This function sets up the resources used by the module and starts the secondary task.

2.2.   logAlarm

2.2.1.  Synopsis
2.2.2.  Description

2.2.1.   Synopsis

void logAlarm0(HLOG log, int errorCode, char const* fmtTxt);
void logAlarm1(HLOG log, int errorCode, char const* fmtTxt, int arg1);
void logAlarm2(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2);
void logAlarm3(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2, int arg3);
void logAlarm4(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4);
void logAlarm5(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4, int arg5);
void logAlarm6(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4, int arg5, int arg6);

2.2.2.   Description

These functions queue a message using alarm priority. Alarm priority indicates that something severe has occurred in the front-end. This could indicate an unexpected and invalid condition (due to a software bug, for instance) or it could indicate that a resource is exhausted. If an alarm priority message is displayed, it should mean that something in the front-end needs to be fixed.

The errorCode parameter holds a task-specific error code. The code is defined by the task author and should be unique enough that it can indicate an exact circumstance that has occurred. For alarm priority, the error code should indicate the exact point of failure of the front-end.

The remaining arguments are similar to printf (although these function don't have a variable argument list.)

The message is placed in a pipe and eventually gets reported by the log task. Because the arguments are processed at a later time, the arguments passed to these functions should be constant. Simple data types are fine since they're passed by value, but character arrays are a special case. If a buffer whose contents are built up is passed, there is a chance that the buffer contents will change before it gets processed by the log task.

2.3.   logClientAdd

2.3.1.  Synopsis
2.3.2.  Description

2.3.1.   Synopsis

void logClientAdd(char const* name, int mask);

2.3.2.   Description

This function adds a client to the list of message recipients. This is useful if you want to force a client into the list. One use would be to add a client when a node boots. By doing this, the client (assuming it's running) can capture any messages generated by the initialization code. The client is assumed to be listening on port 7000.

The client's name is passed as a string. In order to resolve the name, either the node needs to be using DNS, or the client's name and address should have been already added to the local host table using hostAdd().

The mask parameter indicates which class of messages the client wishes to receive. The mask is generated by logically ORing the following masks:

Table 1: Priority Masks

Mask Value Description
0x01 Client wishes to receive DEBUG priority messages.
0x02 Client wishes to receive INFORMATIVE priority messages.
0x04 Client wishes to receive WARNING priority messages.
0x08 Client wishes to receive ERROR priority messages.

2.4.   logDebug

2.4.1.  Synopsis
2.4.2.  Description

2.4.1.   Synopsis

void logDebug0(HLOG log, char const* fmtTxt);
void logDebug1(HLOG log, char const* fmtTxt, int arg1);
void logDebug2(HLOG log, char const* fmtTxt, int arg1, int arg2);
void logDebug3(HLOG log, char const* fmtTxt, int arg1, int arg2, int arg3);
void logDebug4(HLOG log, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4);
void logDebug5(HLOG log, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4, int arg5);
void logDebug6(HLOG log, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4, int arg5, int arg6);

2.4.2.   Description

These functions queue a message using debug priority. Debug messages should only be used for development. If the symbol NDEBUG isn't defined, these messages will be included. If NDEBUG is defined, then the messages aren't compiled into the executable. This follows the enabling/disabling logic of the assert() macro, so the debug messages can be used along with the assert() macro in debugging your application.

The arguments are similar to printf (although these function don't have a variable argument list.)

The message is placed in a pipe and eventually gets reported by the log task. Because the arguments are processed at a later time, the arguments passed to these functions should be constant. Simple data types are fine since they're passed by value, but character arrays are a special case. If a buffer whose contents are built up is passed, there is a chance that the buffer contents will change before it gets processed by the log task.

2.5.   logInform

2.5.1.  Synopsis
2.5.2.  Description

2.5.1.   Synopsis

void logInform0(HLOG log, char const* fmtTxt);
void logInform1(HLOG log, char const* fmtTxt, int arg1);
void logInform2(HLOG log, char const* fmtTxt, int arg1, int arg2);
void logInform3(HLOG log, char const* fmtTxt, int arg1, int arg2, int arg3);
void logInform4(HLOG log, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4);
void logInform5(HLOG log, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4, int arg5);
void logInform6(HLOG log, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4, int arg5, int arg6);

2.5.2.   Description

These functions queue a message using inform priority. Inform messages are ``feel good'' messages. They don't indicate a front-end failure or input error. They simply report information that the application author may find interesting. Startup banners and progress indicators should be placed in this category.

The arguments are similar to printf (although these function don't have a variable argument list.)

The message is placed in a pipe and eventually gets reported by the log task. Because the arguments are processed at a later time, the arguments passed to these functions should be constant. Simple data types are fine since they're passed by value, but character arrays are a special case. If a buffer whose contents are built up is passed, there is a chance that the buffer contents will change before it gets processed by the log task.

2.6.   logOff

2.6.1.  Synopsis
2.6.2.  Description

2.6.1.   Synopsis

void logOff(void);

2.6.2.   Description

This function shuts off error reporting to the console. This can be very useful if some task is cranking out messages, but you want to interact with the shell.

By default, messages are sent to the console.

2.7.   logOn

2.7.1.  Synopsis
2.7.2.  Description

2.7.1.   Synopsis

void logOn(void);

2.7.2.   Description

This function restores the reporting of errors to the console. By default, errors are sent to the console.

2.8.   logRegister

2.8.1.  Synopsis
2.8.2.  Description

2.8.1.   Synopsis

HLOG logRegister(char const* label, int blk);

2.8.2.   Description

All messages written to the log must be done through a handle, and this handle is obtained by calling this function. When asking for a handle, a label is specified. This label is placed in the message header whenever the handle is used. The label typically describes a subsystem. The source files that make up the subsystem will share a log handle.

When an interrupt routine writes a log message, it can never block. If the message queue is full when an interrupt tries to write to it, an internal ``skipped'' counter is incremented and the number of skipped messages is eventually reported. Tasks, on the other hand, may or may not block. It depends upon the importance of the task. The second paramter to this function, blk, indicates whether a task using this handle should block when the message queue is full. If it is 0 (false), tasks will exhibit the same behavior to a full queue as an interrupt routine.

This function returns a valid log handle if successful or NULL if an error occurred.

2.9.   logStats

2.9.1.  Synopsis
2.9.2.  Description

2.9.1.   Synopsis

void logStats(void);

2.9.2.   Description

Displays, to the console only, logger statistics. The number of messages that can be pending is displayed. Next, the maximum number of pending messages that has ever been queued is displayed. Last, the current number of pending messages is reported.

2.10.   logTerm

2.10.1.  Synopsis
2.10.2.  Description

2.10.1.   Synopsis

void logTerm(void);

2.10.2.   Description

This function should be called before the module is unloaded. It gracefully shuts down the secondary task and frees the resources used by the module. If you unload the module without calling this function, your system will be left in an unstable state.

2.11.   logUnregister

2.11.1.  Synopsis
2.11.2.  Description

2.11.1.   Synopsis

void logUnregister(HLOG* log);

2.11.2.   Description

When an application is through using a handle to the logging facilities, the resources used by the handle can be reclaimed by calling this function. Once this function completes, the handle is no longer valid. To enforce this, the log argument is passed by reference. The function will invalidate the contents of the handle which will result in a data access error anytime the handle is used.

2.12.   logWarn

2.12.1.  Synopsis
2.12.2.  Description

2.12.1.   Synopsis

void logWarn0(HLOG log, int errorCode, char const* fmtTxt);
void logWarn1(HLOG log, int errorCode, char const* fmtTxt, int arg1);
void logWarn2(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2);
void logWarn3(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2, int arg3);
void logWarn4(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4);
void logWarn5(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4, int arg5);
void logWarn6(HLOG log, int errorCode, char const* fmtTxt, int arg1, int arg2, int arg3, int arg4, int arg5, int arg6);

2.12.2.   Description

These functions queue a message using warning priority. Warning priority indicates that the front-end detected an error from an external source. This could be due to invalid network requests. This priority shouldn't be used for conditions that indicate a front-end failure. Rather, it can be used to diagnose external problems. For instance, if an application asks for data using an invalid length, the front-end, in addition to returning an error to the application, may display a warning indicating it was asked to do something illegal.

The errorCode parameter holds a task-specific error code. The code is defined by the task author and should be unique enough that it can indicate an exact circumstance that has occurred. For warning priority, the error code should indicate the exact point of failure of the front-end.

The remaining arguments are similar to printf (although these function don't have a variable argument list.)

The message is placed in a pipe and eventually gets reported by the log task. Because the arguments are processed at a later time, the arguments passed to these functions should be constant. Simple data types are fine since they're passed by value, but character arrays are a special case. If a buffer whose contents are built up is passed, there is a chance that the buffer contents will change before it gets processed by the log task.

3.   The Implementation Details

The source code and documentation files have been placed under cvs control. The module name is errlog. Further implementation details will be added later.

Notes

1.

Some of the documentation refers to the organization as "priorities". Since messages don't get reported before others based upon a priority scheme, this is a misnomer. I'll try to get the documentation to be consistent.

2.

Unless, of course, either machine is protected by a firewall.

3.

This application was tested on FreeBSD. It should compile and run on other Unix platforms with little or no change.

Security, Privacy, Legal