Detailed Error Handling In Bash

by Willem Bogaerts, application smith at Kratz Business Solutions

Summary

Shell scripts are often running as background processes, doing useful things without running in a visible shell. Think, for example, of cron jobs or scripts that are fired from a program on a web server. To write such scripts can be quite painful, as all errors occur out of sight as well. Off course you can make use of a log file, but the ideal level of logging is hard to find. You often log way too much when the script is running fine and way too little when it unexpectedly fails. While log files can hold a lot of information, finding the relevant information is a bit trickier.

My solution is to log only the errors with all the details to a small database. This database contains tables for the message, the corresponding stack trace and the important environment variables. I have chosen for an SQLite database in this howto, but the same principle works with other databases as well.

The database

SQLite needs some settings to work as I expect it to, and these settings can be put in an initializing script. These settings include the error behaviour of SQLite itself and its foreign key handling:

.bail ON
.echo OFF
PRAGMA foreign_keys = TRUE;

Off course, we also need a database and I do not want to rely on one to exist. Therefore, the first thing the bash script will do is to run an SQL "revive" script on the database file: if the database did not exist, it will be created and if it did, it will do nothing:

CREATE TABLE IF NOT EXISTS ErrorLog
      (intErrorLogId INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
       strMessage TEXT NOT NULL,
       tstOccurredAt DATE NOT NULL DEFAULT(CURRENT_TIMESTAMP) );
CREATE INDEX IF NOT EXISTS idxELOccurredAt ON ErrorLog(tstOccurredAt);

CREATE TABLE IF NOT EXISTS ErrorStackTrace
      (intErrorStackTraceId INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
       intErrorLogId INTEGER NOT NULL,
       strSourceFile TEXT NOT NULL,
       strFunction TEXT NOT NULL,
       intLine INTEGER NOT NULL,
       FOREIGN KEY(intErrorLogId) REFERENCES ErrorLog(intErrorLogId)
               ON DELETE CASCADE
               ON UPDATE CASCADE );
CREATE INDEX IF NOT EXISTS idxESTTraceErrorLogId ON ErrorStackTrace(intErrorLogId);

CREATE TABLE IF NOT EXISTS ErrorEnvironment
      (intErrorEnvironmentId INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
       intErrorLogId INTEGER NOT NULL,
       strVariable TEXT NOT NULL,
       strValue TEXT NULL,
       FOREIGN KEY(intErrorLogId) REFERENCES ErrorLog(intErrorLogId)
               ON DELETE CASCADE
               ON UPDATE CASCADE );
CREATE INDEX IF NOT EXISTS idxEEErrorLogId ON ErrorEnvironment(intErrorLogId);

The mechanism

Now we have the database defined, we can focus on the error handler itself. Bash has a trap command that, among other things, can be used to trap non-zero exit codes of commands. These non-zero exit codes usually denote an error. Within a "trapped" section of a script, you can still react to a non-zero exit code without the error handler to take over by using the "or construct":

false || echo "non-zero exit"

The above line will not invoke the error handler, but any of the following lines will (the command false will always exit with a non-zero exit code, the command true will exit with an exit code of zero):

false
false;true

If we define a function that will log all error data, we can pass that function to the trap command. From that function, we can access the call stack by invoking the caller command. The caller command will return an array containing the line number, the subroutine name and the file name.

I will read the environment from a predefined list of variables that I am interested in, because my (Xubuntu) system has so many pre-defined variables that logging them all would drown me in useless information. I am also interested in the messages that are written to STDERR, so I will define a file location that I can use to send the STDERR stream to. After logging the error, I will still write the contents of that file to STDERR, so the user is not left in the dark with scripts that do run interactively. Now I have all the information I want to log.

Caveat

The commands logged are the commands at the level of the trap command. This means that if you put a trap statement in your main script and call a function, that function will complete, and only after its completion, the error handler may be invoked if the function ended in an error state. If you want error handling within the function as well, put the trap command also inside it.

Invoking the error handler

To make my error handling script available as a kind of library, I will call it from the scripts I want to monitor, with the source command. This will effectively include it in the scripts I want to monitor. The calling scripts need to define a few settings before including the error handling script:

ERROR_CLEANUP_ACTION: Optional. A command to be executed after the error handler has run, for example to clean up temporary files.
ERROR_ENVIRONMENT_VARIABLES: A list of variables to log. This list will be augmented with EXITCODE BASH_COMMAND BASH_LINENO BASH_ARGV
ERRORDB: The location of the database. The database file does not need to exist, but the location does need to be writeable by the user running the script.
ERROROUTPUT: The location of a file to catch the contents of STDERR. Optional. Defaults to "/var/tmp/$$.err".
SQLITE3_EXECUTABLE: The location of the SQLite3 command line client. Optional. Defaults to `which sqlite3`.

Because escaping all possible error messages for an INSERT query can be quite a challenge, I define two auxiliary functions Error_Hexit and Error_Hexit_File. This allows me to "escape" all exotic strings to hexadecimal strings. SQLite interprets all hexadecimal strings as binary objects, so I cast them to TEXT in my INSERT statements.

The full script

After putting it all together, I have this script that I can include:

#!/bin/bash
# Needed variable: ERRORDB
# Optional variables: ERROROUTPUT, SQLITE3_EXECUTABLE, ERROR_CLEANUP_ACTION, ERROR_ENVIRONMENT_VARIABLES

if [ -z $ERROROUTPUT ];then
   ERROROUTPUT=/var/tmp/$$.err
fi
if [ -e $SQLITE3_EXECUTABLE ] ;then
   SQLITE3_EXECUTABLE=`which sqlite3`
fi
# The settings file for SQLite3:
ERROR_INIT_SQLITE=/var/tmp/$$_init.sql
# For convenience, a function to create the settings for SQLite3:
function Error_Create_Init_File
        {
         cat > $ERROR_INIT_SQLITE <<'ERROR_SQLITE_SETTINGS'
.bail ON
.echo OFF
PRAGMA foreign_keys = TRUE;
ERROR_SQLITE_SETTINGS
        }
# Create the database:
if [ -z $ERRORDB ] ;then
   echo "Error database is not defined."
   exit 1
else
   Error_Create_Init_File
   $SQLITE3_EXECUTABLE -batch -init $ERROR_INIT_SQLITE $ERRORDB <<'ERROR_TABLE_DEFINITION'
CREATE TABLE IF NOT EXISTS ErrorLog
      (intErrorLogId INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
       strMessage TEXT NOT NULL,
       tstOccurredAt DATE NOT NULL DEFAULT(CURRENT_TIMESTAMP) );
CREATE INDEX IF NOT EXISTS idxELOccurredAt ON ErrorLog(tstOccurredAt);

CREATE TABLE IF NOT EXISTS ErrorStackTrace
      (intErrorStackTraceId INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
       intErrorLogId INTEGER NOT NULL,
       strSourceFile TEXT NOT NULL,
       strFunction TEXT NOT NULL,
       intLine INTEGER NOT NULL,
       FOREIGN KEY(intErrorLogId) REFERENCES ErrorLog(intErrorLogId)
               ON DELETE CASCADE
               ON UPDATE CASCADE );
CREATE INDEX IF NOT EXISTS idxESTTraceErrorLogId ON ErrorStackTrace(intErrorLogId);

CREATE TABLE IF NOT EXISTS ErrorEnvironment
      (intErrorEnvironmentId INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
       intErrorLogId INTEGER NOT NULL,
       strVariable TEXT NOT NULL,
       strValue TEXT NULL,
       FOREIGN KEY(intErrorLogId) REFERENCES ErrorLog(intErrorLogId)
               ON DELETE CASCADE
               ON UPDATE CASCADE );
CREATE INDEX IF NOT EXISTS idxEEErrorLogId ON ErrorEnvironment(intErrorLogId);
ERROR_TABLE_DEFINITION
   rm -f $ERROR_INIT_SQLITE
fi
# Helper functions to "escape" strings tohexadecimal strings
function Error_Hexit # StringToHex
        {
         echo -n "$1" | hexdump -v -e '1 1 "%02X"'
        }

function Error_Hexit_File # FileToHex
        {
         if [ -e $1 -a -s $1 ] ;then
            hexdump -v -e '1 1 "%02X"' < $1
         else
            Error_Hexit '(No message)'
         fi
        }
# The error handling function:
# Enable with: trap Error_Handler ERR
# Disable with: trap '' ERR
function Error_Handler
        {
         local EXITCODE=$?
         trap '' ERR # switch off error handling to prevent wild recursion.
         local ARRAY=( `caller 0` )
         Error_Create_Init_File
         # Write the error message (from STDERR) and read the generated autonumber:
         local INSERT_ID=`$SQLITE3_EXECUTABLE -batch -init $ERROR_INIT_SQLITE $ERRORDB "INSERT INTO ErrorLog(strMessage) VALUES(CAST(x'$(Error_Hexit_File $ERROROUTPUT)' AS TEXT));SELECT last_insert_rowid();"`
         # Write the stack trace:
         local STACKLEVEL=0
         local STACK_ENTRY=`caller $STACKLEVEL`
         until [ -z "$STACK_ENTRY" ];do
               local STACK_ARRAY=( $STACK_ENTRY )
               $SQLITE3_EXECUTABLE -batch -init $ERROR_INIT_SQLITE $ERRORDB "INSERT INTO ErrorStackTrace(intErrorLogId,strSourceFile,strFunction,intLine) VALUES($INSERT_ID, CAST(x'$(Error_Hexit ${STACK_ARRAY[2]})' AS TEXT), '${STACK_ARRAY[1]}', ${STACK_ARRAY[0]})"
               let STACKLEVEL+=1
               STACK_ENTRY=`caller $STACKLEVEL`
         done
         # Write the error environment:
         for VAR in EXITCODE BASH_COMMAND BASH_LINENO BASH_ARGV $ERROR_ENVIRONMENT_VARIABLES ;do
             local CONTENT=$(Error_Hexit "${!VAR}")
             $SQLITE3_EXECUTABLE -batch -init $ERROR_INIT_SQLITE $ERRORDB "INSERT INTO ErrorEnvironment(intErrorLogId,strVariable,strValue) VALUES($INSERT_ID, '$VAR', CAST(x'$CONTENT' AS TEXT));"
         done
         # Clean up and provide feedback:
         if [ -e $ERROROUTPUT ] ;then
            cat $ERROROUTPUT 1>&2
         fi
         rm -f $ERROR_INIT_SQLITE
         rm -f $ERROROUTPUT
         if [ -n "$ERROR_CLEANUP_ACTION" ] ;then
            $ERROR_CLEANUP_ACTION
         fi
         exit $EXITCODE
        }

Usage example

#!/bin/bash

# Find out where I am to load the library from the same directory:
HERE=`dirname $0`

# Error handling settings:
ERRORDB=~/test.sqlite
ERROR_ENVIRONMENT_VARIABLES='USER TERM PATH HOSTNAME LANG DISPLAY NOTEXIST'
ERROR_CLEANUP_ACTION="echo I'm cleaning up!"

source $HERE/liberrorhandler.bash
trap Error_Handler ERR
# The above trap statement will do nothing in this example,
# unless you comment out the other trap statement.

function InnerFunction
        {
         trap Error_Handler ERR
         # The above trap statement will cause the error handler to be called.
         cat $1 2> $ERROROUTPUT
         # This will fail, because the file passed in $1 does not exist.
        }

function OuterFunction
        {
         InnerFunction /doesnot.exist
        }

OuterFunction 2>$ERROROUTPUT