Feb 2015

MySQL Status Check in Nagios 2

Part one of this series introduced the idea of using a web status page for Nagios checks and showed how to set up a MySQL and PHP status page. Part two details the Nagios check itself, along with some other interesting things one could do with it.

The Nagios Check Script

For simplicity I will follow what seems to be the systhread standard: first break the script down into pieces, then put it all together at the end. First, setting things up:

#!/bin/sh
HOST2CHK=$1                          # 1. The IP Address of the web server
URI=http://$HOST2CHK/status/         # 2. The URI string
statfile=/var/tmp/`basename "$0"`.$$ # 3. A local status file
  1. Nagios uses IP addresses and not Domain or Host names to connect to systems. Do not hard code this; make the check reusable.
  2. The full URI string. For our example it is one subdirectory on the web server.
  3. A local status file where we keep the pulled results and parse them for the Nagios server. basename strips any leading path from $0, so the file lands directly in /var/tmp even when Nagios calls the script by its full path.
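
Because the address arrives as an argument, the check can also guard against being called without one. The following is only a sketch: require_host is a hypothetical helper name and the message text is invented, while 3 is the conventional Nagios return code for UNKNOWN.

#!/bin/sh
# Sketch: refuse to run without a host argument.
# require_host is a hypothetical helper, not part of the original script.
require_host()
{
        if [ -z "$1" ]; then
                # Nagios treats exit code 3 as UNKNOWN
                echo "MYSQL UNKNOWN - usage: check_mysql_status <ip-address>"
                return 3
        fi
        return 0
}

In the check itself this would sit before the HOST2CHK assignment, for example as: require_host "$1" || exit $?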

The next step is pulling the status page into a local copy using the dump mode of the lynx web browser:

lynx --dump $URI|grep MYSQL>$statfile

Interestingly, in my script the next block is actually the status exit function. I put it after the lynx dump, figuring that if the web page pull failed, what is the point?

statexit()
{
        retval=$1            # 1. return value for nagios determined by caller

        tail -n 1 $statfile  # 2. Nagios message line from the statfile
        rm -f $statfile      # 3. Delete the temporary status file
        exit $retval         # 4. Exit with numerical status for Nagios
}
  1. The exit value from the script is actually used by Nagios to set the condition level of an alert. The caller must provide this. The logic for figuring it out follows.
  2. Tail the Nagios status line, which is the last line of the status file. It takes the form SERVICE_NAME STATUS_NAME for HOST_OR_SUBINFO.
  3. Delete the temporary status file.
  4. Exit the script with the return value for Nagios.
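
To see what Nagios would receive, the function can be exercised by hand against a made-up status file. This is a minimal sketch: the status line below is invented sample data, and mktemp is assumed to be available on the system. By Nagios convention the return codes are 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN.

#!/bin/sh
# Sketch: exercise statexit against a hand-made status file.
# The file contents are invented sample data, not real page output.
statfile=$(mktemp /tmp/statexit_demo.XXXXXX)

statexit()
{
        retval=$1
        tail -n 1 "$statfile"   # message line for Nagios
        rm -f "$statfile"       # clean up the temporary file
        exit "$retval"          # numerical status for Nagios
}

printf 'MYSQL WARN for db1\n' > "$statfile"

# Run in a subshell so the exit does not end this demo script.
rc=0
msg=$( statexit 1 ) || rc=$?
echo "message: $msg"
echo "exit code: $rc"

The message and the exit code are the two things Nagios reads from a check: the first becomes the status text, the second sets the alert level.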

Finally, the logic that figures out what the status is and calls the status exit routine. Note that a count of expression matches is used to suppress output instead of redirecting it to null and checking $?. There are any number of ways this can be done; the following is one of them:

cnt=`grep OK $statfile|wc -l`;
if [ $cnt -eq 1 ]; then
        statexit 0
fi

cnt=`grep WARN $statfile|wc -l`;
if [ $cnt -eq 1 ]; then
        statexit 1;
fi

cnt=`grep CRIT $statfile|wc -l`;
if [ $cnt -eq 1 ]; then
        statexit 2;
fi

cnt=`grep unable $statfile|wc -l`;
if [ $cnt -eq 1 ]; then
        statexit 2;
fi

The above is actually the simplest part of the logic. If we see good, bad, or ugly, send the message off to Nagios.
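
As one of those other ways, the same decision can be made with grep -q, which prints nothing and reports a match through its exit status. This is a sketch rather than the script as I run it: status_code is a hypothetical helper name, it accepts any number of matches instead of exactly one, and it adds a fall-through to 3 (UNKNOWN in Nagios terms) for a file with nothing recognizable in it.

#!/bin/sh
# Sketch: map a status file to a Nagios return code using grep -q.
# status_code is a hypothetical helper, not part of the original script.
status_code()
{
        file=$1
        if grep -q OK "$file"; then
                echo 0
        elif grep -q WARN "$file"; then
                echo 1
        elif grep -q CRIT "$file" || grep -q unable "$file"; then
                echo 2
        else
                echo 3          # nothing recognizable: UNKNOWN
        fi
}

With this in place, the four if blocks could collapse into a single call such as: statexit `status_code $statfile`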

The Whole Thing

#!/bin/sh
HOST2CHK=$1                          # 1. The IP Address of the web server
URI=http://$HOST2CHK/status/         # 2. The URI string
statfile=/var/tmp/`basename "$0"`.$$ # 3. A local status file

lynx --dump $URI|grep MYSQL>$statfile

statexit()
{
        retval=$1            # 1. return value for nagios determined by caller

        tail -n 1 $statfile  # 2. Nagios message line from the statfile
        rm -f $statfile      # 3. Delete the temporary status file
        exit $retval         # 4. Exit with numerical status for Nagios
}

cnt=`grep OK $statfile|wc -l`;
if [ $cnt -eq 1 ]; then
        statexit 0
fi

cnt=`grep WARN $statfile|wc -l`;
if [ $cnt -eq 1 ]; then
        statexit 1;
fi

cnt=`grep CRIT $statfile|wc -l`;
if [ $cnt -eq 1 ]; then
        statexit 2;
fi

cnt=`grep unable $statfile|wc -l`;
if [ $cnt -eq 1 ]; then
        statexit 2;
fi

Adding the Check

Assuming you have a $USER2$ (or something similar) in your Nagios configuration for local checks (DO NOT dump local scripts into the Nagios libexec area), the command definition should look something like this:

define command{
        command_name    check_mysql_status
        command_line    $USER2$/check_mysql_status $HOSTADDRESS$
        }

And now you are off to the races. Add a new service check and you are done:

define service{
    use myservice
    hostgroup_name mysql-status-page-servers
    service_description  MySQL Status Page Check
    check_command check_mysql_status
    }

In the above we use hostgroups; your configuration may of course differ.

Summary & Improvements

There are a lot of different ways to approach this; the one shown here is only one of them. For instance, the lynx output can be parsed on the fly, or the pattern search can be done differently. Simplest of all, just tail the lynx dump and use awk to format a straight return. There are probably about fifty other ways to do it as well. I did it this way because, at the time, I wanted temp files for debugging purposes. Now that it has been working for a while, I will likely go back and redo it.
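
A rough idea of what a temp-file-free version might look like follows. It is a sketch under assumptions, not a finished rewrite: last_mysql_line and check_line are hypothetical helper names, awk keeps the last MYSQL line from whatever is piped in, and a case statement picks the return code, with 3 (UNKNOWN) as the fall-through.

#!/bin/sh
# Sketch: parse the dump on the fly, no temporary file.
# Both function names are hypothetical.

# Read the page dump on stdin and print the last line mentioning MYSQL.
last_mysql_line()
{
        awk '/MYSQL/ { line = $0 } END { print line }'
}

# Print the status line and return the matching Nagios code.
check_line()
{
        line=$1
        echo "$line"
        case $line in
                *OK*)            return 0 ;;
                *WARN*)          return 1 ;;
                *CRIT*|*unable*) return 2 ;;
                *)               return 3 ;;
        esac
}

# In the real check the two would be chained roughly like this:
#   line=`lynx --dump $URI | last_mysql_line`
#   check_line "$line"
#   exit $?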