Mar 2011

Scripts to Autogen Nagios Configurations

Ever been in a situation where you needed to set up configuration files for tens, hundreds or perhaps thousands of systems? Probably. And if not, and you're a sysadmin, you probably will at some point. This text details how, in one small situation, I saved myself some time later on by writing two very small scripts.

The Scenario

The details are relatively simple. A new High Performance Computing (HPC) cluster going into place needs to be monitored. The tool being used is Nagios. The goal of this particular monitoring instance was to set up service checks using hostgroups to help minimize the maintenance overhead. The problem, of course, is that the potential to add N compute nodes was (and still is) a reality. Before even bothering to add any nodes by hand, I decided to script auto-generating as much of what I needed as possible. In the end, the two small scripts I wrote did the following:

  • Generated Host entries
  • Generated a single compute node hostgroup
  • Generated Service Checks
  • Generated hostgroups by queue

The first script generates the host entries, the compute node hostgroup and the service checks. The second script creates the hostgroups by queue, calling Sun Grid Engine (SGE) to get the information it needs about the queues.

These scripts are purposefully left as close to the originals as possible to allow readers to go in and make improvements. They could use a lot of improvement, which of course might make for a good article later on. But they do demonstrate that even a script that could use a lot of improvement can get the job done. And quite well, I might add.

Script One: Host Entries & The Node Hostgroup

#!/usr/bin/perl
# This Creates the basic nagios config files for cluster nodes
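$end   = 16;               # number of compute nodes
$network = "192.168.0";    # first three octets of the node addresses

print "\# RCHPC1 AUTOGENERATED CONFIG\n";
print "\# DO NOT EDIT THIS DIRECTLY! See the mkclusterconf script!\n";

$x = 11;         # last octet of the first node's address
$members = "";   # comma-separated member list for the hostgroup
$dyn = "";       # unused
$liq ="";        # unused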
for ($i = 1; $i <= $end; $i++) {

    # Zero-pad the node number to three digits (covers nodes 1 through 99)
    if ($i <= 9) {
        $hostname = "n00$i";
    } else {
        $hostname = "n0$i";
    }

    if ($i == 1) {
        $members = "n001";
    } else {
        $members = "$members,$hostname";
    }

    print "define host\{\n";
    print "  use        linux-server\n";
    print "  host_name  $hostname\n";
    print "  alias      $hostname.node\n";
    print "  notes      RC HPC Compute Node\n";
    print "  address    $network.$x\n";
    print "  \}\n\n";
    $x++;
}

print "define hostgroup\{\n";
print "  hostgroup_name myhpc1-compute-nodes\n";
print "  alias          All HPC Systems\n";
print "  members        $members,myhpc-prime\n";
print "  \}\n\n";

print "define service\{\n";
print "  use generic-service\n";
print "  hostgroup_name myhpc1-compute-nodes\n";
print "  service_description SSH\n";
print "  check_command check_ssh\n";
print "  \}\n\n";

print "define service\{\n";
print "  use generic-service\n";
print "  hostgroup_name myhpc1-compute-nodes\n";
print "  service_description Current Load\n";
print "  check_command snmp_load\n";
print "  \}\n\n";

print "define service\{\n";
print "  use generic-service\n";
print "  hostgroup_name myhpc1-compute-nodes\n";
print "  service_description RPC\n";
print "  check_command check_rpc_port\n";
print "  \}\n\n";
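
The two-branch padding in the loop above covers node numbers up to 99; at 100 it would produce n0100. A format string handles any count. What follows is a minimal sketch in shell, not the original method: node_name and member_list are hypothetical helper names, and the member list is built without the special case for the first node. Perl's sprintf accepts the same %03d format.

```shell
#!/bin/sh
# Sketch: zero-padded node names and a comma-separated member list.
# node_name and member_list are hypothetical helper names.

node_name() {
    # %03d pads to at least three digits: 7 -> n007, 16 -> n016, 123 -> n123
    printf 'n%03d' "$1"
}

member_list() {
    # $1 = number of nodes; prints n001,n002,... with no trailing comma
    end=$1
    i=1
    list=""
    while [ "$i" -le "$end" ]; do
        name=$(node_name "$i")
        if [ -z "$list" ]; then
            list=$name
        else
            list="$list,$name"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

member_list 3
```

Run as is, this prints n001,n002,n003.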

Script Two: Hostgroups by Queue

#!/bin/sh

echo "define hostgroup{"
echo "  hostgroup_name hpc-infrastructure"
echo "  alias HPC Infrastructure"
echo "  members myhpc-prime"
echo "  }"

hostlist=`qconf -shgrp @allhosts|grep hostlist`
echo "define hostgroup{"
echo "  hostgroup_name all-queue"
echo "  alias SGE Queue ALL"
echo -n "  members "
for n in $hostlist
do
    if [ $n != "hostlist" ]; then
        echo -n "${n},"
    fi
done
echo " "
echo "  }"

hostlist=`qconf -shgrp @dynhosts|grep hostlist`
echo "define hostgroup{"
echo "  hostgroup_name dyn-queue"
echo "  alias SGE Queue Dyna"
echo -n "  members "
for n in $hostlist
do
    if [ $n != "hostlist" ]; then
        echo -n "${n},"
    fi
done
echo " "
echo "  }"

hostlist=`qconf -shgrp @liqhosts|grep hostlist`
echo "define hostgroup{"
echo "  hostgroup_name liq-queue"
echo "  alias SGE Queue Liq"
echo -n "  members "
for n in $hostlist
do
    if [ $n != "hostlist" ]; then
        echo -n "${n},"
    fi
done
echo " "
echo "  }"
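
The three SGE blocks above differ only in the host group queried, the hostgroup name and the alias, so they could be folded into one function. The following is a sketch under assumptions rather than a drop-in replacement: emit_hostgroup is a hypothetical name, it reads qconf -shgrp style output on stdin so it can be tried without SGE, and the two-line sample input is a stand-in for what qconf prints. It also builds the member list without the trailing comma.

```shell
#!/bin/sh
# Sketch: one function instead of three copy-pasted blocks.
# emit_hostgroup is a hypothetical name. It reads `qconf -shgrp` style
# output on stdin so it can be tried without SGE installed.

emit_hostgroup() {
    # $1 = Nagios hostgroup_name, $2 = alias
    members=""
    # Same filter as the original: take the "hostlist" line, skip the keyword
    for n in $(grep hostlist); do
        if [ "$n" != "hostlist" ]; then
            if [ -z "$members" ]; then
                members=$n
            else
                members="$members,$n"
            fi
        fi
    done
    echo "define hostgroup{"
    echo "  hostgroup_name $1"
    echo "  alias $2"
    echo "  members $members"
    echo "  }"
}

# Stand-in for: qconf -shgrp @dynhosts | emit_hostgroup dyn-queue "SGE Queue Dyna"
printf 'group_name @dynhosts\nhostlist n001 n002\n' |
    emit_hostgroup dyn-queue "SGE Queue Dyna"
```

With real data, each of the three blocks would shrink to a single pipeline like the one in the comment.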

Executing the scripts is pretty simple:

./mkclusterconf.pl > myhpc.cfg
./mkquegrps.sh >> myhpc.cfg

Room for Improvement

  1. The queue group names can actually be dynamically generated by asking SGE to print out the names
  2. The range of hosts in mkclusterconf.pl can be done dynamically by calling SGE
  3. There really is no reason why these have to be two scripts. In truth the groups script was added on after the fact (see also: the author is lazy)
  4. An rc file of sorts could alternatively be used to auto generate the checks (or add/take away checks) for the entire range of compute nodes
  5. Probably more ... just can't think of any at the moment ...
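
As an illustration of the rc file idea in item 4, here is a minimal sketch. The file format (description|check_command, one check per line, # for comments) and the name emit_services are assumptions made up for this example; the three checks are the ones mkclusterconf.pl prints.

```shell
#!/bin/sh
# Sketch: drive the service checks from a small rc file.
# The format (description|check_command, one per line, # for comments)
# and the name emit_services are assumptions for illustration.

emit_services() {
    # $1 = hostgroup the checks apply to; rc lines are read from stdin
    while IFS='|' read -r desc cmd; do
        case "$desc" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        echo "define service{"
        echo "  use generic-service"
        echo "  hostgroup_name $1"
        echo "  service_description $desc"
        echo "  check_command $cmd"
        echo "  }"
        echo
    done
}

# The three checks from mkclusterconf.pl, written as rc lines
emit_services myhpc1-compute-nodes <<'EOF'
# description|check_command
SSH|check_ssh
Current Load|snmp_load
RPC|check_rpc_port
EOF
```

Adding or taking away a check then means editing one line of the rc data instead of a block of print statements.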

Summary

Laziness is a virtue for a sysadmin. That includes being lazy enough to generate configuration files with scripts, without the muss and fuss of having to yank and put a lot of lines.