Apr 2007

Options Parsing

Options parsing can be difficult at times, to say the least. A number of common methods and libraries exist to assist with it. This text looks at writing option and argument parsing both homespun and with a little help.

Simple Parsing in sh

Simple parsing is easy in the shell:

while [ "$#" -gt 0 ]
do
        case "$1" in
                -F)
                        F_FLAG=1
                        ;;
                -f)
                        shift
                        FILE_ARGUMENT=$1
                        ;;
                -u)
                        usage
                        exit 0
                        ;;
                *)
                        echo "Syntax Error"
                        usage
                        exit 1
                        ;;
        esac
        shift
done

Above, the argument list is iterated over and particular options either act or assign a variable. The POSIX getopts built-in handles the parsing of dash-prefixed options itself:

while getopts ":f:Fu" opt; do
        case $opt in
                F) F_FLAG=1;;
                f) FILE_ARGUMENT=$OPTARG;;
                u) usage;;
                *) usage
                        exit 1
                        ;;
        esac
done
shift $((OPTIND - 1))   # drop the parsed options; getopts tracks its position in OPTIND
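
For illustration, the getopts loop above can be wrapped in a function and exercised directly. This is a minimal sketch rather than a definitive implementation: parse_args and REMAINING are hypothetical names, and the explicit : and ? branches assume the silent error mode selected by the leading colon in the option string.

```shell
#!/bin/sh
# parse_args and REMAINING are hypothetical names used only for this sketch;
# F_FLAG and FILE_ARGUMENT follow the variable names used in the examples.
parse_args() {
        F_FLAG=0
        FILE_ARGUMENT=
        OPTIND=1        # reset so the function can be called more than once
        while getopts ":f:Fu" opt; do
                case $opt in
                        F)  F_FLAG=1 ;;
                        f)  FILE_ARGUMENT=$OPTARG ;;
                        u)  echo "usage: script [-F] [-f file] [-u]"; return 0 ;;
                        :)  echo "option -$OPTARG requires an argument" >&2; return 1 ;;
                        \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
                esac
        done
        shift $((OPTIND - 1))   # leave only the non-option arguments
        REMAINING="$*"
}

parse_args -F -f notes.txt extra
echo "F_FLAG=$F_FLAG FILE_ARGUMENT=$FILE_ARGUMENT REMAINING=$REMAINING"
```

Resetting OPTIND to 1 at the top lets the function be called more than once in the same shell.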

A colon after an option letter indicates that it requires an argument; the leading colon puts getopts in silent mode, leaving error reporting to the script. The getopts code is far more compact than the first example. What if the script requires long options? One approach is simply to hard code long options:

while [ "$#" -gt 0 ]
do
    case "$1" in
        -F|--setflag)
            F_FLAG=1
            ;;
        -f|--file)
            shift
            FILE_ARGUMENT=$1
            ;;
        -u|--usage)
            usage
            exit 0
            ;;
        *)
            echo "Syntax Error"
            usage
            exit 1
            ;;
    esac
    shift
done

Setting up long options appears to be simple; however, it can quickly get out of control using the method shown above. Instead, writing code to handle long options that can either be sourced in or easily dropped into scripts makes far more sense. Grigoriy Strokin has a good script, available on his website, that can either be copied in or sourced. Following is the same code from above using getoptex:

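. getoptx.sh
while getoptex "F; f; u. setflag file usage." "$@"; do
        # Assumption: getoptex leaves the parsed option name in OPTOPT;
        # check the header of the script itself to confirm.
        case $OPTOPT in
                F) F_FLAG=1;;
                f) FILE_ARGUMENT=$OPTARG;;
                u) usage;;
                *) usage
                        exit 1
                        ;;
        esac
done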

Each single-character option is mapped to the long option listed after the first dot, and the second dot terminates the list. Of course, there is an even easier method as long as a few rules are observed:

while [ "$#" -gt "0" ]
do
        opt="${1//-}"
        opt=$(echo "${opt}" | cut -c 1 2>/dev/null)
        case $opt in
                F) F_FLAG=1;;
                f) shift;FILE_ARGUMENT=$1;;
                u) usage;;
                *) usage; exit 1;;
        esac
        shift
done

The problem with the last method is that the long options are not hard-coded: the first character of the alpha string is cut and used as the option. In other words, --help and --heck will do the same thing. The idea is harmless except that no two long options may start with the same letter, and misspelled long options are silently accepted. Generally speaking, having both a --help and a --heck valid in the same script or program should be avoided if possible anyway.
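
One way to keep the short case statement while still rejecting unknown long options is to check each long option against a list of valid names first. The following is only a sketch under stated assumptions: lookup_opt and VALID_OPTS are hypothetical names, and the name:letter pair format is invented for this example.

```shell
#!/bin/sh
# VALID_OPTS and lookup_opt are hypothetical names for this sketch.
# Each entry pairs a long option name with its short letter.
VALID_OPTS="setflag:F file:f usage:u"

# Print the short letter for a short or long option.
# Returns 1 if a long option is not in the list.
lookup_opt() {
        name=${1#-}             # strip the first leading dash
        name=${name#-}          # strip a second leading dash, if any
        if [ "${#name}" -eq 1 ]; then
                echo "$name"    # already a single-character option
                return 0
        fi
        for pair in $VALID_OPTS; do
                if [ "${pair%%:*}" = "$name" ]; then
                        echo "${pair##*:}"
                        return 0
                fi
        done
        return 1
}

lookup_opt --setflag                            # prints F
lookup_opt --heck || echo "invalid long option" # prints invalid long option
```

The letter that lookup_opt prints can feed the same case statement as before, so --heck is rejected instead of being treated as another spelling of a valid option.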

Options in Perl

With no case statement built in, options parsing in Perl can be a little tricky. Using the same example from the shell code above, a simple options parser might look like: [ 1 ]

while ( my $arg = shift @ARGV ) {
    if ( $arg eq '-F' ) {
      $F_FLAG = 1;
    } elsif ( $arg eq '-f' ) {
      $FILE_ARGUMENT = shift @ARGV;
    } elsif ( $arg eq '-u' ) {
      usage();
    } else {
      usage();
      exit 1;
    }
 }

Relative to the shell, Perl seems a bit heavy-handed in the amount of work needed. In Perl, the options for handling arguments are almost limitless: hashes (associative arrays), arrays, or just plain scalars arranged a certain way could be used.

Of course, another great thing about Perl is how simply string operations are handled. Using a method similar to the last shell method above can simplify the code a great deal:

for (my $argc = 0; $argc < @ARGV; $argc++) {
        my $opt = $ARGV[$argc];
        $opt =~ s/^--?//; # Get rid of 1 or 2 leading dashes
        $opt =  substr($opt,0,1); # cut the first char
        if ($opt eq 'F') {
                $F_FLAG=1;
        } elsif ($opt eq 'f') {
                $FILE_ARGUMENT=$ARGV[++$argc]; # the next argument is the file name
        } elsif ($opt eq 'u') {
                usage();
        } else {
                usage();
                exit 1;
        }
}

Of course, the same two problems from the shell code that cuts out the first alphanumeric exist: no two long options can start with the same letter, and there is no verification of long options. Not unlike the shell, a simple list can be used to verify that long options are valid. Following is an example subroutine:

...
my @valid_optlongs=("setflag", "file", "usage");
my @valid_optshort=("F",       "f",    "u");
...
sub parseopt{
        my ($opt) = shift;

        $opt =~ s/^--?//; # Get rid of 1 or 2 leading dashes

        if (length($opt) > 1) { # long option: look it up in the list
                for (my $i = 0; $i < @valid_optlongs; $i++) {
                        if ($opt eq $valid_optlongs[$i]) {
                                return $valid_optshort[$i];
                        }
                }
                return; # not a valid long option
        } else {
                return $opt;
        }
}

Essentially, instead of just trimming out the first valid alphanumeric, a long option is checked against the list of valid long options, and the matching single-character option is returned.

Ultimately, the Getopt modules should be used if they are available; why reinvent the wheel? Here is an example using the Getopt::Std module:

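use Getopt::Std;
...
our ($opt_f, $opt_u, $opt_F);
getopts ('f:uF'); # getopts honors the colon; only -f takes an argument

die "Usage: $0 [ -f filename -u ]\n"
        unless ( $opt_f or $opt_u );

my $filename;
if ($opt_f) {
        $filename = $opt_f; # getopts stores the argument of -f in $opt_f
} elsif ($opt_u) {
        usage();
        exit 0;
}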

Definitely shorter and more compact.

Parsing Options in C

C - not unlike Perl - has many different approaches a programmer can take without using libraries:

int
main(argc, argv)
    int argc;
    char    *argv[];
{
    if (argc < 2) {
        printf("usage: %s number-of-execs sbrk-size job-name\n",
            argv[0]);
        exit(1);
    }
....


int main (int argc, char *argv[]) {
        int c;
        for (c = 1; c < argc; c++) { /* argv[0] is the program name */
                if (argv[c][0] == '-' && argv[c][1] == 'F') {
                        F_FLAG=1;
...

The C library offers two levels of built-in options handling: getopt for single-character options and getopt_long for long options. Since getopt_long is a GNU extension that most modern implementations also provide, the examples will use GNU's version.

Short Options in C

...
#include <getopt.h>
...
int main (int argc, char **argv)
{
int c;
char * file;

while ((c = getopt(argc, argv, "Ff:u")) != -1) { /* only -f takes an argument */
        switch (c) {
        case 'F':
                F_FLAG=1;
                break;
        case 'f':
                file = optarg;
                break;
        case 'u':
                usage();
                return 0;
                break;
        default:
                usage();
                return 1;
                break;
        }
}

Far more succinct than what may have happened using the previous C examples, which would have been pretty spaghetti'd. Long options are even more interesting. The GNU C library handles long options by mapping each one to its single-character equivalent inside a data structure:

...
#include <getopt.h>
...
int main(int argc, char **argv)
{
   int c;
   char *file;

   while (1)
      {
        static struct option long_options[] =
          {
            {"setflag", no_argument,       0,  'F' },
            {"file",   required_argument,  0,  'f' },
            {"usage",  no_argument,        0,  'u' },
            {0,0,0,0} /* An all-zero entry terminates the array */
          };

        int option_index = 0;

        /* Only -f/--file takes an argument, so only f is followed by a colon */
        c = getopt_long (argc, argv, "Ff:u", long_options, &option_index);

        if (c == -1) break; /* no more options */

        switch (c) {
        case 'F':
                F_FLAG=1;
                break;
        case 'f':
                file = optarg;
                break;
        case 'u':
                usage();
                return 0;
        default:
                usage();
                return 1;
        }
      }
...
}

Short, sweet and to the point.

Summary

Sometimes parsing can be extremely simple. Adding long options and flag setting to the mix can be daunting when writing from the ground up; luckily, libraries and modules exist to help along the way.

Footnotes

  1. Special thanks to Matt "Mr. Muskrat" Musgrove for suggesting showing the Perl Getopt module and writing some nice examples of well-formed code. The first example belongs to Matt.

References