Nov 2010

Wrapping a Program Around the Core Data Structure

In Eric Raymond's The Art of Unix Programming, within the section called The Basics of Unix Philosophy there is a rule quoted by Rob Pike:

Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming

At face value Rob's rule number 5 makes sense. But what is Rob actually saying? In complex software systems it might be difficult to track down and identify how the rule of evolving functions to deal with data worked. So why not use a small microscopic example instead. Taking a small program, a passive network scanner, from data structures to operations on the data structures illustrates Rob's rule number 5 perfectly. This is an interesting experience from my perspective as most of the programs and scripts I have written deal with transitionary data. What I mean by transitionary is simply find it, operate on it and/or print it then move on. Not an unusual trait in system administration centric programs. While working on a passive scanner that could also verify a port I witnessed rule number 5 occur right before by fingertips.

As soon as I set about writing a passive scanner I realized I would need data structures. The logic for this is simple, in reality the program is just sniffing packets using libpcap then correlating that data. The pcap engine loops over packets then returns the information. The pcap information however is transitionary. Ironic no? Therefore data of interest must be stored somewhere within the passive scanner for use when the program is finished. Otherwise I would be just reinventing the tcpdump utility in a much smaller form, which I already did. So the very first thing after getting the pcap functions to work (which was done via rapid prototyping) was to sit down and figure out what the data structures needed to look like.

Right off the bat it would seem pretty straightforward to see that the address and a list of discovered ports need to be in the data structure. So the first pass at it was:

struct ipaddr_list {
	char	host_ip_address[IPV4CW];
	int ports[65535];
};

Right off the top we already know some operations our data structure is going to need:

  • Add a new address
  • Add a discovered port to existing address
  • Print addresses

Those are all well and good but another issue immediately came up. There also needed to be information about the ports themselves as well. Specifically:

  • The number of tcp connections
  • The number of udp connections
  • Is the port really running a service?

Now the data structure is starting to flesh out some more:

struct ipaddr_list {
    char host_ip_address[IPV4CW];   /* The ipaddress in string form */
    struct {
        uint16_t portlist[PORTMAX]; /* The port number that was detected     */
        int tcpconns[PORTMAX];  /* # of tcp connects for position (port) */
        int udpconns[PORTMAX];  /* # of udp connects for position (port) */
        int verified;     /* Has this port been verified? */
    } portdata;
    struct ipaddr_list *next_ptr;   /* Next item in ze list */
};
struct ipaddr_list *first_ptr = NULL;   /* All things must have a start */

To verify that a port is running a service the validation flag is set after trying to connect to a port and by seeing if the number of connections threshold has been surpassed. In a sense it is a sort of fuzzy check. The verification does have an over ride but that makes for less accurate results. Regardless, with the final data structure created we now have a fleshed out list of to do items for the program:

  • Add a new address
  • Add a discovered port along with information about the port
  • Search for address then port (this is to avoid duplication)
  • Increment the tcp and/or udp connections counter per port
  • Display the data

Now we have a lot to chomp on. First things first. In order to add a new address it is easier to simply embed that capability inside of adding a port. Why? Easy, search to see if the address exists, and if not add it plus the port. So first we write the actual adding of an address and the initial discovered port:

static void add_list(char *ipaddr, uint16_t port)
{
    int x;
    struct ipaddr_list *new_item_ptr;

    new_item_ptr = malloc(sizeof(struct ipaddr_list));
    strcpy((*new_item_ptr).host_ip_address, ipaddr);
    for (x = 0; x < PORTMAX; x++) {
        new_item_ptr->portdata.portlist[x] = 0;
        new_item_ptr->portdata.tcpconns[x] = 0;
        new_item_ptr->portdata.udpconns[x] = 0;
    }

    (*new_item_ptr).portdata.portlist[0] = port;
    (*new_item_ptr).portdata.verified = 0; /* set verified to no initially */
    (*new_item_ptr).next_ptr = first_ptr;
    first_ptr = new_item_ptr;
}

Anyone who has dealt with linked lists should immediately see that is what the program really is doing. Following is a play by play of the add_list function:

  1. Allocate a new empty structure.
  2. Copy the address in
  3. Initialize the portlist, tcp connections and udp connections
  4. Set the first portlist item to the discovered port
  5. Initialize the port verified as not yet verified.
  6. add the structure to the end of the list by shuffling list pointers

Now that the initial function for adding an address and port is done adding new ports needs to get knocked out:

void addport(char *ipaddr, uint16_t port, char *proto_name)
{
    int x;
    struct ipaddr_list *current_ptr;

    if (strstr(ipaddr, "255") != NULL) return; // do not do broadcasts   
    if (strstr(ipaddr, "0.0.0.0") != NULL) return; // do not do unknowns
    current_ptr = first_ptr;

    if (current_ptr == NULL) {
        add_list(ipaddr, port);
        return;
    }

    while (current_ptr != NULL) {
        if (strcmp(current_ptr->host_ip_address, ipaddr) == 0) {
            if ((verify_port) && 
			    (current_ptr->portdata.verified == 0))
                if (!connect4(proto_name, ipaddr, port));
                    current_ptr->portdata.verified = 1;

            for (x = 0; x < PORTMAX; x++) {
                if (current_ptr->portdata.portlist[x] == 0)
                    current_ptr->portdata.portlist[x] =
                        port;

                if (current_ptr->portdata.portlist[x] == port) {
                    if (strncmp(proto_name, "tcp", 3))
                        current_ptr->portdata.
                            tcpconns[x] =
                            (current_ptr->portdata.
                             tcpconns[x] + 1);

                    if (strncmp(proto_name, "udp", 3))
                        current_ptr->portdata.
                            udpconns[x] =
                            (current_ptr->portdata.
                             udpconns[x] + 1);

                    return;
                }
            }
        }

        current_ptr = current_ptr->next_ptr;
    }

    if (current_ptr == NULL) add_list(ipaddr, port);
}

Well that looks like a lot but what is going on is pretty easy to unwind. The while() loop is the key. The program searches for the address, if it finds the address the port is added along with the type. Also if the port is to be verified and has not yet been, go ahead and check it by calling the function connect4(). Lastly when adding ports make sure there are no duplicates. This leaves the program with one function left relative to the core data structure, printing it. Recall that data is printed only if it has been verified or if verification has been opted out by the user then if a particular threshold has been crossed in the number of times the port has communicated:

static void print_hosts(void)
{
    int x, printip;
    struct ipaddr_list *current_ptr;

    current_ptr = first_ptr;

    while (current_ptr != NULL) {
        printip = 0;
        x = 0;

        while (current_ptr->portdata.portlist[x] != 0) {
            if (verify_port) {
                if(current_ptr->portdata.verified == 0) {
                    x++;
                    continue;
                }
            }

            if ((current_ptr->portdata.tcpconns[x] >=
                 port_threshold)
                || (current_ptr->portdata.udpconns[x] >=
                port_threshold)) {
                if (!printip)
                    printf("%s:",
                           current_ptr->host_ip_address);
                ++printip;

                printf(" %u ",
                       current_ptr->portdata.portlist[x]);
                if (current_ptr->portdata.tcpconns[x] >=
                    port_threshold)
                    printf("tcp");
                if (current_ptr->portdata.udpconns[x] >=
                    port_threshold) {
                    if (current_ptr->portdata.tcpconns[x] >=
                        port_threshold)
                        printf("/");

                    printf("udp ");
                }
            }
            x++;
        }

        printf("\n");
        current_ptr = current_ptr->next_ptr;
    }
}

In reality what the code above does is if verification is required then skip the port and move on, otherwise check to see if it passed the threshold (either tcp and/or udp connections) and print out the information.

One Final Thought

One final observation about the passive scan program is that it follows alot of programming axioms and pragmas quite nicely. Essentially it does three major operations:

  • Read in packets
  • Add data about the packets to data structures
  • When finished: print out collected data.