Feb 2007

Using nullfs: I

The nullfs filesystem is a pass-through filesystem. When nullfs is mounted, it literally passes each filesystem transaction straight through to the filesystem beneath it. If a file is deleted, nullfs simply hands the request down to the lower filesystem. Conversely, if a file is created, it does the same, passing along all of the data the underlying filesystem needs. Why is that a good thing? Where did nullfs come from, and why? What else, if anything, is it good for? This series covers where nullfs comes from, how it can be leveraged, a code walk, and a bare implementation (nearly a blind copy).

Origins of nullfs

The answer is simple, as Kirk McKusick puts it:

The null filesystem was done in July 1992 by John Heidemann when he was visiting Berkeley to add his stackable filesystem implementation to BSD. John is the person that built the framework and built nullfs to show others how to use it. Jan-Simon Pendry used that framework in February 1994 to build several new filesystem modules including the union filesystem, the kernel filesystem, the umap filesystem, and the portal filesystem.

Stackable filesystems can be layered on top of each other (as the name implies), but more importantly, they abstract away the details of a regular filesystem.

A good way to see this is to look at layers. There are many more layers in practice, but an abstraction of the file layers might be:

  device-driver kernel
       fs           
       vfs           
 user-interface (shell)

This ten-thousand-foot view is simplistic; however, look at where the null layer fits in:

  device-driver kernel
       fs           
       vfs           
      nullfs-portion
 user-interface (shell)

Other stackable filesystem layers, such as the union filesystem, can sit in the same place:

  device-driver kernel
       fs           
       vfs           
      unionfs-portion
 user-interface (shell)

Chapter 6 (p. 231) of The Design and Implementation of the 4.4BSD Operating System (McKusick, Bostic, Karels, and Quarterman, 1996) states the idea of stackable filesystems succinctly: "... one approach is to stack filesystems on top of one another [Rosenthal, 1990]." In short, an abstracted mechanism lets several different filesystem types communicate through common API(s). That mechanism is the vnode layer, discussed at length later.

Because the null layer does so little, it is an ideal starting point for filesystem layer design. For example, the null layer passes object data (often pointers) up and down the stack; if a programmer wished to design a layer that accepted only a certain credential, the credential check could be placed in the pass-through path.

Pass One: Components of Nullfs

The null filesystem code used as the example here is from the NetBSD kernel.[1] In NetBSD 4, the source files are in ~src/sys/miscfs/nullfs.[2] The files are:

  • Makefile
  • files.nullfs
  • null.h
  • null_vfsops.c
  • null_vnops.c

The Makefile and the files.nullfs file are pretty simple:

Makefile

INCSDIR= /usr/include/miscfs/nullfs
INCS=   null.h
.include <bsd.kinc.mk>

A stock-looking Makefile: it names the directory (INCSDIR) and the header (INCS) to install, then pulls in the standard bsd.kinc.mk rules.

files.nullfs

deffs   NULLFS
file    miscfs/nullfs/null_vfsops.c     nullfs
file    miscfs/nullfs/null_vnops.c      nullfs

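The files.nullfs file tells config(1) which C files to build and what they belong to: deffs declares the NULLFS filesystem option, and each file line names a source file that is compiled only when that option (nullfs) is enabled.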

Data Structures

The header file holds the key data structures needed for a filesystem implementation. Following is the complete source of the null.h header file:

#include <miscfs/genfs/layer.h>
struct null_args {
        struct  layer_args      la;     /* generic layerfs args */
};    
#define nulla_target    la.target
#define nulla_export    la.export
#ifdef _KERNEL
struct null_mount {
        struct  layer_mount     lm;     /* generic layerfs mount stuff */
}; 
#define nullm_vfs               lm.layerm_vfs 
#define nullm_rootvp            lm.layerm_rootvp
#define nullm_export            lm.layerm_export
#define nullm_flags             lm.layerm_flags
#define nullm_size              lm.layerm_size
#define nullm_tag               lm.layerm_tag
#define nullm_bypass            lm.layerm_bypass
#define nullm_alloc             lm.layerm_alloc
#define nullm_vnodeop_p         lm.layerm_vnodeop_p
#define nullm_node_hashtbl      lm.layerm_node_hashtbl
#define nullm_node_hash         lm.layerm_node_hash
#define nullm_hashlock          lm.layerm_hashlock
struct null_node {
        struct  layer_node      ln;
};    
#define null_hash       ln.layer_hash
#define null_lowervp    ln.layer_lowervp
#define null_vnode      ln.layer_vnode
#define null_flags      ln.layer_flags
int     null_node_create(struct mount *, struct vnode *,
            struct vnode **);
#define MOUNTTONULLMOUNT(mp) ((struct null_mount *)((mp)->mnt_data)) 
#define VTONULL(vp) ((struct null_node *)(vp)->v_data)
#define NULLTOV(xp) ((xp)->null_vnode)
#ifdef NULLFS_DIAGNOSTIC
struct vnode *layer_checkvp(struct vnode *, char *, int);
#define NULLVPTOLOWERVP(vp) layer_checkvp((vp), __FILE__, __LINE__)
#else
#define NULLVPTOLOWERVP(vp) (VTONULL(vp)->null_lowervp)
#endif
#endif /* _KERNEL */

Digesting the header file all at once might be daunting. First, the top of the file includes the genfs pieces it needs. The genfs layer header holds the structures, functions, and macros shared by the generic layered filesystems:

~src/sys/miscfs/genfs/layer.h
#ifndef _MISCFS_GENFS_LAYER_H_
#define _MISCFS_GENFS_LAYER_H_
struct layer_args {
        char    *target;                /* Target of loopback  */
        struct export_args30 _pad1; /* compat with old userland tools */
};
#ifdef _KERNEL
struct layer_node;
LIST_HEAD(layer_node_hashhead, layer_node);
struct layer_mount {
        struct mount            *layerm_vfs;
        struct vnode            *layerm_rootvp; /* Ref to root layer_node */
        u_int                   layerm_flags;   /* mount point layer flags */
        u_int                   layerm_size;    /* size of fs's struct node */
        enum vtype              layerm_tag;     /* vtag of our vnodes */
        int                             /* bypass routine for this mount */
                                (*layerm_bypass)(void *);
        int                     (*layerm_alloc) /* alloc a new layer node */
                                (struct mount *, struct vnode *,
                                                struct vnode **);
        int                     (**layerm_vnodeop_p)    /* ops for our nodes */
                                (void *);
        struct layer_node_hashhead      /* head of hash list for layer_nodes */
                                *layerm_node_hashtbl;
        u_long                  layerm_node_hash; /* hash mask for hash chain */
        struct simplelock       layerm_hashlock; /* interlock for hash chain. */
};
#define LAYERFS_MFLAGS          0x00000fff      /* reserved layer mount flags */
#define LAYERFS_MBYPASSDEBUG    0x00000001
struct layer_node {
        LIST_ENTRY(layer_node)  layer_hash;     /* Hash list */
        struct vnode            *layer_lowervp; /* VREFed once */
        struct vnode            *layer_vnode;   /* Back pointer */
        unsigned int            layer_flags;    /* locking, etc. */
};
#define LAYERFS_RESFLAGS        0x00000fff      /* flags reserved for layerfs */
#define LAYERFS_REMOVED         0x00000001      /* Did a remove on this node */
#define LAYERFS_UPPERLOCK(v, f, r)      do { \
        if ((v)->v_vnlock == NULL) \
                r = lockmgr(&(v)->v_lock, (f), &(v)->v_interlock); \
        else \
                r = 0; \
        } while (0)
#define LAYERFS_UPPERUNLOCK(v, f, r)    do { \
        if ((v)->v_vnlock == NULL) \
            r = lockmgr(&(v)->v_lock, (f) | LK_RELEASE, &(v)->v_interlock); \
        else \
                r = 0; \
        } while (0)
#define LAYERFS_UPPERISLOCKED(v, r)     do { \
        if ((v)->v_vnlock == NULL) \
                r = lockstatus(&(v)->v_lock); \
        else \
                r = -1; \
        } while (0)
#define LAYERFS_DO_BYPASS(vp, ap)       \
        (*MOUNTTOLAYERMOUNT((vp)->v_mount)->layerm_bypass)((ap))
struct vnode *layer_checkvp(struct vnode *vp, const char *fil, int lno);
#define MOUNTTOLAYERMOUNT(mp) ((struct layer_mount *)((mp)->mnt_data))
#define VTOLAYER(vp) ((struct layer_node *)(vp)->v_data)
#define LAYERTOV(xp) ((xp)->layer_vnode)
#ifdef LAYERFS_DIAGNOSTIC
#define LAYERVPTOLOWERVP(vp) layer_checkvp((vp), __FILE__, __LINE__)
extern int layerfs_debug;
#else
#define LAYERVPTOLOWERVP(vp) (VTOLAYER(vp)->layer_lowervp)
#endif
#endif /* _KERNEL */
#endif /* _MISCFS_GENFS_LAYER_H_ */

Of importance are the layer_args, layer_mount, and layer_node structures; null.h wraps each of them and defines null-layer names for their members. For example, the layer_mount member:

        struct mount            *layerm_vfs;

is aliased in null.h with:

#define nullm_vfs               lm.layerm_vfs

Note that a whole new data structure, null_mount, is defined by embedding the generic filesystem's layer_mount:

struct null_mount {
        struct  layer_mount     lm;     /* generic layerfs mount stuff */
}; 

Essentially, by reusing the generic filesystem pieces, one can construct a filesystem layer with relative ease.

Summary & Next Time

Stackable filesystems give system programmers the ability to rapidly prototype, design, and in some cases deploy new filesystems and/or new filesystem layers. The null filesystem is a great template for getting started on filesystem design. The next part(s) of the series will cover a code walk of key functions within the actual null layer code, hooking into the kernel, and an example implementation.


Footnotes

  1. FreeBSD and OpenBSD also have a null layer. The layer code is implemented similarly, but how it hooks into the kernel varies.
  2. The pre-formatted text is not exactly the same as the source files; empty lines, some comments, and CVS ID tags have been removed for brevity.