Feb 2007
nullfs: I
The nullfs filesystem is a passthrough filesystem. When
nullfs is mounted, it literally copies each filesystem
transaction to the filesystem beneath it. If a file is deleted, nullfs simply
transmits the information down to the lower filesystem. Likewise, if a file is
created, it does the same and tacks on all of the data needed by the filesystem
underneath. Why is that a good thing? Where did nullfs come from,
and why? What else, if anything, is it good for?
This series focuses on where nullfs comes from, how it can be
leveraged, a code walk, and a bare implementation (nearly a blind copy).
Where did nullfs come from? The answer is simple, as quoted from Kirk McKusick:
The null filesystem was done in July 1992 by John Heideman when he was visiting Berkeley to add his stackable filesystem implementation to BSD. John is the person that built the framework and built nullfs to show others how to use it. Jan-Simon Pendry used that framework in February 1994 to build several new filesystem modules including the union filesystem, the kernel filesystem, the umap filesystem, and the portal filesystem.
Stackable filesystems can lie on top of each other (as the name implies), but more importantly, they abstract the details of a regular filesystem.
A good way to see this is to look at layers. While there are many more layers in practice, an abstraction of the file layers might be:
device-driver kernel
fs
vfs
user-interface (shell)
The ten-thousand-foot view is simplistic; however, look where the
null layer fits in:
device-driver kernel
fs
vfs
nullfs-portion
user-interface (shell)
Other stackable filesystem layers can sit in the same position, such as the union filesystem:
device-driver kernel
fs
vfs
unionfs-portion
user-interface (shell)
Chapter 6, p. 231 of The Design and Implementation of the 4.4BSD
Operating System (1996; McKusick, Bostic, Karels, Quarterman) describes
stackable filesystems succinctly: "... one approach is to stack
filesystems on top of one another [Rosenthal, 1990]."
In short, use
an abstracted method that lets several different filesystem
types communicate via common API(s). The actual mechanism is the
vnode layer, discussed at length later.
Because the null layer does so little, it is an ideal starting point for filesystem layer design. For example, the null layer passes object data (often pointers) up and down the stack. If a programmer wished to design a layer that accepted a certain credential, the credential handling could be placed in the pass-through path between the layers.
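The idea of slipping a credential check into the pass-through path can be sketched in user-space C. This is only an illustration under stated assumptions: struct layer, struct op_args, the handler functions and try_remove are all hypothetical names invented for the sketch, and none of them are part of the NetBSD sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the arguments of one filesystem operation. */
struct op_args {
    const char *op_name;   /* e.g. "remove" or "create" */
    const char *path;      /* object the operation targets */
    int         uid;       /* credential presented by the caller */
};

/* Every layer exposes the same entry point, which is what lets them stack. */
struct layer {
    int         (*handle)(struct layer *self, struct op_args *ap);
    struct layer *lower;        /* next layer down, NULL at the bottom */
    int           allowed_uid;  /* consulted only by the credential layer */
    int           ops_seen;     /* counts requests that reached this layer */
};

/* Bottom layer: stands in for the real filesystem and records the request. */
static int lowerfs_handle(struct layer *self, struct op_args *ap)
{
    (void)ap;
    self->ops_seen++;
    return 0;
}

/* Null layer: forwards the very same argument pointer downward, unchanged. */
static int null_handle(struct layer *self, struct op_args *ap)
{
    assert(self->lower != NULL);
    self->ops_seen++;
    return self->lower->handle(self->lower, ap);
}

/* Credential layer: one extra check, then the same pass-through. */
static int cred_handle(struct layer *self, struct op_args *ap)
{
    if (ap->uid != self->allowed_uid)
        return -1;              /* rejected; lower layers never see it */
    return null_handle(self, ap);
}

/* Stack credfs -> nullfs -> lowerfs and send one request from `uid`. */
static int try_remove(int uid, int *reached_bottom)
{
    struct layer fs   = { lowerfs_handle, NULL,  0,    0 };
    struct layer null = { null_handle,    &fs,   0,    0 };
    struct layer cred = { cred_handle,    &null, 1000, 0 };
    struct op_args ap = { "remove", "/tmp/example", uid };

    int error = cred.handle(&cred, &ap);
    *reached_bottom = fs.ops_seen;
    return error;
}
```

Calling try_remove with the allowed uid lets the request travel all the way down to the bottom layer, while any other uid is stopped at the top. The only code that differs from the plain pass-through is the single check in cred_handle, which is the appeal of starting from a null layer.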
The null filesystem code that is used as an example is found in the NetBSD kernel. [1]
In NetBSD-4 the source files can be found in
~src/sys/miscfs/nullfs[2]. The files are:
Makefile
files.nullfs
null.h
null_vfsops.c
null_vnops.c
The Makefile and the files.nullfs
file are pretty simple:
Makefile
INCSDIR= /usr/include/miscfs/nullfs
INCS= null.h
.include <bsd.kinc.mk>
A stock-looking makefile pointing to the include directory and the header.
files.nullfs
deffs NULLFS
file miscfs/nullfs/null_vfsops.c nullfs
file miscfs/nullfs/null_vnops.c nullfs
The files.nullfs file tells the kernel build which C files to use and what they are for (in this case, nullfs).
The header file holds information about the key data structures
needed for a filesystem implementation. Following is the complete source
of the null.h header file:
#include <miscfs/genfs/layer.h>

struct null_args {
    struct layer_args la;   /* generic layerfs args */
};
#define nulla_target la.target
#define nulla_export la.export

#ifdef _KERNEL

struct null_mount {
    struct layer_mount lm;  /* generic layerfs mount stuff */
};
#define nullm_vfs lm.layerm_vfs
#define nullm_rootvp lm.layerm_rootvp
#define nullm_export lm.layerm_export
#define nullm_flags lm.layerm_flags
#define nullm_size lm.layerm_size
#define nullm_tag lm.layerm_tag
#define nullm_bypass lm.layerm_bypass
#define nullm_alloc lm.layerm_alloc
#define nullm_vnodeop_p lm.layerm_vnodeop_p
#define nullm_node_hashtbl lm.layerm_node_hashtbl
#define nullm_node_hash lm.layerm_node_hash
#define nullm_hashlock lm.layerm_hashlock

struct null_node {
    struct layer_node ln;
};
#define null_hash ln.layer_hash
#define null_lowervp ln.layer_lowervp
#define null_vnode ln.layer_vnode
#define null_flags ln.layer_flags

int null_node_create(struct mount *, struct vnode *,
    struct vnode **);

#define MOUNTTONULLMOUNT(mp) ((struct null_mount *)((mp)->mnt_data))
#define VTONULL(vp) ((struct null_node *)(vp)->v_data)
#define NULLTOV(xp) ((xp)->null_vnode)

#ifdef NULLFS_DIAGNOSTIC
struct vnode *layer_checkvp(struct vnode *, char *, int);
#define NULLVPTOLOWERVP(vp) layer_checkvp((vp), __FILE__, __LINE__)
#else
#define NULLVPTOLOWERVP(vp) (VTONULL(vp)->null_lowervp)
#endif

#endif /* _KERNEL */
Digesting the header file all at once might be daunting. First, the top of the
file includes the genfs bits it needs. The genfs layer
header contains structures, functions and macros for
generic filesystems:
~src/sys/miscfs/genfs/layer.h
#ifndef _MISCFS_GENFS_LAYER_H_
#define _MISCFS_GENFS_LAYER_H_
struct layer_args {
    char *target;                       /* Target of loopback */
    struct export_args30 _pad1;         /* compat with old userland tools */
};

#ifdef _KERNEL

struct layer_node;

LIST_HEAD(layer_node_hashhead, layer_node);

struct layer_mount {
    struct mount *layerm_vfs;
    struct vnode *layerm_rootvp;        /* Ref to root layer_node */
    u_int layerm_flags;                 /* mount point layer flags */
    u_int layerm_size;                  /* size of fs's struct node */
    enum vtype layerm_tag;              /* vtag of our vnodes */
    int                                 /* bypass routine for this mount */
        (*layerm_bypass)(void *);
    int (*layerm_alloc)                 /* alloc a new layer node */
        (struct mount *, struct vnode *,
        struct vnode **);
    int (**layerm_vnodeop_p)            /* ops for our nodes */
        (void *);
    struct layer_node_hashhead          /* head of hash list for layer_nodes */
        *layerm_node_hashtbl;
    u_long layerm_node_hash;            /* hash mask for hash chain */
    struct simplelock layerm_hashlock;  /* interlock for hash chain. */
};

#define LAYERFS_MFLAGS 0x00000fff /* reserved layer mount flags */
#define LAYERFS_MBYPASSDEBUG 0x00000001

struct layer_node {
    LIST_ENTRY(layer_node) layer_hash;  /* Hash list */
    struct vnode *layer_lowervp;        /* VREFed once */
    struct vnode *layer_vnode;          /* Back pointer */
    unsigned int layer_flags;           /* locking, etc. */
};

#define LAYERFS_RESFLAGS 0x00000fff /* flags reserved for layerfs */
#define LAYERFS_REMOVED 0x00000001 /* Did a remove on this node */

#define LAYERFS_UPPERLOCK(v, f, r) do { \
    if ((v)->v_vnlock == NULL) \
        r = lockmgr(&(v)->v_lock, (f), &(v)->v_interlock); \
    else \
        r = 0; \
} while (0)

#define LAYERFS_UPPERUNLOCK(v, f, r) do { \
    if ((v)->v_vnlock == NULL) \
        r = lockmgr(&(v)->v_lock, (f) | LK_RELEASE, &(v)->v_interlock); \
    else \
        r = 0; \
} while (0)

#define LAYERFS_UPPERISLOCKED(v, r) do { \
    if ((v)->v_vnlock == NULL) \
        r = lockstatus(&(v)->v_lock); \
    else \
        r = -1; \
} while (0)

#define LAYERFS_DO_BYPASS(vp, ap) \
    (*MOUNTTOLAYERMOUNT((vp)->v_mount)->layerm_bypass)((ap))

struct vnode *layer_checkvp(struct vnode *vp, const char *fil, int lno);

#define MOUNTTOLAYERMOUNT(mp) ((struct layer_mount *)((mp)->mnt_data))
#define VTOLAYER(vp) ((struct layer_node *)(vp)->v_data)
#define LAYERTOV(xp) ((xp)->layer_vnode)

#ifdef LAYERFS_DIAGNOSTIC
#define LAYERVPTOLOWERVP(vp) layer_checkvp((vp), __FILE__, __LINE__)
extern int layerfs_debug;
#else
#define LAYERVPTOLOWERVP(vp) (VTOLAYER(vp)->layer_lowervp)
#endif
#endif /* _KERNEL */
#endif /* _MISCFS_GENFS_LAYER_H_ */
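To get a feel for how the conversion macros and LAYERFS_DO_BYPASS fit together, here is a small user-space sketch. The four macros are copied from the header above; everything else is an assumption made for illustration: struct mount and struct vnode are cut down to the few members the macros touch (plus a made-up id member), and struct demo_args, demo_bypass and demo_dispatch are hypothetical. The real bypass routine works on vnode operation descriptors and is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel types (assumptions). */
struct mount { void *mnt_data; };
struct vnode { struct mount *v_mount; void *v_data; int id; };

struct layer_mount {
    int (*layerm_bypass)(void *);   /* bypass routine for this mount */
};
struct layer_node {
    struct vnode *layer_lowervp;    /* vnode in the filesystem below */
    struct vnode *layer_vnode;      /* back pointer to the upper vnode */
};

/* These four macros are copied from layer.h (non-diagnostic variants). */
#define MOUNTTOLAYERMOUNT(mp) ((struct layer_mount *)((mp)->mnt_data))
#define VTOLAYER(vp) ((struct layer_node *)(vp)->v_data)
#define LAYERVPTOLOWERVP(vp) (VTOLAYER(vp)->layer_lowervp)
#define LAYERFS_DO_BYPASS(vp, ap) \
    (*MOUNTTOLAYERMOUNT((vp)->v_mount)->layerm_bypass)((ap))

/* Hypothetical argument block; real vnode ops carry a descriptor too. */
struct demo_args { struct vnode *a_vp; };

/* Hypothetical bypass: map the upper vnode to the lower one, report its id. */
static int demo_bypass(void *v)
{
    struct demo_args *ap = v;
    struct vnode *lowervp = LAYERVPTOLOWERVP(ap->a_vp);
    return lowervp->id;
}

/* Build one upper/lower vnode pair and push an operation through bypass. */
static int demo_dispatch(void)
{
    struct layer_mount lmp = { demo_bypass };
    struct mount mp = { &lmp };
    struct vnode lower = { NULL, NULL, 7 };
    struct layer_node node = { &lower, NULL };
    struct vnode upper = { &mp, &node, 1 };
    struct demo_args args = { &upper };

    node.layer_vnode = &upper;
    return LAYERFS_DO_BYPASS(&upper, &args);
}
```

In this sketch demo_dispatch returns the id of the lower vnode, showing the path an operation takes: from the upper vnode to its mount, through the mount's layerm_bypass pointer, and down to the lower vnode stored in the layer_node.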
Of import are the layer_args, layer_mount and
layer_node structures; the null.h header
wraps them and aliases their members for the null layer context. For example, the
layer_mount member:
struct mount *layerm_vfs;
Is aliased in null.h with:
#define nullm_vfs lm.layerm_vfs
Note that a whole new data structure, null_mount,
is defined by wrapping the generic filesystem structure:
struct null_mount {
    struct layer_mount lm;  /* generic layerfs mount stuff */
};
Essentially, using the generic filesystem bits, one can construct a filesystem layer with relative ease.
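The wrap-and-alias pattern can be tried outside the kernel with a minimal sketch. The struct null_mount definition and the nullm_ and MOUNTTONULLMOUNT macros mirror null.h; the cut-down struct mount and struct layer_mount and the alias_roundtrip helper are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumptions, not the real ones). */
struct mount { void *mnt_data; };

struct layer_mount {
    struct mount *layerm_vfs;
    unsigned int  layerm_flags;
};

/* The null layer wraps the generic structure, as null.h does. */
struct null_mount {
    struct layer_mount lm;      /* generic layerfs mount stuff */
};
#define nullm_vfs   lm.layerm_vfs
#define nullm_flags lm.layerm_flags
#define MOUNTTONULLMOUNT(mp) ((struct null_mount *)((mp)->mnt_data))

/* Set a flag through the null alias, read it back through the generic name. */
static unsigned int alias_roundtrip(struct mount *mp, struct null_mount *nmp)
{
    mp->mnt_data = nmp;
    nmp->nullm_vfs = mp;            /* expands to nmp->lm.layerm_vfs */
    MOUNTTONULLMOUNT(mp)->nullm_flags = 0x1;

    /* lm is the first member, so both views start at the same address. */
    struct layer_mount *generic = (struct layer_mount *)mp->mnt_data;
    return generic->layerm_flags;
}
```

Because lm is the first member of struct null_mount, generic layer code can treat mnt_data as a struct layer_mount while null-specific code uses the nullm_ names, and both see the same storage.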
Stackable filesystems give system programmers the ability to rapidly
prototype, design and in some cases deploy new filesystems and/or new
filesystem layers. The null filesystem is a great template
for getting started on filesystem design. The next part(s) of the series
will cover a code walk of key functions within the actual null layer
code, hooking into the kernel, and an example implementation.