How to write a kernel module for Linux

Emanuele Altieri (ealtieri@cs.smith.edu)
Prof. Nicholas Howe (nhowe@cs.smith.edu)
Smith College, June 2002

Contents

  1. What is a kernel module?
  2. Classes of devices and modules
  3. The file interface
  4. Compiling and loading a module
  5. Communicating with a module
  6. Testing the module
  7. Debugging with printk()
  8. The /proc file system
  9. Questions
  10. Resources

What is a kernel module?

A kernel module is an extension to the operating system. The module runs at the same privilege level as the OS (the highest) and can therefore access every resource of the system. Under Linux, a module is nothing more than a C program with a well-defined interface for communicating with user processes and with other parts of the operating system.

In the following sections we will use the term device driver instead of kernel module. A device driver is a kernel module specialized in I/O communication with some sort of device. The term device has a very wide meaning and it does not exclusively refer to an external or physical system. In general, a device is some kind of resource such as a floppy disk, a printer, a mouse, but also a special region of memory, a virtual terminal or a message box.

An excellent source of information on this topic is Linux Device Drivers, by Alessandro Rubini and Jonathan Corbet. This book, published by O'Reilly, is also available online in PDF and HTML formats (see resources).

Writing a simple device driver for Linux is not difficult. By the end of this paper you should be capable of writing one. At that point, if you find all of this fascinating, you should definitely read the book above. In any case, take a look at it!

Classes of devices and modules

Linux distinguishes between three types of devices: character, block and network devices. Each module usually implements one of these types and is thus classifiable as a character module, a block module or a network module.

It is possible to identify the class of a device by running the ls -l command on its device file. For example:

[ealtieri@italia os]$ ls -l /dev/tty
crw-rw-rw- 1 root root 5, 0 Jun 15 12:59 /dev/tty

The "c" in the file properties shows that the tty (terminal) device is a character device.

In this document we will discuss character devices only.
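The check that ls -l performs in the example above can also be made from a program. The following is a minimal sketch, assuming a POSIX system; device_class() is a hypothetical helper name chosen for this illustration, not part of any library.

```c
#include <sys/stat.h>   /* stat(), S_ISCHR(), S_ISBLK() */

/*
 * Return 'c' for a character device, 'b' for a block device,
 * '-' for any other kind of file, or '?' if stat() fails.
 * device_class() is a hypothetical helper, not a libc or kernel API.
 */
char device_class(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return '?';
	if (S_ISCHR(st.st_mode))
		return 'c';
	if (S_ISBLK(st.st_mode))
		return 'b';
	return '-';
}
```

Calling device_class("/dev/tty") should give the same letter that ls -l prints in the first column for that file.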

The file interface

One of the greatest features of Linux and UNIX is that almost every resource on the system looks like a file, including devices. As shown in the previous section, device files (called nodes) are located under the /dev directory. Each of these device files is associated with a particular module in the kernel. If the kernel is compiled with Device File System support, the module creates /dev entries automatically at load-time and removes them when it is unloaded.

Because devices are files, we can issue file operations on them such as open(), read(), write() and close(). Every time a file operation is issued on a device file, the kernel module associated with that device must handle the operation. For example:

fd = open("/dev/hda", O_RDONLY);

The above operation opens the /dev/hda device (the first hard disk) for reading only (O_RDONLY). When open() is issued, the operating system recognizes that /dev/hda is a device file. It therefore locates the kernel module associated with the device and calls the device_open() file operation handler in that module. At this point it is up to the device driver to initialize the device and, if something goes wrong, to return an error code.

There is a handler slot for every possible file operation (listed below). However, the device driver can leave a slot set to NULL to choose the default action for that operation.

How does the OS know which module is associated with a /dev entry? Each module has to register itself using the devfs_register() function. This function automatically creates an entry in the /dev directory. The module also uses this function to tell the operating system the addresses of its file-operation handler functions, as shown below. Thus, the call to open() above can be translated to the appropriate device-specific function provided to devfs_register().

/* handlers for the file operations */
struct file_operations mydev_fops = {
  open   : mydev_open,            /* handler for the open() operation    */
  release: mydev_close,           /* handler for the close() operation   */
    /* NULL (default actions) */
};

/* this function is called when the module is loaded */
int mydev_init(void)
{
  ...
  devid = devfs_register( ...  
                          "mydev",     /* create /dev/mydev entry */
                          ...
                          &mydev_fops, /* file ops handlers (see above) */
                          ... );
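To make the lookup and dispatch idea concrete, here is a user-space analogy, not kernel code. It is only a sketch: the sim_* names and the simplified handler signatures are assumptions invented for this illustration, and the real kernel data structures are considerably richer. It also uses the standard C99 ".field = value" initializer form rather than the older GCC "field : value" form shown above.

```c
#include <stddef.h>   /* NULL */
#include <string.h>   /* strcmp() */

/* User-space stand-in for the kernel's file_operations table. */
struct sim_file_ops {
	int (*open)(void);
	int (*release)(void);
};

/* One registered "device": a name plus its handler table. */
struct sim_device {
	const char *name;
	const struct sim_file_ops *ops;
};

#define SIM_MAX_DEVICES 8
static struct sim_device sim_devices[SIM_MAX_DEVICES];
static int sim_device_count;

/* Plays the role of devfs_register(): remember the name and handlers. */
int sim_register(const char *name, const struct sim_file_ops *ops)
{
	if (sim_device_count == SIM_MAX_DEVICES)
		return -1;
	sim_devices[sim_device_count].name = name;
	sim_devices[sim_device_count].ops = ops;
	sim_device_count++;
	return 0;
}

/* Plays the role of the kernel handling open() on a device file. */
int sim_open(const char *name)
{
	int i;

	for (i = 0; i < sim_device_count; i++) {
		if (strcmp(sim_devices[i].name, name) != 0)
			continue;
		if (sim_devices[i].ops->open == NULL)
			return 0;                      /* default action: succeed */
		return sim_devices[i].ops->open();     /* driver's own handler */
	}
	return -1;                                     /* no such device */
}

/* Example "driver": provides open and leaves release set to NULL. */
int mydev_open_calls;

int mydev_open(void)
{
	mydev_open_calls++;
	return 0;
}

const struct sim_file_ops mydev_fops = {
	.open = mydev_open,
	/* .release left NULL: default action */
};
```

After sim_register("mydev", &mydev_fops), a call to sim_open("mydev") finds the table by name and runs mydev_open(), which mirrors how open() on a device file ends up in the driver's handler.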

There are several file operations that a module can implement. These are defined in the file_operations structure, in include/linux/fs.h (line 817). For convenience, this structure is reproduced below. For a more detailed description of these operations see Linux Device Drivers, page 64. In this document we will consider only the simplest file operations: read() and write().

struct file_operations {
	struct module *owner;
	loff_t (*llseek) (struct file *, loff_t, int);
	ssize_t (*read) (struct file *, char *, size_t, loff_t *);
	ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
	int (*readdir) (struct file *, void *, filldir_t);
	unsigned int (*poll) (struct file *, struct poll_table_struct *);
	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
	int (*mmap) (struct file *, struct vm_area_struct *);
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, struct dentry *, int datasync);
	int (*fasync) (int, struct file *, int);
	int (*lock) (struct file *, int, struct file_lock *);
	ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
	ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
	unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
};

Using the above file operations, a program can transfer data to and from a device driver, as in the example below:

/* PROGRAM a.out - reads the first sector of a floppy disk */

#include <fcntl.h>     /* open(), O_RDONLY */
#include <stdio.h>     /* perror() */
#include <unistd.h>    /* read(), close() */

int main(int argc, char *argv[])
{
        int fd;
        char sector[512];
        ssize_t count;

        fd = open("/dev/fd0", O_RDONLY);    /* open floppy device */
        if (fd < 0) {
                perror("open()");
                return(1);
        }

        count = read(fd, sector, sizeof(sector));  /* read one sector */
        if (count < 0)
                perror("read()");

        close(fd);                          /* close device */

        /* do something else... */

        return(0);
}

Because the driver registers itself at load time, it must also unregister itself when it is unloaded, as shown below:

void mydev_exit(void)
{
  devfs_unregister(devid);
  ...

devfs_unregister() unregisters the device and automatically removes the /dev entry created at load time.

So far we have assumed that mydev_init() and mydev_exit() are called at load time and unload time respectively. In fact, we must tell the operating system this explicitly, using the following macros:

module_init(mydev_init);
module_exit(mydev_exit);

These macros are generally placed at the end of the file.

skeldev.c is a basic implementation of a device driver. It registers itself and creates a /dev/skeldev entry. The driver does not handle any file operations; the default OS actions apply.

Compiling and loading a module

To compile a kernel module just type the following:

[ealtieri@italia dev]$ gcc -c -D __KERNEL__ -D MODULE skeldev.c

Notice the -c flag, which tells GCC to compile the source file without linking it into an executable. The command above produces the skeldev.o object file. This object can now be inserted into the kernel using the (privileged) insmod command:

[ealtieri@italia dev]$ sudo /sbin/insmod skeldev.o

If insmod does not output any error message, the module has been correctly inserted in the kernel. You can see this with the "List Modules" command (lsmod). Also, the module should have created the /dev/skeldev entry.

[ealtieri@italia dev]$ /sbin/lsmod 
Module                  Size  Used by    Tainted: P  
skeldev                  644   0 (unused)

[ealtieri@italia dev]$ ls -l /dev/skeldev 
crw-rw-rw-    1 root     root       8,   2 Dec 31  1969 /dev/skeldev
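The list of loaded modules is also exposed as text in /proc/modules, one module per line with the module name in the first field. A program can therefore check for a module without running lsmod. The following is a minimal sketch; module_loaded() is a hypothetical helper name, and it assumes that /proc is mounted.

```c
#include <stdio.h>    /* fopen(), fgets(), sscanf(), fclose() */
#include <string.h>   /* strcmp(), strchr() */

/*
 * Return 1 if a module called `name` appears in /proc/modules,
 * 0 if it does not, and -1 if the file cannot be opened.
 * module_loaded() is a hypothetical helper, not a system API.
 */
int module_loaded(const char *name)
{
	char line[512];
	char modname[128];
	int at_line_start = 1;   /* is this chunk the start of a line? */
	int found = 0;
	FILE *fp = fopen("/proc/modules", "r");

	if (fp == NULL)
		return -1;
	while (fgets(line, sizeof(line), fp) != NULL) {
		/* the first whitespace-delimited field is the module name */
		if (at_line_start &&
		    sscanf(line, "%127s", modname) == 1 &&
		    strcmp(modname, name) == 0) {
			found = 1;
			break;
		}
		/* a chunk without '\n' means the line continues */
		at_line_start = (strchr(line, '\n') != NULL);
	}
	fclose(fp);
	return found;
}
```

With the module from this section inserted, module_loaded("skeldev") would be expected to return 1.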

Communicating with a module

Communication between a device driver and a user process occurs mainly with the read() and write() file operations. Using the basic device driver skeleton above as reference, we can add handlers for the read() and write() file operations.

/* read from device */
static ssize_t skel_read(struct file *filp, char *buf, size_t count, loff_t *offp);

/* write to device */
static ssize_t skel_write(struct file *filp, const char *buf, size_t count, loff_t *offp);

/* file operations handlers */
static struct file_operations skel_fops = {
        read  : skel_read,    /* handler for the read() operation  */
        write : skel_write,   /* handler for the write() operation */
        /* NULL (default) */
};

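The read() file operation takes four arguments:

  1. filp: the file structure of the open device file
  2. buf: the user-space buffer that receives the data
  3. count: the number of bytes requested by the caller
  4. offp: a pointer to the current position in the file

The function returns the number of bytes copied to the user buffer, which may or may not be equal to count.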
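Because read() may return fewer bytes than requested, a user program that needs a fixed amount of data usually calls it in a loop. The following is a minimal user-space sketch of that idiom; read_full() is a hypothetical helper name, not a standard function.

```c
#include <errno.h>    /* errno, EINTR */
#include <unistd.h>   /* read() */

/*
 * Keep calling read() until `want` bytes have arrived, end of file is
 * reached, or an error occurs. Returns the number of bytes actually
 * read, or -1 on error. read_full() is a hypothetical helper name.
 */
ssize_t read_full(int fd, char *buf, size_t want)
{
	size_t got = 0;

	while (got < want) {
		ssize_t n = read(fd, buf + got, want - got);

		if (n < 0) {
			if (errno == EINTR)
				continue;   /* interrupted: try again */
			return -1;
		}
		if (n == 0)
			break;          /* end of file */
		got += (size_t)n;
	}
	return (ssize_t)got;
}
```

A return value smaller than want then means that the device reported end of file before all of the requested data arrived.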

The following is a simple implementation of skel_read(), which copies data from the driver buffer skel_buffer[] to the user buffer:

/* device buffer */
static unsigned char skel_buffer[SKEL_BUFMAX];

...

/* read from device */
static ssize_t skel_read(struct file *filp, char *buf, size_t count, loff_t *offp)
{
	if (count > SKEL_BUFMAX)
		count = SKEL_BUFMAX;  /* trim data */
	if (copy_to_user(buf, skel_buffer, count))
		return(-EFAULT);      /* bad user-space address */
	return(count);
}

The write() file operation is similar to read(), but in this case data flows from the user buffer to the driver buffer:

/* write to device */
static ssize_t skel_write(struct file *filp, const char *buf, size_t count, loff_t *offp)
{
	if (count > SKEL_BUFMAX)
		count = SKEL_BUFMAX;
	if (copy_from_user(skel_buffer, buf, count))
		return(-EFAULT);      /* bad user-space address */
	return(count);
}

skeldev2.c is a basic device driver implementation with read() and write() file operations.
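The handlers above ignore the offp argument, so every read() starts again at the beginning of the buffer and never reports end of file. The following user-space sketch shows one way a file position could be honored. It is an illustration under stated assumptions, not the code of skeldev2.c: the sim_* names are hypothetical, memcpy() stands in for copy_to_user() and copy_from_user(), and off_t stands in for the kernel's loff_t.

```c
#include <string.h>      /* memcpy(), size_t */
#include <sys/types.h>   /* ssize_t, off_t */

#define SIM_BUFMAX 64

static unsigned char sim_buffer[SIM_BUFMAX];  /* plays skel_buffer[] */
static size_t sim_len;                        /* bytes currently stored */

/* write: store up to SIM_BUFMAX bytes, replacing the old contents */
ssize_t sim_write(const char *buf, size_t count)
{
	if (count > SIM_BUFMAX)
		count = SIM_BUFMAX;               /* trim data */
	memcpy(sim_buffer, buf, count);       /* copy_from_user() in a driver */
	sim_len = count;
	return (ssize_t)count;
}

/* read: copy from the current position and advance it; 0 means EOF */
ssize_t sim_read(char *buf, size_t count, off_t *offp)
{
	size_t pos = (size_t)*offp;

	if (*offp < 0 || pos >= sim_len)
		return 0;                         /* nothing left: end of file */
	if (count > sim_len - pos)
		count = sim_len - pos;            /* trim to what remains */
	memcpy(buf, sim_buffer + pos, count); /* copy_to_user() in a driver */
	*offp += (off_t)count;
	return (ssize_t)count;
}
```

With this scheme, repeated reads walk through the stored data and eventually return 0, which is what a program such as cat waits for before it stops reading.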

Testing the module

To test the skeldev2.c device driver we can write a simple C program that opens the skeldev2 device, writes some data and then retrieves it. This is shown below.

#include <fcntl.h>     /* open(), O_RDWR */
#include <stdio.h>     /* printf(), perror() */
#include <stdlib.h>    /* exit() */
#include <string.h>    /* memset(), strcpy() */
#include <unistd.h>    /* read(), write(), close() */

int main(void)
{
	int fd;
	ssize_t count;
	char buf[50];

	/* open device */
	if ((fd = open("/dev/skeldev2", O_RDWR)) < 0) {
		perror("open()");
		exit(1);
	}

	/* write to device */
	memset(buf, 0x00, sizeof(buf));   /* clear buffer */
	strcpy(buf, "Hello World!");
	count = write(fd, buf, sizeof(buf));
	printf("Wrote %d bytes to device\n", (int)count);

	/* read from device */
	memset(buf, 0x00, sizeof(buf));   /* clear buffer */
	count = read(fd, buf, sizeof(buf));
	printf("Read %d bytes from device: %s\n", (int)count, buf);

	/* close device */
	close(fd);
	exit(0);
}

test_skel2.c is a test program for the skeldev2.c device.

Debugging with printk()

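Because a kernel module does not run in user space, the standard C library is not available to it. This means that the familiar printf() function will not work in a kernel module. Fortunately, the kernel provides a similar function, printk(), which your device driver can use to output messages. However, there are some important differences between these two functions. Messages from printk() go to the kernel log buffer, which can be viewed with the dmesg command. A message can also begin with a priority macro such as KERN_INFO, and printk() does not support floating-point formats.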

The /proc file system

A different way to communicate with device drivers is the /proc file system. For example, let's examine the /proc/meminfo file:

[ealtieri@italia os]$ ls -l /proc/meminfo 
-r--r--r--    1 root     root            0 Jun 19 11:20 /proc/meminfo
[ealtieri@italia os]$ cat /proc/meminfo 
        total:    used:    free:  shared: buffers:  cached:
Mem:  525320192 447213568 78106624        0 81035264 157761536
Swap: 271392768        0 271392768
MemTotal:       513008 kB
MemFree:         76276 kB
MemShared:           0 kB
Buffers:         79136 kB
...

As you can see from the ls command above, the /proc/meminfo file has size zero (the number to the left of the date). However, when we show the contents of the file with cat, the file appears to contain information. How can we explain this? The trick is that files under the /proc file system are generated when they are read. Each of these files is associated with a module in the kernel. When the file is read, using cat for example, the kernel locates its module and calls a function to generate the contents of the file.
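The same observation can be made from a C program by comparing the size that stat() reports with the number of bytes that actually arrive when the file is read. This is only a sketch; reported_size() and bytes_read() are hypothetical helper names, and it assumes that /proc is mounted.

```c
#include <stdio.h>      /* fopen(), fread(), fclose() */
#include <sys/stat.h>   /* stat() */

/* Size reported by stat(), or -1 on error. */
long reported_size(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_size;
}

/* Number of bytes actually delivered when the file is read, or -1. */
long bytes_read(const char *path)
{
	char chunk[4096];
	long total = 0;
	size_t n;
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return -1;
	while ((n = fread(chunk, 1, sizeof(chunk), fp)) > 0)
		total += (long)n;
	fclose(fp);
	return total;
}
```

For /proc/meminfo, reported_size() would be expected to return 0 while bytes_read() returns a positive count, because the contents are produced only when the file is read.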

/proc operations are handled in a similar way to the file operations described earlier. You can define a handler for a read operation on a /proc entry and one for a write operation. In this document we will describe only the read handler.

First, a proc entry is created at load time using the create_proc_read_entry() function:

/* create /proc/xxxx entry */
proc = create_proc_read_entry
	(
	 skel_name,              /* entry name (/proc/skeldev)    */
	 0,                      /* default mode                  */
	 NULL,                   /* parent directory (NULL=/proc) */
	 skel_read_proc,         /* read() operation handler      */
	 NULL                    /* other data                    */
	 );

The code above registers skel_read_proc() as the handler for the read operation on the /proc/skeldev entry. This function must have the following prototype:

static int skel_read_proc(char *buf, char **start, off_t offset, int count, int *eof, void *data)

The contents of the /proc/skeldev file are generated "on the fly" by writing to the buf parameter. A simple implementation of skel_read_proc() could be the following:

/* Handler for /proc/skeldev read */
static int skel_read_proc(char *buf, char **start, off_t offset, int count, int *eof, void *data)
{
	if (count > SKEL_BUFMAX)
		count = SKEL_BUFMAX;
	memcpy(buf, skel_buffer, count);   /* generate file */
	*eof = 1;                          /* end of file */
	return(count);
} /* skel_read_proc() */

The functions can be tested as follows:

[ealtieri@italia dev]$ sudo /sbin/insmod skeldev3.o 
[ealtieri@italia dev]$ echo "Hello World" > /dev/skeldev 
[ealtieri@italia dev]$ cat /proc/skeldev 
Hello World
[ealtieri@italia dev]$

skeldev3.c implements the /proc/skeldev entry.

The /proc filesystem is a simple way to communicate system information to user processes and is extensively used under Linux. Examples of /proc entries are /proc/meminfo, which displays information about memory, and /proc/cpuinfo, which displays processor type and features.

Writing handlers for /proc files can become really complicated if the amount of data to be output is large. In this case, several read() calls may be needed to retrieve all of the data. Between successive read() calls, the module or kernel data structures being read can change, causing inconsistency in the output.
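One possible way to limit this inconsistency is to format the whole output once, when a read starts at offset 0, and to serve the following reads from that saved copy. The sketch below simulates the idea in user space. It is not how skeldev3.c works: snap_read(), live_counter and snapshot[] are hypothetical names, and a single shared snapshot like this one would not be safe with several concurrent readers.

```c
#include <stdio.h>    /* snprintf() */
#include <string.h>   /* memcpy() */

#define SNAP_MAX 256

int live_counter;                  /* stands in for changing kernel data */
static char snapshot[SNAP_MAX];    /* output frozen at the first read */
static size_t snapshot_len;

/*
 * Read handler sketch: when offset is 0, format the live data into
 * snapshot[]; every call then copies from the snapshot only.
 * Returns the number of bytes copied, or 0 at end of file.
 */
int snap_read(char *buf, long offset, int count)
{
	size_t pos;
	int n;

	if (offset == 0) {
		n = snprintf(snapshot, sizeof(snapshot),
		             "counter: %d\ncounter again: %d\n",
		             live_counter, live_counter);
		if (n < 0)
			snapshot_len = 0;
		else if (n >= SNAP_MAX)
			snapshot_len = SNAP_MAX - 1;   /* output was truncated */
		else
			snapshot_len = (size_t)n;
	}
	pos = (size_t)offset;
	if (offset < 0 || count <= 0 || pos >= snapshot_len)
		return 0;                          /* end of file */
	if ((size_t)count > snapshot_len - pos)
		count = (int)(snapshot_len - pos);
	memcpy(buf, snapshot + pos, (size_t)count);
	return count;
}
```

Even if live_counter changes between two calls, both parts of the output come from the same snapshot, so the two lines printed always agree with each other.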

More information about the /proc filesystem can be found in Linux Device Drivers, page 103.

Questions

  1. The /dev/null device discards everything that is written to it and returns nothing when it is read. This device is useful if you want to hide the messages output by a program. For example: "gcc example.c > /dev/null 2>&1" will hide all of the compiler's messages. How would you implement this device driver?
  2. Modify the skeldev2.c device driver so that the read() file operation returns the data written to the buffer in reverse order.
  3. Communication between different running processes is a major topic in Operating Systems. This is called Inter-Process Communication (IPC). Describe how you would use a device driver for inter-process communication.

Resources

Linux Device Drivers, Second Edition, by Alessandro Rubini and Jonathan Corbet. Original version published by O'Reilly & Associates. Available online in PDF and HTML format.
skeldev.c is a basic implementation of a device driver. It registers itself and creates a /dev/skeldev entry. The driver does not handle any file operations; the default OS actions apply.
skeldev2.c is a basic device driver implementation with read() and write() file operations.
test_skel2.c is a test program for the skeldev2.c device.
skeldev3.c implements the /proc/skeldev entry.
