Kernel Synchronization


Emanuele Altieri (ealtieri@cs.smith.edu)
Prof. Nicholas Howe (nhowe@cs.smith.edu)
Smith College, June 2002

Contents

  1. Spin locks
  2. Semaphores
  3. Wait queues

Questions: Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9.

Spin locks

The simplest form of synchronization in the kernel is the spin lock. Spin locks cause the kernel to spin in a busy loop until some kind of resource (represented by the lock) becomes available. One of the functions that implements this behaviour is spin_lock(), defined in include/asm/spinlock.h, line 130:

static inline void spin_lock(spinlock_t *lock)

The function generates the following (simplified) assembly code:

1:  lock; decb my_lock
    jns 3f
2:  cmpb $0, my_lock
    jle 2b
    jmp 1b
3:

The execution of the current kernel control path depends on the my_lock variable. If my_lock is greater than zero at the time spin_lock() is called, the kernel will continue its normal execution. However, if my_lock is less than or equal to zero, the kernel will wait in a busy loop until the lock becomes positive again.

The labels 1, 2 and 3 represent different phases of the function. In part 1, my_lock is decremented atomically. The lock prefix before decb assures that on a multi-processor system only one processor at a time can access the memory location referred to by my_lock. The jns (Jump if Not Sign) instruction immediately next transfers execution to part 3 if the new value of my_lock is greater than or equal to zero, allowing the kernel to continue its current path. On the other hand, if the new value of my_lock is negative, the kernel keeps spinning in part 2 until the lock becomes positive. At that point execution is transferred to part 1 once again.

The function to release the lock, spin_unlock(), is straightforward:

xchgb unlocked_value, my_lock

The code above sets the value of my_lock back to the unlocked state, 1. unlocked_value is just a char variable containing the value 1. xchg executes an atomic exchange of values between the unlocked_value and my_lock variables. No lock prefix is needed because locking is built into xchg whenever one of its operands is a memory location.
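The lock and unlock sequences above can be mimicked in user space. The sketch below is an illustration by analogy, not kernel code: it uses C11 atomics instead of inline assembly, and mylock_t, my_spin_lock() and my_spin_unlock() are invented names. It keeps the same three-phase structure as the assembly listing.

```c
#include <stdatomic.h>

/* Hypothetical userspace sketch of the decb-based spin lock.
   A value of 1 means unlocked; 0 or negative means locked/contended. */
typedef struct { atomic_int v; } mylock_t;

static void my_spin_lock(mylock_t *l)
{
	for (;;) {
		/* part 1: atomic decrement, like "lock; decb my_lock" */
		if (atomic_fetch_sub(&l->v, 1) - 1 >= 0)
			return;                 /* jns 3f: lock acquired */
		/* part 2: read-only spin until the lock turns positive */
		while (atomic_load(&l->v) <= 0)
			;                       /* cmpb $0, my_lock; jle 2b */
		/* jmp 1b: retry the atomic decrement */
	}
}

static void my_spin_unlock(mylock_t *l)
{
	atomic_store(&l->v, 1);  /* like xchgb: back to the unlocked value 1 */
}
```

Note that the waiting loop in part 2 only reads the lock; retrying the atomic decrement on every iteration would generate needless bus locking, which is exactly why the assembly version spins on cmpb rather than on decb.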

Several variants of spin_lock() and spin_unlock() exist. For instance, a more sophisticated spinlock is the read/write spin lock, which allows several readers to enter a critical region of code simultaneously. However, only one writer at a time is allowed in the critical section. The functions that implement this kind of spinlock are read_lock(), write_lock(), and their unlock partners.

On machines with multiple processors, spin-locks are a cheap and convenient way to protect important data structures from simultaneous access by two or more processors. However, on computers with a single CPU, spin-locks are generally forbidden: with only one processor, busy-waiting can never succeed, because the kernel control path holding the lock cannot run (and therefore cannot release the lock) while the current path spins. Other techniques are used instead, such as semaphores and wait queues (described later).

Q1. What is the great disadvantage of using spin-locks?
Q2. Suppose the assembly code for the spin_lock() function is changed as shown below. Will this new version still work? Explain.
1:  lock; decb my_lock
    cmpb $0, my_lock
    jge 3f
2:  cmpb $0, my_lock
    jle 2b
    jmp 1b
3:
Q3. What happens if a lock is initialized to 2 instead of 1?

Semaphores

Semaphores are a smarter form of synchronization than spin-locks. Instead of wasting CPU time in busy loops, semaphore functions can put processes to sleep until a given resource becomes available. While a process is sleeping, the CPU can execute other tasks, therefore making the best use of its time and increasing the system's global level of multiprogramming.

Semaphores are defined in include/asm/semaphore.h:

struct semaphore {
	atomic_t count;
	int sleepers;
	wait_queue_head_t wait;
};

The count field indicates the number of processes allowed in a critical region simultaneously. A value of 1 allows only one process in the critical section at any time.

The functions that operate on semaphores are the following: sema_init(sem, count) initializes a semaphore with a given count; down(sem) and down_interruptible(sem) acquire the semaphore, putting the calling process to sleep if it is unavailable; up(sem) releases the semaphore, waking up one of the sleeping processes.

down_interruptible() is almost always preferred to down(). The latter puts the process into an uninterruptible sleep, from which not even a signal can wake it, and should be used with care and only in some special cases.

One important feature of semaphore functions is that only one process is woken up when a semaphore is released. This differs from wait queues (described later), which wake up all of the processes sleeping on a given queue.
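The counting behaviour is easy to demonstrate in user space with POSIX semaphores (an analogy, not the kernel API; try_enter() and leave() are invented helpers): with a count of 1, a second entry attempt fails until the first holder releases the semaphore.

```c
#include <semaphore.h>

/* Userspace analogy: the semaphore count is the number of processes
   allowed inside the critical section at once. */
static sem_t sem;

static int try_enter(void)
{
	/* like a non-blocking down(): 1 if we got in, 0 if the section is full */
	return sem_trywait(&sem) == 0;
}

static void leave(void)
{
	sem_post(&sem);  /* like up(): release the semaphore, waking one waiter */
}
```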

Device drivers make use of semaphores and wait queues all the time. Suppose a certain process issues a read from a device, such as a DVD drive. The drive's motor will take a few seconds to reach the optimal spin velocity and seek to the right position on the disc. Meanwhile, the process that issued the read has to wait. The device driver in charge of the DVD drive could implement this waiting as a busy loop, testing the device for a ready status at every iteration. However, this kind of waiting would waste an enormous amount of CPU time. A better solution is to put the process to sleep using semaphores or wait queues until the DVD drive notifies the processor (via an interrupt) that it is ready to accept commands.

The following example shows the design of a simple device driver that uses semaphores. The driver creates the device /dev/mutex. The first process reading from this device will acquire a lock that will cause other read attempts from any process to sleep. The lock is released when the process that owns it closes the device.

struct semaphore sem;

/* Module Initialization */
static int __init dev_init(void)
{
        ...

	dev_handle = devfs_register
		(
		 ...
		 "mutex",  /* create /dev/mutex */
		 ...
		 S_IFCHR | S_IRWXO | S_IRWXG | S_IRWXU,  /* permissions, everybody RWX */
		 ...
		 );
        ...

        /* initialize semaphore */
        sema_init(&sem, 1);   /* one process per critical section */

	return(0);
} /* dev_init() */


/* Read from device */
static ssize_t dev_read(struct file *filp, char *buf, size_t count, loff_t *offp)
{
        if (down_interruptible(&sem))
	        return(-EINTR);      /* sleeping has been interrupted by a signal */
	return(count);
} /* dev_read() */


/* Close the device */
static int dev_release(struct inode *inode, struct file *filp)
{
        up(&sem);
	return(0);
}

Notice that the semaphore sem needs to be initialized before it is used. Initialization of a semaphore structure is done by the sema_init(sem, count) function. Also notice the if statement around the down_interruptible() call. Recall that a process put to sleep with this function can be woken up by a signal; if this is the case, down_interruptible() returns -EINTR. We need to test for this condition every time we call down_interruptible() and return -EINTR if appropriate. When a process is woken up by the up() function, down_interruptible() returns 0.
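The same error-handling pattern exists in user space: POSIX sem_wait() fails with errno set to EINTR when a signal interrupts the sleep. The fragment below mirrors the test in dev_read(); it is an analogy with an invented helper name, not the driver code itself.

```c
#include <errno.h>
#include <semaphore.h>

static sem_t sem;

/* Mirror of the dev_read() pattern: take the semaphore, but report
   -EINTR if the sleep was interrupted by a signal. */
static int acquire_mutex(void)
{
	if (sem_wait(&sem) == -1 && errno == EINTR)
		return -EINTR;  /* sleeping has been interrupted by a signal */
	return 0;               /* semaphore acquired */
}
```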

Q4. According to the code fragment shown above, what happens if the first process to open the /dev/mutex device executes the following instructions:
fd = open("/dev/mutex", O_RDONLY);
read(fd, &buf, count);
	     
/* do something... */
	     
read(fd, &buf, count);

/* do something... */

close(fd);
Q5. Implement the /dev/mutex driver using the code fragments introduced above. Show that the semaphore actually works, by running several processes that attempt to take the semaphore, delay a few seconds, and release it. Use ps to verify that the waiting processes are blocked rather than spinning.

Wait queues

A wait queue is a linked list of sleeping processes waiting for a certain condition to become true. When the condition is satisfied, the processes on the queue are woken up all at once. This differs from the behaviour of semaphores, in which only one process is woken up when a lock is released. Semaphores are implemented using a special type of wait queue called exclusive wait queue.
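The wake-everybody behaviour can be sketched in user space with a condition variable, under the assumption that pthread_cond_broadcast() plays the role of wake_up(); sleeper() and wake_all_demo() are invented names for this illustration.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int ready = 0;   /* the condition the queue is waiting for */
static int woken = 0;   /* how many sleepers have been released */

static void *sleeper(void *arg)
{
	pthread_mutex_lock(&m);
	while (!ready)                  /* like sleeping on the wait queue */
		pthread_cond_wait(&cv, &m);
	woken++;
	pthread_mutex_unlock(&m);
	return arg;
}

static int wake_all_demo(int nthreads)
{
	pthread_t t[16];
	int i;

	for (i = 0; i < nthreads; i++)
		pthread_create(&t[i], NULL, sleeper, NULL);

	pthread_mutex_lock(&m);
	ready = 1;                      /* the condition becomes true */
	pthread_cond_broadcast(&cv);    /* like wake_up(): release every sleeper */
	pthread_mutex_unlock(&m);

	for (i = 0; i < nthreads; i++)
		pthread_join(t[i], NULL);
	return woken;
}
```

Replacing the broadcast with pthread_cond_signal() would release a single sleeper instead, which is the semaphore-like behaviour described in the previous section.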

A wait queue is identified by a wait_queue_head_t element, defined in include/linux/wait.h, line 77:

struct __wait_queue_head {
	wq_lock_t lock;
	struct list_head task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;

The task_list field points to a list of tasks waiting on the queue. lock is used to carry out atomic operations on the task list.

Each element in the wait queue has type wait_queue_t:

struct __wait_queue {
	unsigned int flags;
	struct task_struct * task;
	struct list_head task_list;
};
typedef struct __wait_queue wait_queue_t;

The functions that operate on wait queues are the following:

sleep_on(queue) and interruptible_sleep_on(queue) put the current process to sleep on a wait queue. As with semaphores, a process put to sleep with the interruptible function can be woken up by signals. The implementation of sleep_on() is shown below:

void sleep_on(wait_queue_head_t *queue)
{
	unsigned long flags;
	wait_queue_t wait;  /* entry in the wait queue */

	init_waitqueue_entry(&wait, current);

	wq_write_lock_irqsave(&queue->lock, flags);
	__add_wait_queue(queue, &wait);
	wq_write_unlock(&queue->lock);

	current->state = TASK_UNINTERRUPTIBLE;
	schedule();

	wq_write_lock_irq(&queue->lock);
	__remove_wait_queue(queue, &wait);
	wq_write_unlock_irqrestore(&queue->lock, flags);
}

interruptible_sleep_on() is identical to sleep_on() except that it sets the task state to TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE.

The function works by inserting an entry (wait) in the wait queue. The entry is initialized with the init_waitqueue_entry() function and then added to the queue using __add_wait_queue(). Next, the state of the current task is changed to TASK_UNINTERRUPTIBLE (for sleep_on()) or TASK_INTERRUPTIBLE (for interruptible_sleep_on()). Finally, the scheduler function is called, which selects another process from the runqueue to run. When the sleeping process is eventually woken up, execution resumes after schedule() and the entry is removed from the queue.

To wake up the processes in a specific wait queue, the kernel provides the following functions: wake_up(), to wake up all of the processes in a wait queue, and wake_up_interruptible(), to wake up only the interruptible processes.

A simplified version of wake_up() is shown below:

void wake_up(wait_queue_head_t *queue)
{
	struct list_head *tmp;
	struct task_struct *tsk;

	list_for_each(tmp, &queue->task_list) {
		wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);  /* current entry */

		tsk = curr->task;
		if (tsk->state & (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)) {
			tsk->state = TASK_RUNNING;  /* wake up task */
			if (task_on_runqueue(tsk))
				continue;           /* task already on the runqueue */
			add_to_runqueue(tsk);
			reschedule_idle(tsk);
		}
	}
}

The function wakes up every process in the wait queue. The process being woken up is added to the runqueue and its state set to TASK_RUNNING. reschedule_idle() then tries to run the process immediately on a CPU that is currently idle. If no CPU is idle, the function attempts to preempt a currently running process that has lower priority.

Q6. What small change would you make to wake_up() in order to implement wake_up_interruptible()?

In the code below we show a fragment of a device driver that uses wait queues. The driver has an internal buffer of BUFMAX bytes of maximum capacity. The current number of bytes stored in the buffer is given by buffer_size. A process reads from the driver's buffer by issuing a read() file operation. However, if the buffer is empty (buffer_size==0), the process goes to sleep until another process writes to the buffer.

static wait_queue_head_t wq;    /* the readers wait queue */

static int __init dev_init(void)
{
        ...
        init_waitqueue_head(&wq);  /* initialize wait queue */
        ...
}

static ssize_t dev_read(struct file *filp, char *buf, size_t count, loff_t *offp)
{
        if (buffer_size == 0) {
                interruptible_sleep_on(&wq);  /* go to sleep, wait for writers */
		if (signal_pending(current))  /* woken up by a signal? */
			return(-EINTR);
	}
	/* send message to reader */
	count = (count>buffer_size) ? buffer_size : count;
	copy_to_user(buf, dev_buffer, count);
	return(count);
} /* dev_read() */

static ssize_t dev_write(struct file *filp, const char *buf, size_t count, loff_t *offp)
{
	/* store message in device buffer */
	count = (count>BUFMAX) ? BUFMAX : count;
	copy_from_user(dev_buffer, buf, count);
	buffer_size = count;
        wake_up_interruptible(&wq);  /* wake up readers */
	return(count);
}

Two things need to be noticed from the code above. First, the wait queue wq must be initialized with init_waitqueue_head() before any process sleeps on it. Second, since interruptible_sleep_on() can be interrupted by a signal, the reader must check signal_pending() after waking up to distinguish a signal from a genuine wake-up by a writer.

Q7. Implement the driver described in this section using the code fragment shown above.
Q8. One problem with the code above is that a writer could change the buffer contents and size while one or more processes are reading from the buffer. Modify your code from the previous question so that it prevents writing to the buffer while a process is reading from it. Use the semaphore functions down_interruptible() and up() as explained in the Semaphores section.
Q9. Implement a device driver that operates as follows. The driver creates a device /dev/mbox. Only one process can read from this device, but many processes can write to it. The device allows messages to be transferred from the writers to the reader. The reader should sleep when no messages are available. The writers should sleep only if a message from a previous writer is still pending for retrieval by the reader.
