Monday, 10 November 2014

Linux Kernel Synchronization Primitives

The Linux kernel offers a synchronization technique for every possible need (you think it, they have it!).
It's like a fruit basket: you just have to know what you want. Picking the wrong fruit can ruin the taste, so be careful!

Symmetrical Multi Processing (SMP) has also contributed greatly to this confusion. This guide tries to cover both UP (uniprocessor) and SMP systems.

So let's begin the tour, grab your hot chocolate and sit comfortably to enjoy the ride.

Before we actually start looking at the fruits, let's first understand the basket, without which all the fruits would fall.

You want to protect critical regions and avoid race conditions in your code.
What is a critical region ? Code paths that access and manipulate shared data are called critical regions (also called critical sections). It is usually unsafe for multiple threads of execution to access the same resource simultaneously.
When this does occur, we call it a race condition, so-named because the threads raced to get there first.

Consider a simple shared resource, a single global integer, and a simple critical region, the operation of merely incrementing it: i++
Now assume that there are two threads of execution, both entering this critical region, and that the initial value of i is 7. The desired outcome is then similar to the following (with each row representing a unit of time):

    Thread 1                 Thread 2
    get i (7)                -
    increment i (7 -> 8)     -
    write back i (8)         -
    -                        get i (8)
    -                        increment i (8 -> 9)
    -                        write back i (9)

With no synchronization, however, the two threads can interleave: both read 7, both increment their copy to 8, and both write back 8, losing one of the updates.
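The lost-update interleaving can be shown deterministically by spelling out the load, increment, and store steps of each thread's `i++` by hand. This is only a single-threaded simulation of the bad schedule, not real concurrency:

```c
#include <assert.h>

/* Simulated per-thread view of the non-atomic "i++": each thread performs
 * load, increment, and store as separate steps. Interleaving the steps of
 * two threads loses one of the two updates. */
int simulate_lost_update(int initial)
{
    int i = initial;
    int t1_reg, t2_reg;

    t1_reg = i;      /* thread 1: load i (7)                               */
    t2_reg = i;      /* thread 2: load i (7), before thread 1 stores back  */
    t1_reg += 1;     /* thread 1: increment its private copy (8)           */
    t2_reg += 1;     /* thread 2: increment its private copy (8)           */
    i = t1_reg;      /* thread 1: store back (8)                           */
    i = t2_reg;      /* thread 2: store back (8), thread 1's update is lost */

    return i;        /* 8, not the desired 9 */
}
```

Starting from 7, two "increments" leave i at 8 instead of 9, which is exactly the race the table above illustrates.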

Ensuring that unsafe concurrency is prevented and that race conditions do not occur is called synchronization.


It is God's will that concurrent access during critical regions be prevented, and the programmer must act as an angel, ensuring that code executes atomically: that is, operations complete without interruption, as if the entire critical region were one indivisible instruction. Failing to do so will bring a curse upon you.

For those wondering about the newly added word in their vocabulary, i.e., concurrency, please stay with me. Wiser men can skip this section and continue further.
"A concurrent system is a collection of interacting computational tasks that may execute in parallel. In user-space, synchronization is needed because programs are scheduled preemptively at the will of the scheduler. Because a process can be preempted at any time and another process can be scheduled onto the processor, a process can be involuntarily preempted in the middle of accessing a critical region. If the newly scheduled process then enters the same critical region (say, if the two processes manipulate the same shared memory or write to the same file descriptor), a race can occur. The same problem can occur with multiple single-threaded processes sharing files, or within a single program with signals, because signals can occur asynchronously. This type of concurrency, in which two things do not actually happen at the same time but interleave with each other such that they might as well, is called pseudo-concurrency. If you have a symmetrical multiprocessing machine, two processes can actually be executed in a critical region at the exact same time. That is called true concurrency."

The kernel has similar causes of concurrency:
  • Interrupts— An interrupt can occur asynchronously at almost any time, interrupting the currently executing code.
  • Softirqs and tasklets— The kernel can raise or schedule a softirq or tasklet at almost any time, interrupting the currently executing code.
  • Kernel preemption— Because the kernel is preemptive, one task in the kernel can preempt another.
  • Sleeping and synchronization with user-space— A task in the kernel can sleep and thus invoke the scheduler, resulting in the running of a new process.
  • Symmetrical multiprocessing— Two or more processors can execute kernel code at exactly the same time.
Having understood synchronization and concurrency (the basket), now we need a way of making sure that only one thread manipulates the shared resource at a time—a mechanism for preventing access to a resource while another thread of execution is in the marked region.

This mechanism is provided by locks (the fruits).

The basic rule to locking is - Protect data, not code !! (never ever forget this)

Code that is safe from concurrent access from an interrupt handler is said to be interrupt-safe. Code that is safe from concurrency on symmetrical multiprocessing machines is SMP-safe. Code that is safe from concurrency with kernel preemption is preempt-safe. The actual mechanisms used to provide synchronization and protect against race conditions in all these cases are: 
  • Atomic Operations: Atomic operations provide instructions that execute atomically—without interruption. Atomic operators are indivisible instructions. For example, an atomic increment can read and increment a variable by one in a single indivisible and uninterruptible step. Using atomic instructions requires understanding memory models and barriers. Beware of atomic instructions and memory barriers for they are subtle and quick to anger.
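Atomic increment can be sketched in userspace with C11 `<stdatomic.h>` and POSIX threads; note this is only an analogue of the kernel's `atomic_t`/`atomic_inc()` interface, not the kernel API itself:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Userspace sketch of the kernel's atomic_t / atomic_inc() idea using
 * C11 atomics; the in-kernel API differs, but the concept is the same. */
static atomic_int counter;

static void *incrementer(void *arg)
{
    (void)arg;
    for (int n = 0; n < 100000; n++)
        atomic_fetch_add(&counter, 1);   /* indivisible read-modify-write */
    return NULL;
}

int run_atomic_demo(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, incrementer, NULL);
    pthread_create(&b, NULL, incrementer, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* No updates are lost: the result is always 200000. A plain
     * counter++ here could lose updates exactly as in the i++ example. */
    return atomic_load(&counter);
}
```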

  • Spin Locks: This is the most common lock in the Linux kernel. The idea behind a spin lock is to execute a tight loop, or busy wait, until the lock is released (becomes available). So a spin lock is a lock that can be held by at most one thread of execution.
    • The executing core cannot be used for anything else while spinning.
    • Spin-locks are usable in atomic context (ISR, softirqs, tasklets) as they do not sleep (read the last two points again until you remember them like your own name).
    • They provide the needed protection from concurrency on multiprocessing machines. On uniprocessor machines the locks compile away and do not exist; they simply act as markers to disable and enable kernel preemption. If kernel preemption is turned off, the locks compile away entirely.
    • Spin locks are not recursive
    •  Use spin locks when shared data can be accessed in atomic context to prevent thread/atomic or atomic/atomic race.
    • If a spin lock is shared between a thread and an interrupt handler, you must disable local interrupts (interrupt requests on the current processor) before obtaining the lock. Otherwise, it is possible for an interrupt handler to interrupt kernel code while the lock is held and attempt to reacquire the lock. The interrupt handler spins, waiting for the lock to become available. The lock holder, however, does not run until the interrupt handler completes. Note that you need to disable interrupts only on the current processor: if an interrupt occurs on a different processor and spins on the same lock, it does not prevent the lock holder (which is on a different processor) from eventually releasing the lock. The kernel provides interfaces that conveniently disable interrupts or bottom halves (spin_lock_irqsave/spin_unlock_irqrestore, spin_lock_bh/spin_unlock_bh).
    • If the data is shared between two different tasklets, you must obtain a normal spin lock before accessing the data in the bottom half. You do not need to disable bottom halves, because a tasklet never preempts another running tasklet on the same processor.
    • With softirqs, regardless of whether it is the same softirq type, if data is shared by softirqs, it must be protected with a lock. Softirqs, even two of the same type, might run simultaneously on multiple processors in the system. A softirq never preempts another softirq running on the same processor, however, so disabling bottom halves is not needed.
    • Do not mask interrupts on their own to protect data shared between interrupt and non-interrupt context; use spin locks (the _irqsave variants) instead.
  • Reader-Writer Spin Locks: Reader-writer spin locks provide separate reader and writer variants of the lock. One or more readers can concurrently hold the reader lock. The writer lock, conversely, can be held by at most one writer with no concurrent readers. Reader/writer locks are sometimes called shared/exclusive or concurrent/exclusive locks because the lock is available in a shared (for readers) and an exclusive (for writers) form. Beware of reader-writer lock performance: under contention they are often no faster than plain spin locks.

  • Semaphores: Semaphores in Linux are sleeping locks. When a task attempts to acquire a semaphore that is unavailable, the semaphore places the task onto a wait queue and puts the task to sleep. The processor is then free to execute other code. When the semaphore becomes available, one of the tasks on the wait queue is awakened so that it can then acquire the semaphore.
    • Because of their sleeping nature, semaphores are well suited to locks that are held for a long time.
    • They are not optimal for locks that are held for short periods, because the overhead of sleeping and waking up can outweigh the time the lock is actually held.
    • Because a thread of execution sleeps on lock contention, semaphores must be obtained only in process context because interrupt context is not schedulable.
    • You cannot hold a spin lock while you acquire a semaphore, because you might have to sleep while waiting for the semaphore, and you cannot sleep while holding a spin lock.
    • Unlike spin locks, semaphores do not disable kernel preemption. Code holding a semaphore can be preempted.
    • They can allow for at most count number of simultaneous lock holders. In that case, they are called counting semaphores. When count is equal to one, they are called binary semaphores.
    • Although a binary semaphore provides mutual exclusion, don't use semaphores for mutual exclusion. That's a job better handled by mutexes, for two reasons. First, mutexes are faster (at least kernel-side). Second, because mutexes have more precise semantics, the implementation can automatically report some bugs, for example unlocking a mutex before locking it; the kernel has a configuration option to enable such checks.
    • The primary use of semaphores should be synchronization (between threads running in different contexts), not locking.
    • Do not use semaphores when better suited alternatives exist (this may sound harsh on semaphores but trust me, you don't want your life getting wasted in fixing semaphore induced bugs).
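The down/up counting behaviour can be sketched with POSIX semaphores; `sem_wait()`/`sem_post()` play the role of the kernel's `down()`/`up()` on a `struct semaphore`, though the APIs themselves differ:

```c
#include <semaphore.h>

/* POSIX counting semaphore as a stand-in for the kernel's
 * struct semaphore. The count says how many holders may enter;
 * a task that finds the count at zero goes to sleep. */
int semaphore_demo(void)
{
    sem_t sem;
    int value;

    sem_init(&sem, 0, 2);    /* counting semaphore: up to 2 holders   */
    sem_wait(&sem);          /* "down": count 2 -> 1                  */
    sem_wait(&sem);          /* "down": count 1 -> 0, next wait sleeps */
    sem_post(&sem);          /* "up":   count 0 -> 1, wakes a sleeper */
    sem_getvalue(&sem, &value);
    sem_destroy(&sem);

    return value;            /* 1: one slot currently free */
}
```

With an initial count of one you get a binary semaphore, which, as noted above, is a job a mutex does better.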
  •  Reader-Writer Semaphores: The situations where reader-writer semaphores are preferred over standard semaphores are the same as with reader-writer spin locks versus standard spin locks.
  • Mutexes: A sleeping (and sometimes adaptive) lock that provides mutual exclusion. It behaves similarly to a semaphore with a count of one, but it has a simpler interface, more efficient performance, and additional constraints on its use.
    • Only one task can hold the mutex at a time. That is, the usage count on a mutex is always one.
    • Whoever locked a mutex must unlock it. That is, you cannot lock a mutex in one context and then unlock it in another. This means that the mutex isn't suitable for more complicated synchronizations between kernel and user-space.
    • Recursive locks and unlocks are not allowed.
    • A process cannot exit while holding a mutex
    • A mutex cannot be acquired by an interrupt handler or bottom half, even with
      mutex_trylock().
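The usage pattern can be sketched with `pthread_mutex_t`; in the kernel this is `struct mutex` with `mutex_lock()`/`mutex_unlock()`, and the same rules listed above apply (whoever locked it must unlock it, no recursion):

```c
#include <pthread.h>

/* Userspace sketch of mutex-protected shared state. On contention the
 * caller sleeps rather than spinning, so this must only be used in
 * contexts that are allowed to sleep (process context, in the kernel). */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static int balance;

int deposit(int amount)
{
    int result;

    pthread_mutex_lock(&mtx);     /* sleeps (does not spin) on contention */
    balance += amount;            /* critical region                      */
    result = balance;
    pthread_mutex_unlock(&mtx);   /* same task that locked must unlock    */

    return result;
}
```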
  •  Completion Variables: Using completion variables is an easy way to synchronize between two tasks in the kernel when one task needs to signal to the other that an event has occurred. One task waits on the completion variable while another task performs some work. When the other task has completed the work, it uses the completion variable to wake up any waiting tasks. Do you hear bells ringing that sound like a semaphore? If yes (I am glad you are still awake), you are right: the idea is much the same. In fact, completion variables merely provide a simple solution to a problem whose answer would otherwise be semaphores.
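A completion can be modelled in userspace with a mutex, a condition variable, and a done flag; this is only an analogue of the kernel's `wait_for_completion()`/`complete()` pair, not its implementation:

```c
#include <pthread.h>

/* Userspace model of a kernel completion: one task waits for the
 * "done" event, another performs the work and signals it. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  done_cond;
    int             done;
};

static struct completion comp = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};
static int work_result;

static void *worker(void *arg)
{
    (void)arg;
    work_result = 21 * 2;                     /* perform the actual work */
    pthread_mutex_lock(&comp.lock);
    comp.done = 1;                            /* the event has occurred  */
    pthread_cond_broadcast(&comp.done_cond);  /* like complete_all()     */
    pthread_mutex_unlock(&comp.lock);
    return NULL;
}

int wait_for_work(void)    /* like wait_for_completion() */
{
    pthread_t t;

    pthread_create(&t, NULL, worker, NULL);

    pthread_mutex_lock(&comp.lock);
    while (!comp.done)     /* re-check the flag: guards against spurious wakeups */
        pthread_cond_wait(&comp.done_cond, &comp.lock);
    pthread_mutex_unlock(&comp.lock);

    pthread_join(t, NULL);
    return work_result;
}
```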
  • Sequential Locks: The sequential lock, generally shortened to seq lock, provides a simple mechanism for reading and writing shared data. It works by maintaining a sequence counter. Whenever the data in question is written to, a lock is obtained and the sequence number is incremented. Prior to and after reading the data, the sequence number is read. If the values are the same, a write did not begin in the middle of the read. Further, if the values are even, a write is not underway. (Grabbing the write lock makes the value odd, whereas releasing it makes it even, because the lock starts at zero.) Seq locks provide a lightweight and scalable lock for use with many readers and a few writers. Seq locks, however, favor writers over readers: an acquisition of the write lock always succeeds as long as there are no other writers, and readers never delay writers (unlike reader-writer spin locks and semaphores, where they do). Furthermore, a pending writer continually causes the read loop to repeat until there are no longer any writers holding the lock.
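The odd/even counter and the reader retry loop can be sketched in a few lines. This is a simplified illustration, not the kernel's `seqlock_t`: the real implementation adds memory barriers and a spin lock to serialize writers, which are omitted here:

```c
#include <stdatomic.h>

/* Minimal seq lock sketch: even sequence = no writer, odd = write in
 * progress. Readers retry until they observe the same even value
 * before and after copying the data. */
static atomic_uint seq;        /* starts at 0 (even)            */
static int data_a, data_b;     /* the protected pair of values  */

void seq_write(int a, int b)   /* assumes a single writer here  */
{
    atomic_fetch_add(&seq, 1); /* now odd: write underway       */
    data_a = a;
    data_b = b;
    atomic_fetch_add(&seq, 1); /* even again: write finished    */
}

void seq_read(int *a, int *b)
{
    unsigned start, end;

    for (;;) {
        start = atomic_load(&seq);
        if (start & 1)
            continue;          /* writer active, try again          */
        *a = data_a;
        *b = data_b;
        end = atomic_load(&seq);
        if (start == end)
            return;            /* no write overlapped the read      */
    }
}
```

Note how a write in progress costs the reader nothing but a retry, which is exactly why seq locks favor writers.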
  • Preemption Disabling: Remember that while a spin lock is held, the kernel is not preemptive. Disabling
    preemption (preempt_disable/preempt_enable) does not, however, prevent ISRs and tasklets from running. Moreover, disabling preemption on its own should not be used for avoiding race conditions because:
    • Disabling preemption operates on the current CPU core only and so does not prevent
      races in SMP systems.
    • Disabling preemption has a global impact on thread latency and so can hurt real-time behaviour.
  • Barriers: Both the compiler and the processor can reorder reads and writes for performance reasons. When dealing with synchronization between multiple processors or with hardware devices, it is sometimes a requirement that memory reads (loads) and memory writes (stores) issue in the order specified in your program code. All processors that do reorder reads or writes provide machine instructions to enforce ordering requirements. It is also possible to instruct the compiler not to reorder instructions around a given point. These instructions are called barriers. I suggest you save some juice and read the kernel documentation on barriers carefully before you intend to use them. (http://kernel.org/doc/Documentation/memory-barriers.txt)
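The classic pattern that barriers enable is publish/consume: write a payload, then set a "ready" flag, with ordering enforced between the two. C11 release/acquire semantics are used here as a stand-in for the kernel's `smp_wmb()`/`smp_rmb()` pairing:

```c
#include <stdatomic.h>

/* Publish/consume sketch. The release store keeps the payload write
 * from being reordered after the "ready" store; the acquire load keeps
 * the payload read from being reordered before the "ready" load. */
static int payload;
static atomic_int ready;

void publish(int value)
{
    payload = value;                 /* plain store of the data        */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

int consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                            /* wait for the publisher         */
    return payload;                  /* guaranteed to see the new value */
}
```

Without the ordering (for example, with relaxed atomics and no fences), a consumer on another CPU could observe ready == 1 while still reading a stale payload.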
Don't tell me you are still looking for a fruit to satisfy your need.
If you are, then I strongly suggest reconsidering your requirement. But don't get disheartened; I am pretty sure there is a way out.

Did I forget to mention that you should never rely on thread priorities to fix race conditions? If so, then mark this one too on your nearby stone.

Ironically races are caused by lack of synchronization and deadlocks by excessive or incorrect synchronization. Concurrent programming is therefore a delicate dance between these two traps, and many lesser evils, to deliver the required feature set while maintaining the correctness of the system and providing adequate performance.

A deadlock occurs when each task in a set of tasks is blocked waiting for a resource owned by
another task in the set. All following conditions must hold true simultaneously for a deadlock to exist:
  • Mutual exclusion: there are several resources that cannot be used by more than one task at a time.
  • Hold and wait: tasks already holding a resource may block waiting for another resource.
  • No preemption: resources cannot be forcibly removed from the task owning them.
  • Only the owning task can release resources.
  • Circular wait: there are several tasks forming a circular chain where each task waits on a resource owned by the next task in the chain.
Removing any of the above condition is sufficient to prevent deadlocks. Following should help you:
  1. Try to limit the design to a single mutex. With only one lock, a task can never hold one resource while waiting for another, which removes the hold and wait condition.
  2. Avoid acquiring more than one mutex/resource at a time and also avoid acquiring locks recursively. This removes the hold and wait condition.
  3. Always acquire mutexes in same order and release them in opposite order to prevent cycles. This removes the circular wait condition. Define and document lock ordering hierarchy.
  4. When ordering mutexes is not an option, it is sometimes possible to avoid deadlocks by using "try" variants of the locking primitives that return an error if the lock is already held. This removes the hold and wait condition. The idea is to try to acquire the misordered lock. If the lock is acquired, everything is fine. If the lock is already held, some recovery action must be taken, for example rolling back the shared resources to a consistent state, releasing all already-held locks, and retrying with in-order acquisition.
  5. Strive to unlock before calling unknown code.
  6. Avoid invoking client-provided callbacks while holding locks.
  7. Document callback calling context.
  8. Beware of race conditions when evaluating the wake-up condition.
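Rule 4, the try-and-back-off scheme, can be sketched with `pthread_mutex_trylock()`; the kernel analogues are `mutex_trylock()`/`spin_trylock()`, and the two lock names below are purely illustrative:

```c
#include <pthread.h>

/* Back-off sketch: take lock_a, then *try* lock_b. If lock_b is held
 * elsewhere, release lock_a as well and retry from scratch, instead of
 * blocking while holding lock_a. This breaks hold-and-wait, so a task
 * acquiring the locks in the opposite order cannot deadlock with us. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

int lock_both_or_back_off(void)
{
    for (;;) {
        pthread_mutex_lock(&lock_a);
        if (pthread_mutex_trylock(&lock_b) == 0)
            return 0;                  /* got both locks, no deadlock risk */
        /* lock_b is held elsewhere: drop lock_a too, then retry */
        pthread_mutex_unlock(&lock_a);
    }
}

void unlock_both(void)
{
    pthread_mutex_unlock(&lock_b);     /* release in reverse order */
    pthread_mutex_unlock(&lock_a);
}
```

A fixed lock ordering hierarchy (rule 3) is still the simpler and preferred fix; reach for trylock back-off only when such an order genuinely cannot be established.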
So by now you are aware of the basket and the fruit flavors it holds. Satiate yourself with the fruit you desire.


I am not the original writer of this information; thanks and credit go to the link below:

http://linuxburps.blogspot.in/p/blog-page.html 

