Synchronization manages the timing of concurrent operations, typically by making one or more operations wait until some condition is met by another operation.

Synchronization of threads and shared-resource processes has been a topic of study in computer science at least since the 1960s, and many strategies for synchronizing have been designed, implemented, and studied. These broadly fall into two categories: synchronization primitives and atomic operations. This page gives and overview of the most common components of each.

1 Atomic Operations

Atomic operations are a key concept in lock-free data structures, a family of tools for high-efficiency synchronization in the case when multiple threads contending for a single resource is rare. Atomic operations resemble other assembly instructions, but they have an additional guarantee that all of their work will happen without a separate thread accessing the same resources. Any operation or set of operations that guarantees this kind of exclusive access to its resources during its operation is called atomic. Atomic operations are quite a bit less efficient than their non-atomic counterparts and should be used only when atomicity is required.

Common atomic operations include

Test-and-set, a single-bit operation that (a) sets a bit in memory to 1 and (b) returns if that bit was changed by this operation (i.e. previously 0) or not.

This is one of the simplest of the atomic operations to implement in hardware, and many other atomic operations and synchronization primitives are built off of it. I’ve never used it outside of a class that describes how to implement synchronization, but it’s refered to with enough frequency to be worth knowing about.
Compare-and-set, a multi-bit version of test-and-set which only modifies a value if it had a specific previous value before the modification; in other words, something like
```
if (p[0] == oldvalue) { p[0] = newvalue; return 1; }
else { return 0; }
```
Compare and set is widely used to implement lock-free data structures, such as linked lists that can be modified by multiple threads concurrently. The core idea of such approaches is something like
1. copy the current data into local variables
2. compute the new values and pointers needed
3. compare-and-set the new values over the old ones
4. if the compare-and-set failed, restart the operation
There are many nuances to this idea, but the core of it (retry on failure) shows why atomic operations are best for situations where threads fighting over a single resource are rare.
Atomic modification operations, like +=.

Under the hood, x += 1 involves (a) loading x’s current value from memory, (b) computing the sum of that value and 1, and (c) writing the result back to memory. With a non-atomic version, there is the potential that some other thread will modify x’s memory after step (a), then have its work lost when (c) overwrites that memory. With an atomic version, that kind of race condition is prevented by preventing any other thread from accessing x’s memory between steps (a) and (c).

2 Synchronization primitives

All synchronization primitives have the following properties:

They pair some opaque data (i.e. you never look inside it) and some functions or methods that operate on that data.
They define rules under which some threads will block when running a function/method, being suspended by the OS until some other function/method on the same data in another thread unblocks it.

There are many synchronization primitives; the following are those I see the most often in code I read and write.

2.1 Lock

A lock has two states: locked and unlocked. It also has two functions/methods: acquire and release. Acquire changes the lock from unlocked to locked, blocking until that can be done. Release changes the lock from locked to unlocked.

	locked	unlocked
acquire	block	change to locked
release	change to unlocked resume any threads blocked on this lock’s acquire	Error

The most common use of locks is to implement mutual exclusion: a chunk of code (called a critical section) that only one thread can access at a time. This tends to look like

lock.acquire()

do things
using only one thread
at a time

lock.release()

Releasing after doing the work has potential problems if one of the things in the critical section can crash, so it’s often actually written as

lock.acquire()
try:
    do things
    using only one thread
    at a time
finally:
    lock.release()

The finally-based pattern is so common that it often has its own language-level syntax, such as in Java

synchronized(lock) {
    // do things
    // using only one thread
    // at a time
}
// lock has been released here

or in Python

with lock:
    ... # do things
    ... # using only one thread
    ... # at a time
# lock has been released here

Bathroom stalls act like locks: when you acquire it (try to enter the stall) you either get exclusive access to it until you release it (leave the stall) or you block (wait) until it’s available.

The primary function of a lock is to provide mutual exclusion: a thread that acquires the lock excludes others from doing so in a symmetric (mutual) way. Because of this, a lock is often called a mutex.

2.2 Condition Variable

A condition variable is like a lock with a special temporary-release mode. It has acquire and release function/methods just like a lock, but also wait, notify, and notify_all methods

wait is like a temporary release: it releases the lock, then blocks until both (a) notified and (b) can reacquire.

	locked	unlocked
`acquire`	block	change to locked
`release`	change to unlocked resume any threads blocked on `acquire`	Error
`wait`	`release` block until `notify`, then `acquire`	Error
`notify`	unblock one thread blocked on `wait`	unblock one thread blocked on `wait`
`notify_all`	unblock all threads blocked on `wait`	unblock all threads blocked on `wait`

This may seem strangely complicated, but it allows a particular pattern that is very useful, namely locking until some arbitrarily-complicated condition is met:

cv.acquire()
while not complicated_condition_is_met():
    cv.wait()

do things using only one thread at a time
that change the complicated condition for some thread

cv.notify_all()
cv.release()

Because the acquire/release pair is essential, it is often facilitated with try/finally or a language-level syntax like Python’s:

with cv:
    while not condition_met():
        cv.wait()
    ... # do things
    ... # using only one thread
    ... # at a time
    cv.notify_all()
# lock has been released here

notify or notify_all?

Using notify is generally more efficient than notify_all, but if several threads are waiting and only a few of them will be enabled by the notification then it can also cause deadlock, with the randomly-selected thread that is unblocked not being able to make progress and all ending up blocked. There’s much more to say here, but for a first course in synchronization it’s probably safer to use the notify_all version unless you are sure notify will work.

I sometimes eat at a buffet where each dish has a condition variable-like interface. Only one person can access a given dish at a time (making it like a lock), but if that dish is empty they can wait for it to be refilled. Workers can also refill the dishes.

In other words, as a customer with a specific food I like I run this:

food_station.acquire()
while not food_station.has_food():
    food_station.wait()
food_station.serve(me)
food_station.notify()
food_station.release()

The worker who refills the food uses it more like a lock, kicking off this process periodically as more food is cooked and ready to serve:

food_station.acquire()
food_station.refill()
food_station.notify()
food_station.release()

Some implementations of condition variables treat them as two elements: a standard Lock that handles the acquire and release and a separate condition variable that handles the wait, notify, and notify_all. Others (including Python and this page) treat the lock as a superclass or field of the condition variable, recognizing that the split-type condition variable does not make sense without a corresponding lock.

One particular way of using a condition variable, called a monitor, ties the various condition variable operations to changes in attribute values of an object, creating one type of synchronized, thread-safe class.

2.3 Reader-writer lock

A reader-writer lock can be locked in two ways. If acquired in exclusive or write mode, it acts just like a regular lock. If acquired in shared or read mode, any number of threads can acquire it simultaneously, but all must release it before it can be acquired in exclusive/write mode.

Reader-writer locks are more common for file systems than for threads, where the file itself acts as the data you lock on. In both C (via #include <sys/file.h>) and Python (via import fcntl) the flock function takes two arguments: the file to lock or unlock and an enumeration value that is one of

LOCK_UN to unlock
LOCK_EX to lock in exclusive mode, blocking if any other thread has this locked in any mode
LOCK_SH to lock in shared mode, blocking if another thread has this locked in exclusive mode

Reader-writer locks can be implemented for threads using a condition variable and a count of the number of readers holding the lock. Many reader-writer locks prioritize writers over readers by also having a count of the number of writers waiting for the lock. An example of such an implementation can be found on Wikipedia.

Museums operate a kind of reader-writer lock. Visitors acquire the lock in shared mode when they arrive and release it when they leave, allowing any number of visitors to browse the exhibits at the same time. The curator acquires the lock in exclusive mode when they want to set up a new exhibit and release it when the exhibit is ready, ensuring that visitors can’t get in the way of the exhibit installation.

2.4 Semaphore

Semaphores were among the first synchronization primitives to be used in computing. They come in two types (binary and counting).

Binary semaphores are used as a type of communication between threads: one thread blocks on it, then later another thread unblocks it.

Counting semaphores are used like locks that can be acquired several times before they block.

I’ve never used either type of semaphore outside of low-level systems code like implementing operating-system functionality.

2.5 Barrier

A barrier acts in some ways like the opposite of a lock. A lock blocks if any other threads have acquired it; a barrier blocks until several threads are trying to pass it, then lets them all through at once.

Barriers are created with a count of the number $n$ of threads needed to pass the barrier and have just one function: wait, which blocks the first $n-1$ threads that call it and then lets them all unblock when the $n$ th thread calls wait.

Barriers are less versatile than some other synchronization primitives, but very helpful in specific situations. Two specific situations I’ve seen barriers used in repeatedly are handling a setup phase before a work phase and collecting intermediate results before deciding what to do next.

Setup phase barriers look like this:

b = Barrier(n)
have n threads each do:
    setup
    b.wait()
    do work

In this design, all threads will finish their setup before any of them start working, ensuring that no work messages are dropped because they are sent to a thread that’s not set up yet or the like.

Intermediate result barriers look like this:

b = Barrier(n)
have n threads each do:
    do part of the job
    b.wait()
    if thread 1:
        look at preliminary results
        make a decision what to do next
    b.wait()
    do rest of job

In this design all threads finish the first part, then that completed first part is used by one thread, and only once that thread is done do the threads pick up with the next part of their work.

There was a ride at the amusement park near my childhood home that operated like a barrier. Each car sat four people, and the workers would only let a car go once it was full. Once there was no line and I was the second person to arrive and sit in a car; the two of us waited until two more people arrived before the worker let our car go.