Concurrency

Two things are concurrent if there exists a time when both have started and neither has concluded. Two things are occurring in parallel if there exists a time when both are actively occurring. Concurrency is a superset of parallelism.

If I speak with a passenger in my car while driving to work, I am speaking and driving in parallel.

If I stop my car while on my way to work to talk to someone I pass on the sidewalk, then resume driving after the conversation, then I was speaking and driving to work concurrently but not in parallel.

Concurrency allows software design organized around multiple conceptual agents, each with its own tasks and workflow. Parallelism adds runtime efficiency to concurrency, but does not otherwise expand the design space.

1 Kinds of concurrency

Some parallelism is all but invisible to the programmer, being inserted by the compiler and/or hardware for better performance. Hardware-inserted parallelism includes instruction-level parallelism, out-of-order execution, and speculative execution. Compiler-inserted parallelism includes vectorization and SIMD instructions. These topics will not be discussed further in this class.

1.1 Computers

The heaviest-weight concurrency is running on a separate physical computer. Coordination of work requires an explicit message-passing protocol.

Computer systems designed to work across several computers are commonly called distributed systems.

1.2 Processes

A large step down from computers is the process. A process runs its own code, accesses its own memory, and generally acts like its own entity separate from other processes, but it shares an operating system with other processes on the same computer. This means that communication between processes is as simple as a system call: the process asks the operating system, which can then access a shared resource such as a file or a region of memory on the process's behalf.

1.3 Threads

A thread has its own execution context – the code it is running and the values of its local variables – but shares memory with all the other threads in the same process. The shared memory means coordination is easy, but so is accidentally messing up what other threads are trying to do. This is especially true because threads run either in parallel or with preemptive scheduling, both of which mean that thread 1 may change memory between any pair of thread 2’s instructions, or in some cases halfway through a single instruction executing.

Multithreaded programming is characterized by consciously choosing to limit behaviors that might interfere with what other threads are doing.

1.4 Fibers

A fiber is like a thread that cannot run in parallel with other fibers. A fiber runs until it voluntarily relinquishes control of the processor, at which point another fiber starts running. Because of this design, fibers are sometimes called cooperative multithreading; because there are many versions of cooperation, there are also many specific fiber variants and implementation components with names like co-routines, greenlets, generators, callbacks, futures, promises, async/await, and lazy closures.

Fibers are primarily used in applications which spend limited time processing and much more time waiting for disks and networks to send and receive packets of data. Because each fiber chooses exactly where it yields control to other fibers, fibers tend to be much easier to program with than threads, but only in application domains where there are natural points in the code at which to yield control.

2 Coding concurrency

2.1 Multi-computer concurrency

Applications that involve multiple computers generally communicate using sockets and network connections. These are large enough topics to deserve their own pages: network connections and sockets.

2.2 Multi-process concurrency

The lowest-level model of multi-process concurrency is the fork system call, which duplicates a process, making a copy of all of its memory and other state. One copy, called the parent process, is given the process identifier (more commonly called the PID) of the other copy, called the child process; in the child, fork instead returns 0.
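
As a sketch (using Python’s os module wrappers for the system call on a Unix-like system; the printed messages are just illustrative), the two copies can tell themselves apart by fork’s return value:

import os

pid = os.fork()                  # duplicate this process
if pid == 0:
    # child copy: fork returned 0
    print("child: my pid is", os.getpid())
    os._exit(0)                  # end the child here
else:
    # parent copy: fork returned the child's PID
    print("parent: my child's pid is", pid)
    os.wait()                    # wait for the child to finish (see below)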

The vast majority of uses of fork are followed immediately by the child process replacing its entire memory and other program state with a fresh copy of a new program, using the execve family of system calls. Because of this, there are various wrappers around this pair of system calls which fork-and-exec to start a child process running a different program.
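
A hedged sketch of the fork-and-exec pattern in Python (os.execvp replaces the child’s program image; the ls command here is just an example):

import os

pid = os.fork()
if pid == 0:
    # child: replace this process's code and memory with a new program
    os.execvp("ls", ["ls", "-l"])    # only returns if the exec fails
    os._exit(1)                      # reached only if exec failed
else:
    # parent: continue running the original program
    os.waitpid(pid, 0)               # wait for the child to finish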

Communication between processes is often handled by file-like objects called pipes. A pipe looks like a write-only file to one process and a read-only file to the other. Processes can also communicate using sockets.
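
For example, a minimal sketch in Python: os.pipe returns a read end and a write end, and here the child writes one message that the parent reads:

import os

read_end, write_end = os.pipe()      # two file descriptors sharing one buffer

pid = os.fork()
if pid == 0:
    # child: write into the pipe, then exit
    os.close(read_end)
    os.write(write_end, b"hello from the child")
    os.close(write_end)
    os._exit(0)
else:
    # parent: read what the child wrote
    os.close(write_end)
    print(os.read(read_end, 1024).decode())
    os.close(read_end)
    os.waitpid(pid, 0)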

It is generally the responsibility of the parent process to clean up after the child process finishes running. The wait system call allows a process to suspend itself until a child process terminates. If a child process has terminated but not been waited for yet, it is a zombie process: the operating system keeps its exit information around to return the next time the parent process waits, but the process itself will never run again.
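
A small sketch of that life cycle in Python (the one-second sleep exists only to make the zombie window visible):

import os, time

pid = os.fork()
if pid == 0:
    os._exit(7)                      # child exits immediately with status 7
else:
    time.sleep(1)                    # during this second the child is a zombie
    _, status = os.waitpid(pid, 0)   # collect the child; it is no longer a zombie
    print("child exited with", os.WEXITSTATUS(status))   # 7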

A particularly common pattern is to have a parent process run a different program in a child process and then wait for it to complete. This pattern is encapsulated in the system function. Higher-level functions that also capture the stdout of the child process are available in some higher-level languages, like Python’s subprocess.run.
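
For example, using Python’s subprocess.run (the echo command here is just for illustration):

import subprocess

# run a program in a child process, wait for it to finish, and capture its stdout
result = subprocess.run(["echo", "hello"], capture_output=True, text=True)
print(result.returncode)   # 0 means the child exited successfully
print(result.stdout)       # "hello\n"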

If its parent process terminates, a child process is often terminated along with it (for example, when a login session ends). If a child wants to continue running past the conclusion of its parent process, or if it wishes to be able to die without becoming a zombie, it can become a daemon and declare that for most practical purposes it has no parent. A common pattern is for a daemon to be created when a computer boots up or a user logs in; it spends most of its time waiting for input to appear on some file, pipe, or socket, then processes that one input and returns to a waiting state. Daemon processes of this kind implement a wide variety of system services, such as noticing that a new peripheral device was plugged into your computer or tracking changes in the available wifi signals. You likely have several hundred daemons running on your computer right now, most of which have been running for many days but have spent less than a second of cumulative time doing anything other than waiting for input.

2.3 Multi-thread concurrency

There are several threading APIs, but the most common has a parent thread spawn a child thread, giving it a function to execute; once that function returns the child thread terminates.

Spawning a thread in C

You’ll need to #include <pthread.h> and include the -pthread option in your gcc or clang command line invocation.

// assume we have a function defined with this signature:
// void *function_to_run(void *arg)

pthread_t tid;
void *argument = /* ... */;
void *returnvalue;
pthread_create(&tid, NULL, function_to_run, argument);

// do one of the two following, but not both:
// pthread_detach(tid); // ignore the thread; OR
pthread_join(tid, &returnvalue); // wait for it to finish

Spawning a thread in Python

You’ll need to import threading.

There are two roughly-equivalent versions of how to spawn threads. One uses a function, the other uses a class method.

Function version:

# call function_to_run(arg1, arg2) in a new thread
t = threading.Thread(target=function_to_run, args=[arg1, arg2])
# t.daemon = True # to detach the thread; must come before t.start()
t.start()

t.join() # to wait for the thread to end (cannot use on daemon)

Class method version:

class MyThread(threading.Thread):
  def __init__(self, arg1, arg2):
    threading.Thread.__init__(self)
    self.val1 = arg1
    self.val2 = some_work_on(arg2)  # some_work_on stands in for any setup done before the thread starts
  def run(self):
    ... # do work

t = MyThread(arg1, arg2)
# t.daemon = True # to detach the thread; must come before t.start()
t.start()

t.join() # to wait for the thread to end (cannot use on daemon)

Because threads share the same address space, they can easily communicate by setting and checking values in memory. However, doing that can create race conditions. In general a race condition is any time that the outcome of a program depends on the order in which different threads do different things. A common version of a race condition has the following structure:

  1. Thread 1 checks some value in memory to decide what to do.
  2. Thread 2 changes that value.
  3. Thread 1 does the thing that the old value wanted it to do but that is no longer correct because of the change thread 2 made.

Suppose two threads are both transferring money using logic like this:

if fromAccount > transferAmount:
    fromAccount -= transferAmount
    toAccount += transferAmount
else:
    showError("Insufficient funds")

Suppose fromAccount starts as 15 and both threads are transferring 10 from it. If both threads run line 1, both will decide 15 > 10, and then both run line 2, causing fromAccount to go to −5.

Avoiding race conditions generally requires some form of synchronization between threads, where one thread waits for another to finish some task before beginning some other task. Synchronization of threads is a large enough topic to deserve its own page.
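
As a preview, here is one hedged sketch of a fix for the transfer example above using Python’s threading.Lock: while one thread holds the lock, any other thread that wants it waits, so the check and the update happen as one indivisible step (the account variables are module-level here only to keep the sketch short):

import threading

lock = threading.Lock()

def transfer(transferAmount):
    global fromAccount, toAccount
    with lock:                        # only one thread at a time may run this block
        if fromAccount > transferAmount:
            fromAccount -= transferAmount
            toAccount += transferAmount
        else:
            showError("Insufficient funds")   # showError as in the example above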

2.4 Non-parallel Concurrency

There are many forms of fiber-like concurrency, enough that a general discussion of the full topic is beyond the scope of this course. Instead, we’ll mention just two versions that are present in Python and several related languages.

A generator is like a function that can return or yield a value multiple times. Each time it does so, the function is paused until the generator is next resumed. In Python, a function that contains the yield keyword returns a generator instead of a regular value when called; the generator can then be run to its next yield line using its .__next__() method, or can be used like a collection in a for loop.

The simplest generator is just an inefficient way of creating a list of values:

def prereqs():
    yield 124
    yield 128
    yield 173
    yield 225

g = prereqs()
print(type(g)) # <class 'generator'>
print(g.__next__()) # 124
print(g.__next__()) # 128
for c in g:
    print(c) # 173, then 225

Generators can be infinite, yielding one value after another forever.

This one technically isn’t infinite and will eventually crash: it might run out of memory for the growing found list, or it might crash because next**0.5 computes a float, which has a maximum value of around 1.7e308.

def allprimes():
    yield 2
    found = [2]          # primes found so far
    next = 3             # next candidate to test (odd numbers only)
    while True:
        cap = next**0.5  # a composite has a prime factor no bigger than its square root
        for p in found:
            if p > cap:
                yield next           # no prime factor found, so next is prime
                found.append(next)
                break
            if (next % p) == 0: break  # p divides next, so next is composite
        next += 2

for p in allprimes():
    print(p) # prints 2, then 3, then 5, then 7, and so on
    if p > 340: break # to keep loop from running forever

Generators have entirely deterministic and explicit scheduling: they run from a .__next__() invocation until they hit a yield or the generator function terminates.

Async functions have partially-explicit scheduling. They run from when they are invoked until they hit an await, but once they hit await it is up to an async scheduler to decide what to do next and when, if ever, to resume the paused async function. In Python, async functions are defined with async def instead of def. Python has several related async schedulers in the asyncio library.

A common use-case for async functions it when a program expects to be waiting for any one of multiple input sources (files, networks, or users) to provide input. Async functions are a newer approach to this task; earlier approaches based on threads, event handlers, or operating-system-managed interrupts are still dominant and async functions often have more complicated interfaces than those older, better-established methods. However, async functions are generally at least as efficient as the other methods and tend to have less scope for concurrency-based bugs, so they are increasingly popular in newer applications.