Asynchronous single-threaded code

When considering code performance, it is common to discuss what a given operation is bound by, meaning the part of its operation that takes the vast majority of its running time. While there’s no official set of things that a process can be bound by, three kinds of bounds are commonly referenced: CPU-bound, memory-bound, and I/O-bound.

In this page we discuss a programming model commonly used for single-computer I/O-bound optimization, sometimes called async, async/await, promises, futures, coroutines, or cooperative multithreading; and Python’s particular implementation of this model.

Amdahl’s Law

In 1967 Gene Amdahl presented an intuitive but easily-forgotten principle that later took his name: the speedup gained by optimizing one part of a system is limited by the fraction of total running time that part actually occupies.

This principle guides many ideas in computing, including computer hardware design, heterogenous computing, profile-guided optimization, and the async programming that is the subject of this page.
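
In formula form, if a fraction p of a program’s running time can be sped up by a factor of s, the whole program can run at most 1 / ((1 - p) + p / s) times faster. A minimal sketch of the arithmetic (the function name is made up for illustration):

def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is sped up by a factor of s."""
    return 1 / ((1 - p) + p / s)

# Speeding up 90% of the work a million-fold still leaves the other 10% untouched,
# so the whole program gets less than 10x faster.
print(amdahl_speedup(0.9, 1000000)) # about 9.99991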

1 Initial example

Consider a function like this:

def useFile1(filename):
    prep_work()
    data = read_from(filename)
    return more_work(data)

Reading from a file is an I/O operation. Like many I/O operations it typically takes several million clock cycles to complete. However, almost all of that time is waiting: we ask the kernel to send a request to the disk, and then quite some time later the kernel tells us that the disk has done its job. We could write that more explicitly:

def useFile2(filename):
    prep_work()
    ticket = request_data_from(filename)
    wait_for(ticket)
    data = get_results_of(ticket)
    return more_work(data)

The thing I’ve named ticket above is an identifier for this specific request. While this program might only be making one request, the OS and hardware are generally handling several concurrently. In synchronous I/O functions like open and read, the ticket is hidden from the user, but it still exists.

What if this single application wants to access many files? A straightforward approach would be to do something like

for filename in list_of_files:
    print(useFile2(filename))

However, that will wait for the first file, then for the second, and so on in series. Often I/O devices can process several requests at the same time, so this is inefficient.

It would be nicer to send out all the requests, then wait for the first one to finish. That would presume a model where the function is split into two parts:

def useFile3_start(filename):
    prep_work()
    ticket = request_data_from(filename)
    return ticket

def useFile3_end(ticket):
    wait_for_ticket(ticket)
    data = get_results_of(ticket)
    return more_work(data)

tickets = []
for filename in list_of_files:
    tickets.append(useFile3_start(filename))

for ticket in tickets:
    print(useFile3_end(ticket))

This will be faster because all of the I/O tasks can begin before we wait for any of them, but it’s still less efficient than we’d like. Different tickets take different amounts of time to finish, and this code could end up waiting on the slowest ticket first, slowing down all the others.

Instead of waiting for tickets one at a time, we want to be able to wait for the first-to-complete from the entire set:

def useFile4_start(filename):
    prep_work()
    ticket = request_data_from(filename)
    return ticket

def useFile4_end(ticket):
    data = get_results_of(ticket)
    return more_work(data)

tickets = []
for filename in list_of_files:
    tickets.append(useFile4_start(filename))

while len(tickets) > 0:
    ticket = wait_for_one_of(tickets)
    tickets.remove(ticket)
    print(useFile4_end(ticket))

This code is finally efficient: all I/O happens concurrently and the first to finish can be handled without needing to wait for anything else. Various operating system libraries help us do this, supplying several different implementations of the wait_for_one_of function, such as select, poll, and epoll.
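
As a peek below the asyncio layer, here is a minimal, self-contained sketch using Python’s standard selectors module (which wraps select, poll, or epoll depending on the platform); the socketpair calls just stand in for two outstanding I/O requests:

import selectors, socket

# two connected socket pairs stand in for two outstanding I/O requests
a_send, a_recv = socket.socketpair()
b_send, b_recv = socket.socketpair()
a_send.sendall(b'first reply')
b_send.sendall(b'second reply')

sel = selectors.DefaultSelector() # backed by epoll, poll, or select
sel.register(a_recv, selectors.EVENT_READ)
sel.register(b_recv, selectors.EVENT_READ)

while len(sel.get_map()) > 0:
    for key, events in sel.select(): # blocks until at least one is ready
        data = key.fileobj.recv(4096) # this read will not block
        sel.unregister(key.fileobj)
        print('ready:', data)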

Although efficient, this code is not very pleasant to write. Single conceptual actions like useFile are split across multiple functions, with the programmer writing that code needing to know where to split the work and the programmer using the code needing to know how to invoke each part and what other functions to use in between. It can be done, but it’s tedious and error-prone.

We want to be able to write functions that look like useFile1 but run as efficiently as useFile4. There have been various solutions proposed to achieve this over time, but the one that’s currently gaining traction as the emerging leader is called an async function. We’d use it like so:

import asyncio

async def useFile5(filename):
    prep_work()
    data = await read_from_file(filename)
    return more_work(data)

async def useSeveralFiles():
    pending = []
    for filename in list_of_files:
        pending.append(useFile5(filename))
    for promise in asyncio.as_completed(pending):
        print(await promise)

asyncio.run(useSeveralFiles())

Explaining how this last example works, and other ways it could be written, is the subject of the rest of this page.

2 Coroutines

A coroutine is a special function-like object that can be in several states: not yet started, running, ready to run (paused but able to resume), not ready to run (paused until something else happens), and finished (holding its return value).

Notably not included in this list is waiting: a design goal of coroutines is that any time you’d expect code to wait, the coroutine instead stops running and marks itself as not ready to run until the thing it would wait for concludes.

You define a coroutine function by putting async in front of its def keyword; when invoked, such a function returns a coroutine instead of a value:

def a():
    """this is a regular function;
    it returns a value when invoked."""
    print('a running')
    return 340

async def b():
    """this is a coroutine function;
    it returns a coroutine when invoked;
    that coroutine contains a value when it is finished"""
    print('b running')
    return 340

val = a() # prints 'a running'
cr = b() # does not print; makes a coroutine without running it
print(type(val), type(cr)) # <class 'int'> <class 'coroutine'>

There are three ways to run a coroutine (named cr in the example invocations below):

  1. From outside any coroutine, we can use asyncio.run(cr) to run a coroutine until it is finished, retrieving its value. If the internal coroutine goes into a not ready state, the program waits idly until it is ready and then resumes it.

    This is inefficient and should be used as little as possible, most commonly only once in an entire application.

    Until you invoke asyncio.run, no coroutine will run.

  2. From inside another coroutine, we can use await cr to run a coroutine until it is finished, retrieving its value. If the internal coroutine goes into a not ready state, so does the one awaiting its result.

  3. From inside another coroutine, we can wrap it in an asyncio.Task using asyncio.create_task(cr), which will start it running soon. In particular, any time the coroutine being awaited or run by asyncio.run moves into a not ready state, a task with a ready coroutine will be run instead.

There are also various other functions inside the asyncio library that wrap several coroutines into one or the like, all of which could be implemented using the above tools but are convenient for particular situations.
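
asyncio.gather is one such convenience: it awaits several coroutines concurrently and collects all of their results into a list. A small sketch:

import asyncio

async def double(x):
    return 2 * x

async def main():
    # runs all three coroutines and waits until every one has finished
    results = await asyncio.gather(double(1), double(2), double(3))
    print(results) # [2, 4, 6]

asyncio.run(main())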

The following code defines two simple coroutine functions and uses await and asyncio.run to execute them. It’s not doing anything that benefits from having coroutines, but it does show several of their basic properties.

import asyncio

async def number():
    return 340

async def use_number():
    a_coroutine = number()
    print(a_coroutine) # prints <coroutine object number at 0x7f8157956030>
    a_number = await a_coroutine
    print(a_number)    # prints 340

tmp = use_number()
print(tmp) # prints <coroutine object use_number at 0x7f81579567a0>
print(asyncio.run(tmp))      # prints None
print(asyncio.run(number())) # prints 340

The full output will look something like

<coroutine object use_number at 0x7f81579567a0>
<coroutine object number at 0x7f8157956030>
340
None
340
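
The example above shows await and asyncio.run but not asyncio.create_task. A small sketch of that third option (asyncio.sleep stands in for a period of being not ready, such as waiting on I/O):

import asyncio

async def background(name):
    await asyncio.sleep(0.1) # goes not ready for a moment
    print(name, 'finished')
    return name

async def main():
    # the tasks start running as soon as main next moves into a not ready state
    t1 = asyncio.create_task(background('first'))
    t2 = asyncio.create_task(background('second'))
    print('tasks created, none finished yet')
    print(await t1, await t2) # awaiting a task retrieves its value

asyncio.run(main())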

3 Working with Files

Most operating systems have some support for asynchronous file operations, but these tend to have various OS-specific limitations, often including some limitation on how well they scale. This is improving over time, but as of version 3.12 Python has decided not to support asynchronous files directly with asyncio.

That said, asyncio does have a way to wrap a blocking operation like file access in a coroutine; it’s not as efficient as proper async, but it’s more efficient than normal blocking open and read. This backup technique uses a thread (a much heavier-weight concurrency tool than a normal coroutine) to make a regular function act like an async function. The function asyncio.to_thread applied to a function returns a coroutine.

When using asyncio.to_thread, give the function itself, not an invocation of the function: cr = asyncio.to_thread(foo), not cr = asyncio.to_thread(foo()).

Assuming this is run from a folder that contains files named file1, file2, and file3, each with an integer inside, this example will print the contents of those files in nondeterministic order.

import asyncio

def useFile(filename):
    filename = './'+filename
    data = open(filename).read()
    return int(data)

async def useSeveralFiles(list_of_files):
    pending = []
    for filename in list_of_files:
        pending.append(asyncio.to_thread(useFile, filename))
    for promise in asyncio.as_completed(pending):
        print(await promise)

asyncio.run(useSeveralFiles(['file1','file2','file3']))

This same approach can be used for other non-async I/O-bound operations, such as displaying with print and reading from the keyboard using input.
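
For example, a minimal sketch of reading from the keyboard this way (the prompt text and function name are made up for illustration):

import asyncio

async def ask_name():
    # input would normally block the whole program; run in a thread it only
    # blocks that thread, so other coroutines can keep running
    name = await asyncio.to_thread(input, 'What is your name? ')
    print('Hello,', name)

asyncio.run(ask_name())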

4 Working with Networks

Unlike with files, most operating systems have robust implementations of asynchronous network communication.

Networking is complicated because typically computers have a single network connection (an Ethernet cable or wireless antenna) that is used both to send and receive messages for all applications. Each arriving message needs to be routed to the correct process (we don’t want the web browser to get messages intended for a secure chat client, for example), which means that the kernel has to be able to read enough of each arriving message to know what process to send it to. This tends to mean that the amount of communication with the kernel needed to establish a network connection is fairly high.

What the kernel needs to know to send a message is

  1. How you’re identifying a recipient computer. We’ll only use IP, the system used on the Internet, but there are other connection types.
  2. How you’re formatting messages, so it can know where to look when routing replies to you. We’ll only use TCP, the system used on the Internet, but there are other formats.
  3. What IP address to connect to. This identifies a physical computer somewhere on the Internet.
  4. What port to connect to. This is TCP’s way of picking which process on a computer should get the message.

With this information it does some hidden work to find that computer and see if it will accept the connection. If it succeeds, the kernel gives you a way to send and receive messages, which looks to your code almost exactly like files with the same kind of read and write operations.

If you want to be a server that waits for other computers to contact you, the steps are a bit different.

  1. What IP addresses you’re OK responding to. Sometimes several IP addresses reach the same computer, and the kernel can route them to different processes.
  2. What port you expect to be contacted on.

The kernel uses this to set up a listening connection; when a new computer connects it creates a different port for that connection: this allows your application to speak with the connected client on the new port while still waiting for other clients to connect on the original port.

C

In C, a network connection is called a socket. The first two steps (shared by both clients and servers) are done through the socket function. The client connection is set up using connect, while the server uses bind, listen, and accept. In all cases, when you are done you close the connection.

We have a separate page on these functions if you are interested.

asyncio wraps up all of these steps in two simple functions: asyncio.open_connection(host, port) performs the client steps and returns a reader and a writer, while asyncio.start_server(callback, host, port) performs the server steps and invokes callback with a reader and a writer for each client that connects.

This code shows a simple server and client talking to one another. Normally the client and server would be in separate programs running on separate computers, but that’s not required.

The IPv4 address 127.0.0.1 and IPv6 address ::1 are special addresses that mean myself, this computer.

import asyncio

async def server_chatter(reader, writer):
  while not reader.at_eof():
    msg = (await reader.read(80)).decode('utf-8')
    print("Server got",len(msg),"bytes")
    res = "Why "+repr(msg)+"?\n"
    writer.write(res.encode('utf-8')) # queues for writing
    await writer.drain() # completes queued writes

async def tell_server(ip, port, statement):
  reader, writer = await asyncio.open_connection(ip, port)
  writer.write((statement).encode('utf-8'))
  await writer.drain()
  msg = (await reader.readline()).decode('utf-8')
  return msg

async def main():
  server = await asyncio.start_server(server_chatter, "::1", 34043)
  print('>', await tell_server("::1", 34043, "Hello!"))
  print('>', await tell_server("::1", 34043, "I hope you are well."))
  print('>', await tell_server("::1", 34043, "Goodbye!"))
  
  # async with server: # uncomment to let other apps contact this server
  #   await server.serve_forever()

asyncio.run(main())

5 async for and with

Python provides two additional control constructs for working with coroutines.

async with lets us use with on something that works with coroutines. with is roughly a fancy way to call two functions, one before entering and one after leaving the indented block, and is a nicer way to handle several common patterns than try/finally. When those two functions are actually coroutines, async with will await both of them.
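
As a small sketch (the class and its names are made up for illustration), an object supports async with by defining coroutine methods __aenter__ and __aexit__, which async with will await before and after the block:

import asyncio

class TimedBlock:
    """made-up async context manager that reports how long its block took"""
    async def __aenter__(self):
        self.start = asyncio.get_running_loop().time()
        return self
    async def __aexit__(self, exc_type, exc, tb):
        elapsed = asyncio.get_running_loop().time() - self.start
        print(f'block took {elapsed:.3f} seconds')

async def main():
    async with TimedBlock(): # awaits __aenter__ before the block
        await asyncio.sleep(0.2) # ... and __aexit__ after it

asyncio.run(main())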

async for lets us use for on something that uses coroutines to generate results, most commonly a coroutine function that uses yield to make itself into an async generator instead of return to make itself into a coroutine.

This code uses an async generator to convert a server’s reader into a sequence of integers and async for to access that sequence.

import asyncio

async def reader_of_ints(reader):
  """a sequence of integers that were followed by spaces"""
  while not reader.at_eof():
    try:
      yield int(await reader.readuntil(b' '))
    except ValueError as ex: # not an int
      print('\t!',ex)
    except asyncio.IncompleteReadError as ex: # ended with no space
      print('\t! leftover:', ex.partial)
      break

async def extract_ints(reader, writer):
  found = 0
  async for num in reader_of_ints(reader):
    found += 1
    print('\tinteger',found,'is',num)
  writer.write(f'{found} total integers received\n'.encode('utf-8'))
  await writer.drain()

async def tell_server(ip, port, statement):
  reader, writer = await asyncio.open_connection(ip, port)
  writer.write((statement).encode('utf-8'))
  writer.write_eof() # manually claim the end of input so generator ends
  msg = (await reader.readline()).decode('utf-8')
  return msg

async def main():
  server = await asyncio.start_server(extract_ints, "::1", 34043)
  print('>', await tell_server("::1", 34043, "Hello!"))
  print('>', await tell_server("::1", 34043, "124 128 173 225 222 340"))
  print('>', await tell_server("::1", 34043, "2 for me, 1 for you"))

asyncio.run(main())