When considering code performance, it is common to discuss what a given operation is bound by, meaning the part of its operation that takes the vast majority of its running time. While there's no official set of things that a process can be bound by, three kinds of bounds are commonly referenced:
Compute-bound or processor-bound operations spend most of their time doing arithmetic in the processor. We typically optimize these by using tools like parallelism, approximation, and in some cases purpose-built processor hardware.
Memory-bound operations spend most of their time looking through memory. We typically optimize these by using tools like caching, indexing, replicating, parallelism, and in some cases redesigned memory hardware.
I/O-bound operations spend most of their time waiting for input to arrive or output to be delivered. (I/O stands for input/output and most often refers to anything that accesses either the disk or the network. There are many other I/O devices, such as mice, keyboards, cameras, microphones, screens, speakers, and printers, but disk and network are by far the most discussed.) Single-computer optimizations of these tend to focus on effectively managing many simultaneous I/O connections; larger-scale optimizations focus on splitting the requests between multiple computers with their own I/O channels.
In this page we discuss a programming model commonly used for single-computer I/O-bound optimization, sometimes called async, async/await, promises, futures, coroutines, or cooperative multithreading; and Python's particular implementation of this model.
In 1967 Gene Amdahl presented an intuitive but easily-forgotten principle that later took his name: the speedup gained by optimizing one part of a program is limited by the fraction of the running time that part actually occupies.
This principle guides many ideas in computing, including computer hardware design, heterogeneous computing, profile-guided optimization, and the async programming that is the subject of this page.
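Stated as a formula (a standard formulation, not a quotation from this page): if a proportion p of a task's running time can be sped up by a factor s, the best overall speedup is

\[ \text{speedup} = \frac{1}{(1 - p) + \frac{p}{s}} \]

Even as s grows without bound, the speedup never exceeds 1/(1 - p), which is why it pays to know what an operation is bound by before optimizing it.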
Consider a function like this:
def useFile1(filename):
    prep_work()
    data = read_from(filename)
    return more_work(data)
Reading from a file is an I/O operation. Like many I/O operations it typically takes several million clock cycles to complete. However, almost all of that time is waiting: we ask the kernel to send a request to the disk, and then quite some time later the kernel tells us that the disk has done its job. We could write that more explicitly:
def useFile2(filename):
    prep_work()
    ticket = request_data_from(filename)
    wait_for(ticket)
    data = get_results_of(ticket)
    return more_work(data)
The thing I’ve named ticket above is an identifier for this specific request. While this program might only be making one request, the OS and hardware are generally handling several concurrently. In synchronous I/O libraries like open and read the ticket is hidden from the user, but it still exists.
What if this single application wants to access many files? A straightforward approach would be to do something like
for filename in list_of_files:
    print(useFile2(filename))
However, that will wait for the first file, then for the second, and so on in series. Often I/O devices can process several requests at the same time, so this is inefficient.
It would be nicer to send out all the requests, then wait for the first one to finish. That would presume a model where the function is split into two parts:
def useFile3_start(filename):
    prep_work()
    ticket = request_data_from(filename)
    return ticket

def useFile3_end(ticket):
    wait_for_ticket(ticket)
    data = get_results_of(ticket)
    return more_work(data)

tickets = []
for filename in list_of_files:
    tickets.append(useFile3_start(filename))
for ticket in tickets:
    print(useFile3_end(ticket))
This will be faster because all of the I/O tasks can begin before we wait for any of them, but it’s still less efficient than we’d like. Different tickets take different amounts of time to finish, and this code could end up waiting on the slowest ticket first, slowing down all the others.
Instead of waiting for tickets one at a time, we want to be able to wait for the first-to-complete from the entire set:
def useFile4_start(filename):
    prep_work()
    ticket = request_data_from(filename)
    return ticket

def useFile4_end(ticket):
    data = get_results_of(ticket)
    return more_work(data)

tickets = []
for filename in list_of_files:
    tickets.append(useFile4_start(filename))
while len(tickets) > 0:
    ticket = wait_for_one_of(tickets)
    tickets.remove(ticket)
    print(useFile4_end(ticket))
This code is finally efficient: all I/O happens concurrently and the first to finish can be handled without needing to wait for anything else. Various operating system libraries help us do this, supplying several different implementations of the wait-for-one-of function, such as select, poll, and epoll.
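As a rough sketch of what a wait-for-one-of call looks like in practice (this is not part of the page's running ticket pseudocode; it uses Python's selectors module, which wraps select/poll/epoll, with local socket pairs standing in for real in-flight I/O requests):

import selectors, socket

# two local socket pairs stand in for several in-flight I/O requests
pairs = [socket.socketpair() for _ in range(2)]
for i, (send_end, recv_end) in enumerate(pairs):
    send_end.sendall(f'reply {i}'.encode('utf-8'))

sel = selectors.DefaultSelector()      # uses epoll, kqueue, or select, whichever the OS offers
for _, recv_end in pairs:
    sel.register(recv_end, selectors.EVENT_READ)

while sel.get_map():                   # while any request is still outstanding
    for key, _ in sel.select():        # blocks until at least one is ready to read
        print(key.fileobj.recv(4096))  # handle whichever finished first
        sel.unregister(key.fileobj)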
Although efficient, this style of code is not very pleasant to write. Single conceptual actions like useFile are split across multiple functions, with the programmer writing that code needing to know where to split the work and the programmer using the code needing to know how to invoke each part and what other functions to use in between. It can be done, but it’s tedious and error-prone.
We want to be able to write functions that look like useFile1 but run as efficiently as useFile4. There have been various solutions proposed to achieve this over time, but the one that’s currently gaining traction as the emerging leader is called an async function. We’d use it like so:
import asyncio

async def useFile5(filename):
    prep_work()
    data = await read_from_file(filename)
    return more_work(data)

async def useSeveralFiles():
    pending = []
    for filename in list_of_files:
        pending.append(useFile5(filename))
    for promise in asyncio.as_completed(pending):
        print(await promise)

asyncio.run(useSeveralFiles())
Explaining how this last example works, and other ways it could be written, is the subject of the rest of this page.
A coroutine is a special function-like object that can be in several states: not yet started, ready to run, running, not ready to run, and finished.
Notably not included in this list is waiting: a design goal of coroutines is that any time you’d expect code to wait, the coroutine instead stops running and marks itself as not ready to run until the thing it would wait for concludes.
You define a coroutine function by putting async in front of its def keyword; when invoked, such a function returns a coroutine instead of running its body:
def a():
    """this is a regular function;
    it returns a value when invoked."""
    print('a running')
    return 340

async def b():
    """this is a coroutine function;
    it returns a coroutine when invoked;
    that coroutine contains a value when it is finished"""
    print('b running')
    return 340

val = a()  # prints 'a running'
cr = b()   # does not print; makes a coroutine without running it
print(type(val), type(cr))  # <class 'int'> <class 'coroutine'>
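For the curious, CPython exposes a low-level view of coroutine states through inspect.getcoroutinestate; the names it reports are CPython's own, and asyncio.run (used here just to finish the coroutine) is explained below:

import asyncio, inspect

async def c():
    await asyncio.sleep(0)   # gives up control once before finishing
    return 340

cr = c()
print(inspect.getcoroutinestate(cr))  # CORO_CREATED: made but not yet started
print(asyncio.run(cr))                # 340
print(inspect.getcoroutinestate(cr))  # CORO_CLOSED: finished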
There are three ways to run a coroutine (named cr in the example invocations below):
From outside any coroutine, we can use asyncio.run(cr) to run a coroutine until it is finished, retrieving its value. If the internal coroutine goes into a not ready state, the program waits idly until it is ready and then resumes it.
This is inefficient and should be used as little as possible, most commonly only once in an entire application.
Until you invoke asyncio.run, no coroutine will run.
From inside another coroutine, we can use await cr to run a coroutine until it is finished, retrieving its value. If the internal coroutine goes into a not ready state, so does the one awaiting its result.
From inside another coroutine, we can wrap it in an asyncio.Task using asyncio.create_task(cr), which will start it running soon. In particular, any time an awaited or asyncio.run-ning coroutine moves into a not ready state, a task with a ready coroutine will be run instead.
There are also various other functions inside the asyncio library that wrap several coroutines into one or the like, all of which could be implemented using the above tools but are convenient for particular situations.
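One commonly used wrapper of this kind is asyncio.gather, which combines several coroutines into one awaitable; a small sketch (asyncio.sleep stands in for a period of being not ready):

import asyncio

async def square(n):
    await asyncio.sleep(0.1)   # stand-in for a period of being not ready
    return n * n

async def main():
    # the three coroutines run concurrently; results come back in argument order
    results = await asyncio.gather(square(2), square(3), square(4))
    print(results)             # [4, 9, 16]

asyncio.run(main())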
The following code defines two simple coroutine functions and uses await and asyncio.run to execute them. It’s not doing anything that benefits from having coroutines, but it does show several of their basic properties.
import asyncio

async def number():
    return 340

async def use_number():
    a_coroutine = number()
    print(a_coroutine)  # prints <coroutine object number at 0x7f8157956030>
    a_number = await a_coroutine
    print(a_number)     # prints 340

tmp = use_number()
print(tmp)                    # prints <coroutine object use_number at 0x7f81579567a0>
print(asyncio.run(tmp))       # prints None
print(asyncio.run(number()))  # prints 340
The full output will look something like
<coroutine object use_number at 0x7f81579567a0>
<coroutine object number at 0x7f8157956030>
340
None
340
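The example above exercises await and asyncio.run but not asyncio.create_task; here is a minimal sketch of the task behavior described earlier (asyncio.sleep stands in for a period of being not ready):

import asyncio

async def worker(name, delay):
    await asyncio.sleep(delay)         # this coroutine is not ready for `delay` seconds
    print(name, 'finished')
    return name

async def main():
    # the task starts running as soon as the coroutine below becomes not ready
    task = asyncio.create_task(worker('task', 1))
    print(await worker('awaited', 2))  # while this is not ready, the task runs
    print(await task)                  # already finished; this just retrieves its value

asyncio.run(main())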
Most operating systems have some support for asynchronous file operations, but these tend to have various OS-specific limitations, often including limits on how well they scale. This is improving over time, but for now (as of Python version 3.12) Python has decided not to support asynchronous files directly with asyncio.
That said, asyncio does have a way to wrap a blocking operation like file access in a coroutine; it’s not as efficient as proper async, but it’s more efficient than normal blocking open and read. This backup technique uses a thread (a much heavier-weight concurrency tool than a normal coroutine) to make a regular function act like an async function. The function asyncio.to_thread, applied to a function, returns a coroutine.
When using asyncio.to_thread, give the function itself, not an invocation of the function: cr = asyncio.to_thread(foo), not cr = asyncio.to_thread(foo()).
Assuming this is run from a folder that contains files named file1, file2, and file3, each with an integer inside, this example will print the contents of those files in nondeterministic order.
import asyncio

def useFile(filename):
    filename = './' + filename
    data = open(filename).read()
    return int(data)

async def useSeveralFiles(list_of_files):
    pending = []
    for filename in list_of_files:
        pending.append(asyncio.to_thread(useFile, filename))
    for promise in asyncio.as_completed(pending):
        print(await promise)

asyncio.run(useSeveralFiles(['file1','file2','file3']))
This same approach can be used for other non-async I/O-bound operations, such as displaying with print and reading from the keyboard using input.
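For example, a rough sketch of wrapping the blocking input call this way (the prompt text is made up for illustration):

import asyncio

async def ask(prompt):
    # calling input() directly would block the whole event loop;
    # running it in a thread lets other coroutines keep working
    return await asyncio.to_thread(input, prompt)

async def main():
    name = await ask('What is your name? ')
    print('hello,', name)

asyncio.run(main())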
Unlike with files, most operating systems have robust implementations of asynchronous network communication.
Networking is complicated because typically computers have a single network connection (an Ethernet cable or wireless antenna) that is used both to send and receive messages for all applications. Each arriving message needs to be routed to the correct process (we don’t want the web browser to get messages intended for a secure chat client, for example), which means that the kernel has to be able to read enough of each arriving message to know what process to send it to. This tends to mean that the overhead of communicating with the kernel to establish a network connection is fairly high.
What the kernel needs to know to send a message is the address of the computer you want to talk to and the port number identifying the program on that computer. With this information it does some hidden work to find that computer and see if it will accept the connection. If it succeeds, the kernel gives you a way to send and receive messages, which looks to your code almost exactly like files with the same kind of read and write operations.
If you want to be a server that waits for other computers to contact you, the steps are a bit different: instead of another computer’s address, you give the kernel the port on which you will listen for connections. The kernel uses this to set up a listening connection; when a new computer connects it creates a different port for that connection: this allows your application to speak with the connected client on the new port while still waiting for other clients to connect on the original port.
C
In C, a network connection is called a socket. The first two steps (shared by both clients and servers) are done through the socket function. The client connection is set up using connect, while the server is set up via bind, listen, and accept. In all cases, when you are done you close the connection.
We have a separate page on these functions if you are interested.
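For reference, Python's standard socket module exposes the same steps under the same names; a compact sketch (not from this page, and normally the client and server halves would live in separate programs):

import socket

# server side: socket, bind, listen
srv = socket.socket()
srv.bind(("127.0.0.1", 34043))
srv.listen()

# client side (normally a separate program): socket, connect
cli = socket.socket()
cli.connect(("127.0.0.1", 34043))

conn, addr = srv.accept()            # server accepts the queued connection
cli.sendall(b"Hello!")               # client writes ...
print(conn.recv(80), "from", addr)   # ... and the server reads it

# in all cases, close when done
cli.close(); conn.close(); srv.close()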
asyncio wraps up all of these steps in two simple functions:
(reader, writer) = await asyncio.open_connection("127.0.0.1", 34043) creates a client connection to the server with IP address 127.0.0.1 listening on port 34043.
await asyncio.start_server(myfunc, "127.0.0.1", 34043) creates a server that lets clients connect on IP address 127.0.0.1 and port 34043; each time one arrives, myfunc(reader, writer) is invoked.
This code shows a simple server and client talking to one another. Normally the client and server would be in separate programs running on separate computers, but that’s not required.
The IPv4 address 127.0.0.1 and IPv6 address ::1 are special addresses that mean myself, this computer.
import asyncio

async def server_chatter(reader, writer):
    while not reader.at_eof():
        msg = (await reader.read(80)).decode('utf-8')
        print("Server got", len(msg), "bytes")
        res = "Why " + repr(msg) + "?\n"
        writer.write(res.encode('utf-8'))  # queues for writing
        await writer.drain()               # completes queued writes

async def tell_server(ip, port, statement):
    reader, writer = await asyncio.open_connection(ip, port)
    writer.write(statement.encode('utf-8'))
    await writer.drain()
    msg = (await reader.readline()).decode('utf-8')
    return msg

async def main():
    server = await asyncio.start_server(server_chatter, "::1", 34043)
    print('>', await tell_server("::1", 34043, "Hello!"))
    print('>', await tell_server("::1", 34043, "I hope you are well."))
    print('>', await tell_server("::1", 34043, "Goodbye!"))
    # async with server:  # uncomment to let other apps contact this server
    #     await server.serve_forever()

asyncio.run(main())
Python provides two additional control constructs for working with coroutines.
async with lets us use with on something that works with coroutines. with is roughly a fancy way to call two functions, one when entering and one when leaving the indented block, and is a nicer way to handle several common patterns than try/finally. When those two functions are actually coroutines, async with will await both of them.
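A minimal sketch of that behavior, using a made-up Resource class whose __aenter__ and __aexit__ coroutines are the two functions async with awaits:

import asyncio

class Resource:
    async def __aenter__(self):
        await asyncio.sleep(0)   # stand-in for asynchronous setup work
        print('acquired')
        return self
    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)   # stand-in for asynchronous cleanup work
        print('released')

async def main():
    async with Resource() as r:  # awaits __aenter__ before and __aexit__ after the block
        print('using', r)

asyncio.run(main())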
async for lets us use for on something that uses coroutines to generate results, most commonly a coroutine function that uses yield to make itself into an async generator instead of return to make itself into a coroutine.
This code uses an async generator to convert a server’s reader into a sequence of integers and async for to access that sequence.
import asyncio

async def reader_of_ints(reader):
    """a sequence of integers that were followed by spaces"""
    while not reader.at_eof():
        try:
            yield int(await reader.readuntil(b' '))
        except ValueError as ex:                   # not an int
            print('\t!', ex)
        except asyncio.IncompleteReadError as ex:  # ended with no space
            print('\t! leftover:', ex.partial)
            break

async def extract_ints(reader, writer):
    found = 0
    async for num in reader_of_ints(reader):
        found += 1
        print('\tinteger', found, 'is', num)
    writer.write(f'{found} total integers received\n'.encode('utf-8'))
    await writer.drain()

async def tell_server(ip, port, statement):
    reader, writer = await asyncio.open_connection(ip, port)
    writer.write(statement.encode('utf-8'))
    writer.write_eof()  # manually claim the end of input so the generator ends
    msg = (await reader.readline()).decode('utf-8')
    return msg

async def main():
    server = await asyncio.start_server(extract_ints, "::1", 34043)
    print('>', await tell_server("::1", 34043, "Hello!"))
    print('>', await tell_server("::1", 34043, "124 128 173 225 222 340"))
    print('>', await tell_server("::1", 34043, "2 for me, 1 for you"))

asyncio.run(main())