Software Caching

If you came to this page expecting to learn about the caches built into most memory hardware, see Memory Caches instead.

1 Definition and Motivation

A software-based cache stores the results of previous queries and uses it to respond to subsequent queries.

A pure function’s computation depends only on its arguments, not on any form of state. Caching the results of a pure function is often called memoization and can change the asymptotic performance of some functions, notably making repetative forked recursion non-exponential in runtime. While an important technique in algorithm design, memoization is not the topic of this page.

Caches can be used to make repeated queries less costly, generally in runtime but often also in other resources such as power usage, communication bandwidth, and so on. They do this by storing more information, hence creating a space-time trade-off (use more space to save time).

Caches can be kept in transient in-memory storage or durable on-disk storage. In-memory storage is easier to implement, but on-disk storage can be preferable both to avoid filling up memory with the cache and to provide the benefits of the cache across multiple executions of the caching application. On-disk caches can also provide a partial history of an application’s behavior and are sometimes inspected during digital forensics to discover evidence of past activity.

For non-pure functions which depend on some kind of state (most notably the function fetch the website at this URL which depends on the state of the website on the server), a key challenge of caches is deciding when it is safe to use the cache vs when it is necessary to re-run the function. In HTTP, several ways of making these when-to-cache decisions are supported using the Cache-Control: header.

2 Age-based cache control

Age-based caching treats each cache entries as either fresh or stale. Fresh entries may be returned from the cache; stale entries need to be run in full and then updated in the cache. New entries are always created as fresh but become stale after a pre-determined time.

Suppose at 2024-04-01 11:30:00 CDT my browser made an HTTP request to http://example.com/aged and the response contained the header Cache-Control: max-age=340 (that’s measured in seconds) and the message body example1. The browser would then insert into the cache something like

Key Expires Body
http://example.com/aged 2024-04-01 11:35:40 CDT example1

If at 2024-04-01 11:32:31 CDT I asked the browser for http://example.com/aged again It would check the cache, verify that the entry had not expired, and return example1 without going online at all.

If at 2024-04-01 11:37:15 CDT I asked the browser for http://example.com/aged again It would check the cache, see that the entry had expired, and make a new HTTP request, updating the cache entry with the new response and a new expiration time.

Age-based cache control is easy to implement but it hampers the ability of the server to make short-notice changes. If I set my page’s max-age at 24 hours then I can’t guarantee that any changes I make to that page are seen until 24 hours after I make them.

2.1 Picking a key

How much of a request should be checked against the cache? The path and host? Request headers? The request body?

HTTP supports a Vary: header that identifies request headers that should be part of the key used in cache lookups. For example, if my page has been translated into several languages and I pick which one to return based on the request’s Accept-Language: header then I’d add Vary: Accept-Language to my responses, telling the cache to use both the URL and the Accept-Language request header as the key for caching.

2.2 Partial-contact age-based cache control

Age-based caching is usually handled on the client side by design: we are trying to reduce the number of network requests. However, we can move it to the server by adding an If-Modified-Since: header to the request. If the server knows that nothing has changed on its side since the date provided in that header it can respond with a 304 Not Modified response code and no additional data, saving the server the time needed to generate a full response and possibly also saving the client the time needed to re-parse the response if the client put the parse response in its cache.

3 Tag-based cache control

Tag-based caching puts the control of the caching in the hands of the server. It also allows multiple versions of the same page to be stored in the cache at the same time. As a side effect, it allows fingerprinting and user tracking.

The operation of tag-based caching can be seen in the following analogy

Agent X is responsible for executing part of a complicated plan devised, and changed from time to time, by HQ. Communications with HQ are of course encrypted, but they also want to reduce the amount of sensitive messages they send.

On the first day on the job, Agent X sends to HQ a simple message: What’s the plan? HQ sends back a detailed plan, labeled plan 34b.

On the second day, Agent X wants to know if the plan changed, and thus sends to HW the message What’s the plan? I already know plan 34b. HQ sends back a very shot message, just Plan 34b. Agent already knows plan 34b so no additional details are needed.

On the third day, Agent X wants to know if the plan changed, and thus sends to HW the message What’s the plan? I already know plan 34b. HQ sends back a new detailed plan, labeled plan 40h.

On the fourth day, Agent X wants to know if the plan changed, and thus sends to HW the message What’s the plan? I already know plans 34b and 40h. If one of those is HQ’s current plan, HW will simply tell Agent X that; if not, HW will sent Agent X yet another detailed plan with a new name.

The key idea of a tag-based cache control system is that each distinct response is associated with a unique identifier. HTTP calls that identifier an etag. A request can list the tags the client already knows about using the If-None-Match: header; a response can tell the client to use one from its cache with a 304 response code and can label the etag of a response with the ETag: header.

The same sequence of Agent X/HQ messages as above might look like

  1. X to HQ:

    GET /plan HTTP/1.1

    HQ to X

    HTTP/1.1 200 OK
    ETag: "34b"
    
    Here's the plan: ...
  2. X to HQ:

    GET /plan HTTP/1.1
    If-None-Match: "34b"

    HQ to X

    HTTP/1.1 340 Not Modified
    ETag: "34b"

    (aside: because there was only one etag in the If-None-Match header, the ETag header could be omitted)

  3. X to HQ:

    GET /plan HTTP/1.1
    If-None-Match: "34b"

    HQ to X

    HTTP/1.1 200 OK
    ETag: "40h"
    
    Here's the plan: ...
  4. X to HQ:

    GET /plan HTTP/1.1
    If-None-Match: "34b", "40h"

    HQ to X

    HTTP/1.1 340 Not Modified
    ETag: "40h"

3.1 ETags as a way of tracking users

ETags are a great way of reducing network bandwidth and server processing time, but they also allow servers to track clients.

HTML pages typically contain references to various external files: images, stylesheets, scripts, fonts, and so on. If the server picks one of those that never changes but generates a unique ETag each time it is requested, that ETag can serve as a kind of client identification.

The following partial aiohttp method shows how to use an ETag to track users

etag = request.headers.get('If-None-Match')
if etag: etag = etag.split(', ')[0].strip('"')
if etag and etag in users:
  # note a new visit from users[etag]
  status = 304
else:
  etag = ... # make a new user identifier
  users[etag] = ... # track a new user
  status = 200
return Response(body=..., status=status, headers={'ETag':etag})

One of the most common ways to use etag tracking is to make a 1-pixel transparent PNG image that every page accesses, using its etag to identify users. That, however, is not required: and file that all pages will request will work, even if it contains meaningful data. It’s even possible to send each user a file with different contents, allowing the pages to use the content they receive to display user-specific content, though more commonly the tracking is invisible to the user.