This is an archived copy of a previous semester's site.

Please see the current semester's site.

Memory Management

1 Globals

Global memory is not managed. It is allocated once when the program is first being loaded and never changes size thereafter.

Global memory is split into several regions, notably including:

2 The Stack

The stack is automatically managed by the compiler and is linked to function calls. It exists in a large segment that is set aside for the program is first being loaded, though most of that segment is initially unused. Each new function’s activation record (its arguments, return value, return address, and local variables) is pushed on the stack when the function is invoked and popped off the stack when the function returns. After a function returns, addresses of its local variables still exist, but using them is problematic because that memory will be re-used by the next function to be called.

The stack has two common bugs:

3 Heap, level 0: sbrk

The segment of memory storing the heap is initially small or even empty. The beginning of that segment is fixed, but the end of that segment can be changed by the operating system. How we ask the OS to change that end-of-heap-segment marker varies by OS, but a common one is sbrk which accepts a delta change in the number of bytes to allocate for the heap.

When using sbrk, the heap is just a big continuous range of addresses. Any internal structure and usage is up to you. One of the most popular internal structures is managed by malloc.

4 Heap, level 1: malloc and friends

malloc and its related functions free, calloc, and realloc use sbrk to request heap memory from the OS, then add their own bookkeeping to the big region of available memory, thus chopping it into smaller pieces that can be individually managed and deallocated.

There are several implementations of malloc, but they generally share the following features:

Languages like C++ often wrap malloc in an operator like new which does two things: first, it mallocs space for whatever it is creating; then it runs a function called a constructor to initialize the contents of that memory. Likewise, delete wraps free with a destructor before the free. That said, they C++ has its own implementation of malloc that manages the heap and its bookkeeping differently than malloc does, so the two do not play well with one another.

5 Heap, level 2: tracking lifetimes

Memory management with malloc-like functions is a source of many programmer errors. Three are particularly common:

To avoid these errors, most malloc-based code uses a notion of lifetimes to track when a given pointer should be freed. For example, the lifetime of a BST node might be from the moment of its allocation to the moment where its parent node points to something else instead of it; at that end of life time it should be freed. Often, these lifetimes are tracked only in the programmer’s mind, but two common techniques track it automatically:

6 Heap, level 3: garbage collection

Garbage is defined as allocated heap memory that will never be used by the program again. Unreachable memory is a subset of garbage that cannot be used again because no pointers to it are still available to the program. Garbage collectors are tools that search through the registers and stack to find what pointers do exist, then through the heap to find allocated memory that isn’t reachable from those pointers, then frees all such memory.

Garbage collectors take time to run (they have to read the entire stack and skim much of the heap) and add extra constraints to how memory is used (for example, pointers cannot be compressed or encrypted). However, they make it much easier to write code because deallocation and lifetimes needn’t be considered.

Garbage collectors are built in to most high-level languages, including Java, Python, JavaScript, Bash, and so on. These languages are characterized by having a way of allocating memory and creating objects, but no explicit way of deallocating them because deallocation is handled by the garbage collector instead. There are multiple designs of garbage collectors, and which one a language uses can have a noticeable impact on what kinds of code are slowed down by the garbage collector the most.

The languages I know that do not have mandatory language-level garbage collectors1 Here I’m treating language-automated reference counting as a form of garbage collection, though that’s not always the right way to think of it. are: