Different programming languages have different ways of providing programmers with access to memory.
- Instructions like push and system calls like sbrk, used to find which addresses lie within a given segment and adjust segment size. Assembly operates here.
- malloc and free, which abstract managing memory within segments into simple allocate/deallocate pairs. C operates here.
- new and delete, which merge allocating and initializing memory, a key to the Resource Acquisition Is Initialization programming idiom. C++ mostly operates here.

When a program wants to run code from an untrusted source, controlling that code’s access to memory is a key part of doing that safely. Often this means providing a simple, coarse-grained memory pool and requiring the untrusted code to manage it itself. Because of this, it is relatively common to have to implement your own memory allocation tools on top of that underlying pool. Some applications also opt for no-malloc code to improve performance or to fit on resource-constrained embedded systems.
In this MP, we give you a big pool of memory to manage on your own. Your goal is to do so in a way that supports all of the behaviors of malloc, free, and realloc without using those tools themselves, and to do so while minimizing memory used and maximizing runtime efficiency.
In the past, this course has had you implement malloc using sbrk. That was cool in that once you were done you had built the tool you use in your other C code. However, the debugger uses malloc internally, so if your malloc is buggy then the debugger doesn’t work. We want you to use debuggers, which are available for virtually every other programming task you’ll ever do, so we’ve changed it this semester to a version that lets you use the debugger.
mp4.zip contains a very simple, space-inefficient solution as well as VS Code project files and tests. You will modify
- allocator.c – most of your work will go here
- mytest.c – optionally, write your own test case here

We provide the simplest functionally-correct implementation we know of. It treats the heap like a stack; on allocation it pushes a new block on top of the stack and simply ignores all deallocations.
#include "allocator.h"
#include <string.h> // for memcpy

static void *base;  /// the smallest usable address
static size_t used; /// how many bytes have been used

/** Called once before any other function here.
 * Argument is the smallest usable address. */
void allocator_init(void *newbase) {
    base = newbase;
    used = 0;
}

/** Called once before each test case.
 * Free any used memory and reset for the next test. */
void allocator_reset() { used = 0; }

/** Like malloc but using the memory provided to allocator_init */
void *mymalloc(size_t size) {
    // simplest version pushes each block on a stack
    void *ans = base + used;
    used += size;
    return ans;
}

/** Like free but using the memory provided to allocator_init */
void myfree(void *ptr) {
    // simplest version does nothing
}

/** Like realloc but using the memory provided to allocator_init */
void *myrealloc(void *ptr, size_t size) {
    // simplest version is malloc-copy-free
    if (!size) { myfree(ptr); return NULL; }
    void *ans = mymalloc(size);
    if (ptr && ans) {
        memcpy(ans, ptr, size); // should be min(size, oldsize) but oldsize unknown
        myfree(ptr);
    }
    return ans;
}
Keep the functional correctness of the provided allocator but make it use less memory without adding too much time. That will probably involve at least the first six of the following steps. The steps in the following subsections are ordered such that earlier items are easier to implement than later ones, and either have more impact on memory use than later ones or are required components of later ones. Going in order is strongly recommended.
As you walk through these steps, you will find yourself doing something in one step and then partially undoing it as part of a later step. This is part of a general pattern of effective software development: we don’t write large code all at once, we write it as a series of small changes, each individually testable.
It is possible to read all the steps and then try to do them all at once. Technically this could result in writing less code overall. Practically it would actually result in writing buggy code and then having no idea which part is incorrect. If you possibly can, you should always code in small testable steps, even if that means some of those steps involve a bit of extra work that you’ll undo later to get them into a testable configuration.
Making your own tests
The provided test cases in the workloads
folder are intended to test both the functionality and performance of your code. Being useful cases for debugging your code was not a design goal of these workloads.
If your code crashes, you might want to write your own test file with its own main.c and so on that calls your functions. This can help you not get distracted by all the overhead of the memory tracking system in testharness.c.
If your code gives an allocator correctness error (such as allocated illegal address or allocated already-used memory), the test harness might be useful, but you should probably minimize the test case before debugging. This means copying the test into mytest.c and then removing as much of it as you can without making the test pass. In other words, keep removing code as long as mytest.c (when compiled to mytest.so and run using the test harness) continues to fail. While doing this, use the Debug one workload run option, not Debug all workloads or Run all tests.
Add metadata to each allocation. The usual way to do this is: when asked for an n-byte block, allocate an n + sizeof(metadata)-byte block; in the first sizeof(metadata) bytes store the metadata and return a pointer to the first byte after that.
If given a non-NULL ptr, look sizeof(metadata) bytes before it to read the metadata for that block.
You’ll want to change your metadata as you add more features, but initially store the size of the block in it.
This will make your code take more time and need more space (except perhaps making size-increasing reallocs faster), but it will also unlock later changes.
With size metadata it becomes possible to display your entire memory: you know where the first block is, and from its size can find the next, and so on. You might find this useful in some debugging situations, and for doing step 5 below.
The functions you write take void * arguments and return void * values. But they will be using some kind of metadata * pointers internally, computed from those void * by some kind of offset.

C and C++ both define ptr + offset to be synonymous with &(ptr[offset]): that is, both find a pointer that points offset values later in the hypothetical array pointed to by ptr. Thus, the byte offset depends on the pointer type:
- if int *a = 0x10000, then a+1 == 0x10004
- if char *a = 0x10000, then a+1 == 0x10001
- if int **a = 0x10000, then a+1 == 0x10008

In pointer arithmetic, void * acts like a pointer to single bytes. Thus, for any type A, ((A *)ptr)-1 == ((void *)ptr) - sizeof(A).
Don’t copy data or change the pointer when decreasing the size of a block. Also don’t change its size: simply ignore shrink requests.
This will improve your code’s runtime and space.
This will cause the realloc_jitter
test to use under 5,000 bytes instead of over 50,000.
Don’t copy data or change the pointer when increasing the size of the block at the end of the allocated memory.
This will improve your code’s runtime and space.
This will cause the small_resize
test to use under 1,000 bytes instead of over 10,000.
When deallocating (with myfree) or decreasing (with myrealloc) the size of the block at the end of the allocated memory, also return its memory to the unused memory set.
This will improve your code’s space.
This will cause the backstep
test to use under 200,000 bytes instead of over 1,000,000 and mergesort_like
to use under 200,000 bytes instead of over 700,000.
When deallocating a block that has another block after it, mark it in some way (likely in the metadata) as unused. If a later allocation wants a block smaller than some current unused block, return the unused block instead of allocating a new block.
If you know block sizes and where the first block is, you can use that to walk the entire set of blocks looking for a large-enough unused block.
This will improve your code’s space but harm its runtime.
This will cause the chaos_reuse test to use under 1,000 bytes instead of over 20,000 and chaos_reuse_2 to use under 10,000 bytes instead of over 100,000. However, it will slow down most tests, especially linked_list; on my computer it slows that test down to take several seconds, but I’ve had reports that on some machines it slows them to several minutes instead.
You might want to skip long-running tests. There are at least three ways to do that.
- Run just one test, using the Test one workload run action or a ./tester workloads/chaos_reuse.so command-line invocation.
- Comment out everything in the slow workloads’ test functions other than return 0;. To re-enable them, uncomment those lines.
- Move the slow workloads out of the workloads/ folder. To re-enable them, move them back into that folder.

Avoid scanning through used blocks for unused blocks. Large linked data structures may have millions of small used blocks; scanning through them looking for unused blocks can be time-prohibitive.
You can avoid ever looking at a used block when hunting for unused blocks by creating some kind of linked data structure (linked list, tree, etc.) of unused blocks, with nodes stored inside the unused blocks (or their metadata) and a root/head/tail pointer as a global variable, then walking that structure on each allocation.
When creating this structure, it’s good practice to try to serve allocations with the most-recently-freed block first. This will tend to improve cache performance and is optimized for the common case where a free of an n-byte block is followed by an allocation of another n-byte block.
Remember to reset the structure when allocator_reset
is called.
This will improve your code’s runtime, but may worsen its space slightly if it makes your metadata larger.
This should result in runtimes similar to step 4 and space utilization comparable to step 5.
When marking a block as unused, see if the block before and/or after it in memory is also unused; if so, merge them into one large unused block instead of several smaller ones. Do this before the end-of-memory deallocation optimization (step 4).
Make sure this doesn’t mess up the links between unused blocks added in step 6. Also note that adjacent-in-memory blocks are rarely adjacent in the list of free blocks.
This is mostly a pair with splitting blocks, and in most cases the two will only result in space savings if they are both implemented correctly. However, they are sufficiently distinct operations that they are usefully implemented and tested independently.
This will cause small_resize_2 to use under 3,000 bytes instead of over 20,000; but it will cause several previous tests to lose their previous gains (for example, chaos_reuse and chaos_reuse_2 will use 10× more memory) and it may stop passing some of the earlier steps it used to pass. Without the next step, this is a mixed win at best.
When allocating into a large unused block, split that block in two, one used for the new allocation and the other one left unused. Do the same when shrinking a block during reallocation.
Make sure you include space for the metadata needed by both blocks when splitting; if there’s not room to split, don’t.
Make sure this doesn’t mess up the links between unused blocks. Also note that adjacent-in-memory blocks are rarely adjacent in the list of free blocks.
This will improve your code’s space.
This will keep the gains to small_resize
and small_resize_2
from step 7 and return chaos_reuse
and chaos_reuse_2
to under 1,000 and under 10,000 bytes, respectively. It will also cause backstep
to use under 200,000 bytes instead of over 1,000,000 and backstep_2
to use under 1,500,000 bytes instead of over that.
Avoid looking at small unused blocks when you need a large one. Linked data structures with many adds and removes might result in millions of small unused blocks; when asking for a new large block it can take a very long time to scan past the small ones.
The usual fix for this is to have the linked data structure of unused blocks sort or group them by size. Note that this can make splitting and merging blocks much more complicated.
Because blocks are arranged by size, you have to pick what size to use for each allocation. Intuitively, picking the smallest big-enough free block makes sense, but can result in many unused slivers if not carefully tuned with block splitting. Always picking (and splitting) the largest free block is much easier to get to work well.
This will improve your code’s runtime for some workloads but harm it for others. It will also adjust space used, sometimes up and sometimes down.
To get credit for step 9, you need to still meet all of the previous steps’ space and time thresholds and also have fragmented_growing
run in under 20ms and use less than 5MB.
You don’t have to give the exact amount of memory requested: you can give more instead. If you do give more, then subsequent requests to increase size might not require any work.
Giving extra memory can happen naturally when an unused block is larger than needed but too small to split. However, there are many other places you could choose to allocate more than requested, which can make different workloads use more or less memory overall.
This could either improve or harm runtime or space, depending on how it is implemented and what workloads are added.
Memory systems are built such that accessing a 4-byte value is faster if the address is a multiple of 4 than if it is not. Likewise, 2-byte values are faster if their address is a multiple of 2 and 8-byte values if their address is a multiple of 8; whether this also applies to 16- and 32-byte values varies. Giving addresses that have these multiple-of-a-power-of-two properties is called memory alignment.
Other-sized values are generally faster if they are aligned like the next higher power of 2; so an 18-byte value should be aligned like a 32-byte value. Arrays are generally fast when their elements are aligned; there’s no need to align the entire array.
Many allocators align memory by rounding up allocation sizes a little, resulting in faster code using the memory at the expense of allocating a bit more memory. This won’t have much impact on the allocator’s own performance, but will improve the performance of code using the allocator.
Implementing up through step 6 ([free lists]) is worth 100%. Partial credit is awarded for doing a subset of the above steps. More partial credit is earned for going in order: step 3 is worth more partial credit if steps 1 and 2 are also done than if they are not.
No points are earned if the allocator you submit is nonfunctional (i.e. fails to return non-overlapping large-enough memory with each mymalloc
call). We strongly recommend keeping a copy of your last points-earning implementation so that you can revert to it if you get your code into a broken state.
Steps 7 + 8 (merging and splitting) are +5% extra credit, but only if 100% earned already. Step 9 (size-aware free tracking) is another +5% extra credit, but only if 105% earned already.
Steps 10 and 11 do not award points, but may help you win bragging rights for the best implementation. If you’d like to enter a space-and-time competition, do the following:
Earn at least 100%
On the first line of allocator.c
, add a comment // SCREENNAME: anonymous
replacing anonymous
with a screen name of your choice.
Keep this clean, inoffensive, and relatively short. Do not use the screen name to impersonate another member of the class.
Put a test case in mytest.c
that you believe your allocator will do better on than most other allocators. This needn’t be original: you can copy existing code (with the allocation calls changed to their my-prefixed versions) if you wish, but if you do so, include an acknowledgment of its source in a comment in the file.
Make sure your test case terminates! Currently the competition system doesn’t time out individual test cases, so if yours has an infinite loop it could break the entire competition system.
Periodically, we will run all contestant allocators against all provided and contestant tests and post the space and time results here.