1 Data structures

A maze is a type of undirected graph.

The classic definition of an undirected graph is two sets: a set of vertices $V$ and a set of edges $E$ where each edge is a set of two vertices.

Because our mazes will be grids, the natural representation of a vertex is an $(x,y)$ pair.

Python supplies sets with two types: set can be mutated (good for adding and removing edges) but cannot be a member of another set; and frozenset cannot be mutated but can be a member of another set.

This suggests the following data structure:

A vertex is a pair of integers, like (0,0)
An edge is a frozenset of a pair of vertices, like frozenset([(0,0), (1,0)])
A graph is a set of vertices and a set of edges.

To make a 7×7 maze, the set of vertices would range from (0,0) to (6,6) and the edges would be a subset of all possible horizontal edges frozenset([(x,y),(x+1,y)]) and all possible vertical edges frozenset([(x,y),(x,y+1)]). Which subset you pick changes which maze you have.

Using dicts for edges.

The set-of-edges approach makes finding the edges that include a given vertex slightly tedious: you have to generate the four possibilities and check which ones are in the set.

An alternative is to store bidirectional edges in a dict. Use vertices as keys and the set of vertices the key is connected to as values. This makes finding the edges of a path easy, but makes randomly picking an edges more complicated and also necessitates updating two entries for every edit to the edges to keep them bidirectional.

On the whole, we think the set-of-edges approach is easier, but only marginally so; feel free to use dicts if you prefer.

Using a 2D array for the entire maze.

You can represent a maze as a 2D array. In a memory-oriented language like C this makes sense, requiring less data structure overhead than sets. In an object-oriented language like Python arrays make less sense than sets, but they can be implemented if desired.

You’ll need a subset of the following four special edges to be exits to your maze:

left:	$\{(-1,3),(0,3)\}$
down:	$\{(3,6),(3,7)\}$
right:	$\{(6,3),(7,3)\}$
up:	$\{(3,-1),(3,0)\}$

Those exit edges should be the only edges that connect to a vertex not $V$ . We recommend adding them as a last step after your maze is fully built to avoid your maze generation algorithm being confused by an edge to nowhere.

2 Serialization

We need to send mazes between computers, which means we need a standard way of representing them. We’ll use the following format:

A maze is represented by a list of strings, one string per row of vertices in the maze, ordered top to bottom.
A row is represented by a string of hexadecimal digits, one digit per vertex in the maze, ordered left to right.

A cell is represented by a hexadecimal digit (i.e. a number between 0 and 15 inclusive) where each bit represents whether that cell as a wall (1) or not (0) in a given direction. We also use these same hexadecimal digits to represent which borders of your maze should have exits.

We map bits to directions as follows:

Direction	Bit	Exit vertex	Exit edge
Left	$2^0=1$	$(0,3)$	$\{(0,3),(-1,3)\}$
Down	$2^1=2$	$(3,6)$	$\{(3,6),(3,7)\}$
Right	$2^2=4$	$(6,3)$	$\{(6,3),(7,3)\}$
Up	$2^3=8$	$(3,0)$	$\{(3,0),(3,-1)\}$

Cell 0 has no walls.

Cell 2 has one wall, on its bottom side.

Cell 5 has walls to its left and right because 0x5 = 1 + 4.

Cell F has walls in all directions, because 0xF = 1 + 2 + 4 + 8.

Maze 2 has exits on all sides except down. That means it has three exit edges ( $\{(-1,3),(0,3)\}$ , $\{(6,3),(7,3)\}$ , and $\{(3,-1),(3,0)\}$ ) and that the maze is sufficiently navigable that a path exists between the three exit vertices: a path between $(0,3)$ and $(6,3)$ and a path between $(6,3)$ and $(3,0)$ and a path between $(3,0)$ and $(0,3)$ . Note that due to the transitivity of paths, if two of those paths exist the third is guaranteed to exist as well.

Maze F has no exists and needn’t be navigable at all.

The maze

["988088c","1000004","1000004","0000000","1000004","1000004","3220226"]

is an empty 7×7 room with exits on all four borders.

The maze

["98a0a8c","1494b04","1430e14","0280820","1e575d5","1e1c575","3a202a6"]

has walls that spell out CS 340 with exits on all four borders.

The CS 340 maze

3 Ensuring navigability

Two vertices are connected if there’s a path of edges between them. The existence of a path can be found using depth-first search¹¹ Depth-first search typically uses a stack of vertices to visit and a set of previously-visited vertices. set is a perfect match for the previously-visited vertices; list works well for the stack (though it calls it’s push operation append instead of push).. We require each pair of exit vertices are connected.

A maze is fully navigable if each vertex is connected to each other vertex, or in other words if the entire maze consists of a single connected component. Detecting full navigability can be done by a single depth-first search starting from any vertex: if all vertices are visited during the search, the graph is a single connected component and the maze is fully navigable.

4 Maze Generation

4.1 Simple heuristic method

Start with a maze with all edges included
Repeat some number of times
1. Pick a random edge
2. Remove the chosen edge from the maze
3. Check for sufficient navigability; if it’s not present, re-add the chosen edge

This will create ugly mazes with open areas and/or inaccessible areas, but it is easy to implement and sufficient for our needs.

4.2 Non-uniform spanning tree

Any spanning tree of the graph with all edges included is a fully navigable maze. You likely learned some minimal spanning tree algorithm in your data structures course which could be applied to maze generation.

Note that minimal spanning tree algorithms are biased toward mazes with short solutions and many dead ends.²² This bias arises because the first acyclic path added to the graph connecting two entry points wins. Minimal spanning tree algorithms add one edge at a time, meaning longer paths require more steps to create and are thus less likely to appear before shorter paths.

4.3 Uniform spanning tree

Wilson’s algorithm provides a uniform distribution over all possible spanning trees. It uses loop-erased random walks as its primary computational element.

The simplest implementation strategy for Wilson’s algorithm that I know goes as follows:

Add a label to each vertex. Initially set all labels to unused.
Add a label to each edge. Initially set all labels to unused.
Pick a vertex and switch it’s label to used.
Repeat the following until all vertices are labeled used:
1. Pick a vertex that is not labeled used; call this vertex both $s$ and $c$ .
2. While $c$ $c$ is not labeled used:
  1. Pick a random edge connected to $c$ .
  2. Label $c$ with the edge selected.
  3. Let $c$ be the other vertex on that edge.
3. While $s$ $s$ is not labeled used:
  1. Let $e$ be the edge that is $s$ ’s label.
  2. Label both $s$ and $e$ used.
  3. Let $s$ be the other vertex on $e$ .
Remove any edge not labeled used.

In the two pick a vertex steps (3. and 4.a.), which vertex you pick won’t change the final outcome: you can pick them at random, pick them in a fixed predetermined order, or anything else you wish.

How Wilson’s Algorithm works

Step 4.b. executes a random walk of the graph: it starts somewhere and moves in a random direction over and over. 4.b.ii. marks the vertices visited during the walk so that the random walk can be repeated later.

Random walks tend to cross over themselves often, creating loops in the walk that would prevent them from being part of a spanning tree. However, those loops are erased in this code by subsequent executions of 4.b.ii.: if we visit the same vertex multiple times, only the last visit’s path will be recorded in the vertex’s label.

Because 4.b. erased any loops, 4.c. has a random walk with no loops that ends at a vertex in the maze so far. Adding that walk to the maze extends the maze into a larger spanning tree. Because each addition is a truly random walk and may be any length, there’s no bias introduced by adding super-short single-edge paths early on and then later additions being limited by those early choices.

4.4 Mazes with loops

Many simple maze solving algorithms rely on the maze being a tree, not an arbitrary graph, and fail if the maze contains loops. Adding some loops to a maze can simultaneously make it harder to solve and make the shorted path through the maze shorter.

The easiest way to add loops to a maze is to generate a maze without loops first and then add a few edges. Of particular interest is a maze with no dead ends, which can be created by adding another edge to any vertex that only has one edge.

4.5 Hand-crafted mazes

You don’t need to make your mazes entirely at random. It’s perfectly acceptable to draw out a set of walls you find interesting in some way and use that.

It’s also possible to craft small wall patterns and combine those programmatically into a full maze; for example, a 7×7 maze is large enough to represent 6 decimal digits as walls, and most letters can also be created.