Algorithms

An algorithm is a sequence of unambiguously defined steps to accomplish some task. The word comes from a transliteration of al-Khwarizmi (الخوارزميّ‎), part of Muhammad ibn Musa al-Khwarizmi’s name. His 9th-century books on place-value arithmetic and algebra brought these ideas from the Indian subcontinent to Europe.

A key part of al-Khwarizmi’s books’ success in Europe is the fact that algorithms for the same task may be significantly more or less feasible depending on how the data used in the task is represented. The use of place-value numbers instead of tallies or Roman numerals made the algorithms he described much simpler and more versatile than those previously used in Europe.

Who invented what al-Khwarizmi described?

Al-Khwarizmi’s books were assumed by the Europeans who read them to describe his own work, which is why algorithm is named after him, algebra is named after his book, and place-value digits are called Arabic numerals.

But the digits long predate al-Khwarizmi and appear to have originated in the Indian subcontinent, not the Middle East. Whether the rest of the contents of his books also predate him does not appear to be a matter of consensus: I’ve seen claims that they were his invention; that they were invented by other Middle-Eastern scholars; that they were invented in India and reached Arabia through travelling merchants; that they were independently rediscovered multiple times; and various mixes of these origins for different parts of his work.

What is clear is that al-Khwarizmi moved European mathematics forward immensely, at a minimum through clear communication, whether he was communicating his own inventions or those of others. It is also clear that most of what we mean by algorithm today was first invented in the 20th century, when digital computers enabled algorithms to be separated from the humans who followed them, opening up a wide range of new theoretical framings and algorithmic applications.

Using Roman numerals, to compute LXXVIII + LIX we must apply many distinct symbol-specific rules for combining and simplifying numerals.

Using Indian numerals, to compute 78 + 59 we repeat the same add-the-digits-and-carry step once per column.

Both take roughly the same amount of work, but the latter algorithm is much simpler because it uses repetition instead of many distinct steps.

1 Parts of algorithms

An algorithm includes an unambiguously defined structure for its data, its input and output, and the operations used to complete the task.

1.1 Structure of data

Data is information represented in some well-defined way. The way that data is organized to represent that information is a key part of how algorithms are defined.

The structure of data is important enough that computer science explores it in many different ways, for example by distinguishing between primitive and composite data, between data structures and abstract data types, between file formats and in-memory formats and wire formats, and so on.

Consider a map-based navigation algorithm. How should it store the map? Among many other options, it could be

  • A grid with 5×5 meter cells, each cell being marked either road or not road.

  • A list of road segments, each storing a starting and ending latitude and longitude, a speed limit, and whether it is one way, the other way, or two-way.

  • A structure for each entire road, each storing the shape of the road and a list of all the other roads it intersects and the distance along the road of each such intersection.

  • A list of intersections, each with a latitude and longitude and a list of the adjacent intersections it connects to by road.

… and so on for many other options.

Algorithms designed to work on the grid type of map generally will not work if we give them a list of intersections instead, and so on for the other data designs. The structure of the data informs the nature of the algorithm.

Some algorithms can be adapted to several different data structures by using abstraction (in particular the type of abstraction called an abstract data type) to hide the data structure details.

We might abstract away the details of the map structure by defining a set of generic operations on maps; for example, we could define an operation called one-step neighbors which, given a location on the map, gives back all the other locations that can be reached in one step.

On a grid, a location might be a grid cell and a step might be moving to an adjacent cell. On a list of roads, a location might be a road and a distance along it and a step might be driving to any other road that intersects with it. The one-step neighbors algorithms for those two maps will differ, but we could then define an algorithm that uses that abstraction to work with both types of maps.
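As a sketch of how such an abstraction might look (the function names here are my own illustration, not from any particular library), a grid map could supply the one-step-neighbors operation, and a generic algorithm could be written entirely in terms of it:

```python
# A sketch of the one-step-neighbors abstraction, assuming a grid map
# stored as a set of (row, col) cells that are marked as road.

def one_step_neighbors_grid(road_cells, location):
    """Given a grid cell, return the adjacent cells that are also road."""
    row, col = location
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [cell for cell in candidates if cell in road_cells]

def reachable(neighbors_of, start):
    """A generic algorithm written only in terms of the abstraction:
    find every location reachable from start, on any kind of map."""
    seen = {start}
    frontier = [start]
    while frontier:
        here = frontier.pop()
        for there in neighbors_of(here):
            if there not in seen:
                seen.add(there)
                frontier.append(there)
    return seen

roads = {(0, 0), (0, 1), (0, 2), (1, 2)}
print(reachable(lambda loc: one_step_neighbors_grid(roads, loc), (0, 0)))
# every road cell here is connected to (0, 0), so all four come back
```

The same reachable function would work unchanged on a list-of-roads map if we supplied a different neighbors_of function for that representation.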

1.2 Input and output

Each algorithm can solve an entire family of computations, depending on the input it is given.

Algorithms often compute some values we don’t care about (like the set of carry digits in addition) as well as some we do (like the sum in addition). The values we do care about are called the algorithm’s output. If two different algorithms give the same output as one another for any given input, we say they are algorithms that solve the same problem even if they compute different non-output values along the way.

It is common to distinguish between two types of input data. Some data (often called a resource, database, backing data, or asset) is large and rarely updated, such as the street map used by a navigation system. Other data (often called user input) is smaller and changes each time the algorithm is used, such as the starting location and destination used by a navigation system.

A navigation algorithm likely includes as inputs:

  • User input: current location, destination
  • Data resource: map

It might also include other inputs like

  • Current traffic
  • User preferences
  • User’s driving history
  • Other users’ locations, speeds, and destinations

1.3 Defined operations

The core part of an algorithm is the defined sequence of steps it uses to compute a result.

In the simplest case, these operations are a fixed set of steps. But most interesting algorithms use repetition and functions to repeat some steps, often repeating them a different number of times for different inputs.

A number is even exactly when its last digit is even, so checking evenness is a fixed set of steps with no repetition.

Long addition repeats the step add two input digits and a carry digit once for each pair of digits in its input.
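Both examples can be sketched in runnable form; this is a minimal illustration assuming numbers are given as strings of digits:

```python
# Two small algorithms: one with fixed steps, one with repetition.

def is_even(number):
    """Fixed steps, no repetition: look only at the last digit."""
    return number[-1] in "02468"

def long_addition(a, b):
    """Repeat the add-two-digits-and-a-carry step once per column,
    working from the rightmost digit leftward."""
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad the shorter number with zeros
    carry, digits = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        carry, digit = divmod(total, 10)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(long_addition("78", "59"))  # → 137
```

Note how the same loop body handles two-digit and twenty-digit numbers alike; that is the versatility repetition buys.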

There are often multiple algorithms available for solving the same problem. While there are many reasons to pick one algorithm over another, one of the most common selection criteria is how quickly the algorithm can solve the specific problems we expect to give it.

There are many algorithms for finding a path through a map, so many that Wikipedia breaks them into multiple lists and tables.

One example is the A* algorithm (pronounced A star) which works as follows:

  • Keep a collection of travel paths that start at the current location
  • Initially, put just one trivial travel path in the collection: a path that starts and stops at the current location
  • Repeat until a path reaches the destination:
    • Remove from the collection the path that minimizes (length of path) + (distance from end of path to destination)
    • Find all navigation steps you could take from the end of that path; for each one
      • Make a new path that adds that step to the end of the path and add it to the collection
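The steps above can be sketched as runnable code; this is a minimal illustration, not a production A*, assuming a grid map where every step has length 1 and using straight-line distance as the estimate to the destination:

```python
import heapq
import math

def a_star(road_cells, start, goal):
    """Find a path from start to goal through a set of (row, col) road cells."""
    def estimate(cell):
        return math.dist(cell, goal)  # straight-line distance to the destination

    # the collection of paths, ordered by (length of path) + (estimate)
    paths = [(estimate(start), [start])]  # initially just the trivial path
    while paths:
        _, path = heapq.heappop(paths)  # the path minimizing length + estimate
        end = path[-1]
        if end == goal:
            return path
        row, col = end
        for step in [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]:
            if step in road_cells and step not in path:  # real A* uses a visited set
                new_path = path + [step]
                heapq.heappush(paths, (len(new_path) - 1 + estimate(step), new_path))
    return None  # no path exists

roads = {(0, 0), (0, 1), (0, 2), (1, 1)}
print(a_star(roads, (0, 0), (0, 2)))  # → [(0, 0), (0, 1), (0, 2)]
```

The priority queue (heapq) does the "remove the path that minimizes ..." step efficiently; everything else follows the bullet points directly.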

2 Algorithms in your life

Once upon a time, algorithms were just tools, used to solve specific problems but not otherwise impacting day-to-day life. But with the widespread adoption of the Internet that has changed, and algorithms now shape (among other things) what parts of the world you see.

2.2 Recommender systems

Recommender systems, also called recommendation systems, try to guess which things you’re most likely to watch/read/listen to next.

The recommendation task is:

  • User input: your watch history
  • Resource data: everyone else’s watch history
  • Output: a few recommended things to watch

Recommender systems have many variations in their algorithms. They generally prioritize some mix of their best guess of (a) recommendations you’ll accept, (b) recommendations that bring them ad revenue, and (c) recommendations that are in the middle, not the end, of other users’ watch histories. This last point is often described as making the platform addictive and can lead to downgrading content that would lead users to a sense of completeness and conclusion and a satisfying end to their experience on the platform.
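As a toy illustration of priority (a) only (this is my own sketch, far simpler than any real platform’s system), one could score unwatched items by how much their watchers’ histories overlap with yours:

```python
# A toy recommender: weight each unwatched item by the similarity of the
# viewers who watched it. Real systems are far more elaborate.
from collections import Counter

def recommend(my_history, everyone_elses_histories, how_many=3):
    scores = Counter()
    for history in everyone_elses_histories:
        overlap = len(set(history) & set(my_history))
        for item in history:
            if item not in my_history:
                scores[item] += overlap  # weight by similarity to this viewer
    return [item for item, score in scores.most_common(how_many)]

others = [["a", "b", "c"], ["a", "d"], ["e", "f"]]
print(recommend(["a", "b"], others))  # "c" ranks first: its viewer is most like us
```

Priorities (b) and (c) would show up here as extra terms added to each item's score, which is one reason different platforms recommend such different things from the same histories.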

3 Picking algorithms

Algorithms differ in many ways, and there is no single set of criteria for picking one. Several common considerations are given here:

Simplicity
Simpler algorithms are cheaper to implement, easier to adjust or replace later, and are often more reliable and harder to break than more complicated algorithms.
Speed
Faster algorithms create less delay for users and can be run more often on a given piece of hardware.
Energy efficiency

Energy-efficient algorithms extend battery life and reduce energy costs.

Often, faster algorithms are also more energy efficient because the longer a computer has to work on something the more energy it uses doing it. But some operations and hardware components use more energy per second than others, so algorithms that avoid those operations can be more energy efficient than their speed suggests.

Space

Some algorithms use enough memory that running several such algorithms at the same time, or running them on small embedded devices, is challenging. Algorithms that use less space are valuable in these contexts.

There is also a complicated relationship between space used and time used, where reducing space can either increase or reduce time depending on how it is reduced.

Accuracy

Some algorithms are precise: they find the right answer.

Many algorithms are imprecise, for one of two reasons. Some problems don’t have a well-defined right answer (for example, recommender systems). Other problems do have a right answer, but it is hard to compute so a heuristic algorithm is used which is easier to compute but only finds an approximate or mostly-right answer.

Suppose you wish to visit every city in the USA and spend the minimum amount of time on the road doing so. There is an optimal route, one that uses less time than any other, but finding that route is very hard. We can get pretty good routes by using heuristics such as going in a zigzag pattern across the country or always going to the closest unvisited city next.
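The always-go-to-the-closest-unvisited-city heuristic can be sketched as follows, with made-up coordinates; it produces a pretty good route, not necessarily the optimal one:

```python
# The nearest-neighbor heuristic: from wherever we are, always go to the
# closest city we haven't visited yet. Fast, but not guaranteed optimal.
import math

def nearest_city_route(cities, start):
    """cities maps a name to (x, y) coordinates; returns a visiting order."""
    route = [start]
    unvisited = set(cities) - {start}
    while unvisited:
        here = cities[route[-1]]
        nearest = min(unvisited, key=lambda c: math.dist(here, cities[c]))
        route.append(nearest)
        unvisited.remove(nearest)
    return route

cities = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (1, 1)}
print(nearest_city_route(cities, "A"))  # → ['A', 'B', 'D', 'C']
```

Each step is a cheap local decision, which is what makes the heuristic fast; the price is that an early greedy choice can force a long detour later.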

Self-improvement

Most algorithms are defined once and operate consistently thereafter, but some adapt and (hopefully) improve over time.

The most common way to have algorithms self-improve is to have them accumulate additional resource data, typically processing it into an efficiently-usable form as it arrives. A less common way is to have a meta-algorithm that changes details of the main algorithm, often by trying several variants to see which ones work best. The line between these two is not crisp, and some people might disagree about which algorithms use which technique.

Testability

It is usually infeasible and sometimes impossible to prove that an implementation of an algorithm is correct, with no errors or security vulnerabilities. Thus, it is common to try to estimate correctness using tests.

A test case is an input and the output it should produce. Algorithm implementations are tested by trying many test cases and verifying that they produce the correct output for each test.

Some algorithms readily lend themselves to creating a wide range of useful tests while others are much more challenging to test.
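A minimal sketch of such tests, assuming a hypothetical sum_digits implementation under test (the name and placeholder body are my own illustration):

```python
# Each test case pairs an input with the output it should produce.

def sum_digits(a, b):
    """A stand-in for whatever addition implementation is being tested."""
    return str(int(a) + int(b))

test_cases = [
    (("78", "59"), "137"),   # an ordinary case
    (("0", "0"), "0"),       # edge case: zeros
    (("999", "1"), "1000"),  # edge case: the carry ripples all the way left
]

for (a, b), expected in test_cases:
    actual = sum_digits(a, b)
    assert actual == expected, f"{a}+{b} gave {actual}, expected {expected}"
print("all tests passed")
```

Note that passing every test shows the implementation handles those cases, not that it is correct in general; that limitation is exactly why testability only estimates correctness.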

4 Algorithms and scale

Computer science includes significant discussion of how different algorithms scale, meaning how their runtime (and sometimes other resources) increase as the size of the problem they are asked to solve increases. Ignoring vast amounts of detail, there are a few broad groups that are worth understanding.

A problem is called intractable if the best known algorithm for solving it requires exponential time, meaning that adding a small amount of additional input increases the runtime by a multiplicative factor.

Breaking passwords by guess-and-check is an exponential-time algorithm. Assuming that passwords are made out of only English lower-case letters, the number of passwords we need to check is:

Password length | Number to check
1 letter        | 26
2 letters       | 676
3 letters       | 17576
4 letters       | 456976
5 letters       | 11881376
n letters       | 26^n
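A sketch of guess-and-check in runnable form; it is practical only for very short passwords, which is exactly the point:

```python
# Brute-force password guessing: try every lowercase string of length 1,
# then length 2, and so on. Each added letter multiplies the work by 26.
import itertools
import string

def crack(password):
    for length in itertools.count(1):
        for guess in itertools.product(string.ascii_lowercase, repeat=length):
            if "".join(guess) == password:
                return "".join(guess)

print(crack("cab"))  # → cab
```

Cracking a 3-letter password means at most 26 + 676 + 17576 guesses; every letter added to the password multiplies the remaining work by another factor of 26.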

Some algorithms require time proportional to some power of the input size. A simple example is an algorithm that considers all pairs of something: all pairs of students and their tests, all pairs of potential roommates, etc.; there are roughly n^2 ways to pair up n things, meaning checking them all will take that long. Algorithms like these slow down noticeably above some size; pairing 100 people might take a fraction of a second, while 1,000 might cause a noticeable delay and 10,000 might take a few hours. More scalable than intractable algorithms, certainly, but still noticeably constrained by scale.
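The pair-checking pattern can be sketched directly; the nested loops below are what makes the work grow roughly quadratically:

```python
# A quadratic algorithm: checking every pair takes n*(n-1)/2 comparisons,
# which grows like n^2.

def rival_pairs(fans):
    """Count pairs of people who root for different teams."""
    count = 0
    for i in range(len(fans)):
        for j in range(i + 1, len(fans)):  # each pair is checked exactly once
            if fans[i] != fans[j]:
                count += 1
    return count

print(rival_pairs(["red", "blue", "red", "blue"]))  # → 4
```

Doubling the list roughly quadruples the number of pairs, which is why quadratic algorithms feel fine at small sizes and grind at large ones.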

Often, people expect algorithms to scale linearly with input: double the input size and we expect to double the time needed to complete the task. One of the most common algorithms used as part of some other algorithm is putting a list of values in sorted order; this scales just a tiny bit worse than linearly (a list of n items can be sorted in n log n steps), but it feels close enough to match people’s intuition of scale.

When inputs get very large, we need algorithms that take less than linear time. Because looking at each input takes time, this requires finding the result without even looking at most of the input at all. A classic algorithm with sublinear time is finding a value in a sorted list (like a printed telephone book or dictionary): we can find a value in a list of 1 million entries by checking only about 20 of them if we know that the list is in sorted order.
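The sorted-list technique (commonly called binary search) can be sketched as follows; the check counter is included only to illustrate the claim:

```python
# Binary search: each check rules out half of the remaining range, so a
# million sorted entries need only about 20 checks.

def find_sorted(values, target):
    low, high, checks = 0, len(values) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        checks += 1
        if values[mid] == target:
            return mid, checks
        elif values[mid] < target:
            low = mid + 1   # the target must be in the upper half
        else:
            high = mid - 1  # the target must be in the lower half
    return None, checks

values = list(range(1_000_000))
index, checks = find_sorted(values, 765_432)
print(index, checks)  # about 20 checks, not a million
```

This only works because the list is sorted; on an unsorted list, ruling out half the entries without looking at them is impossible.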

Several ways of scaling with larger input have names that are common enough to be worth memorizing:

Name        | Time for input size n | Comments                                           | Example
Constant    | O(1)                  | Ignores most input entirely.                       | Pick a random value from a list.
Logarithmic | O(log n)              | Selectively views a few inputs.                    | Find a value in a sorted list.
Linear      | O(n)                  | Checks each input once.                            | Count how many values in a list have a given property.
Quadratic   | O(n^2)                | Feasible for small problems, slow for larger ones. | Count how many pairs of people in a room root for rival teams.
Exponential | O(2^n)                | Intractable, basically unusable.                   | Try every possible password.

The O(...) above is called Big-O and is a mathematically precise way of saying roughly. Big-O notation is also called asymptotic notation because its mathematical definition is related to the mathematical concept of asymptotes.

5 Families of algorithms

Algorithms are often grouped based on the data structures they operate on.

5.1 List algorithms

Lists are very common in many contexts. Computing has a few common list data structures, meaning ways of representing lists using bits. Two of the most common are:

  • Arrays, which store the list’s elements side by side in a single contiguous block of memory.

  • Linked lists, which store each element separately, together with the location of the element that comes next.

The most common list algorithms are those that search a list for a value and those that put a list into sorted order.

5.2 Graph algorithms

In computing, the word graph does not refer to a chart or other visual representation of data; instead, it means a set of nodes connected by a set of edges, where each edge connects exactly two nodes and any number of edges may connect to a given node.


An illustration of a graph, using circles to represent nodes and lines to represent edges.

The graph has five nodes: A, B, C, D, and E.

The graph has six edges: edge 1 connects A-B, edge 2 connects B-D, edge 3 connects B-E, edge 4 connects C-D, edge 5 connects C-E, and edge 6 connects D-E.
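This particular graph can be stored as an adjacency list, recording for each node the set of nodes its edges connect it to:

```python
# The five-node, six-edge graph above, as an adjacency list.
graph = {
    "A": {"B"},
    "B": {"A", "D", "E"},
    "C": {"D", "E"},
    "D": {"B", "C", "E"},
    "E": {"B", "C", "D"},
}

# Each of the six edges appears twice, once from each endpoint.
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
print(edge_count)  # → 6
```

Adding information to nodes and edges then amounts to replacing these bare names and sets with richer records, without changing the overall shape of the structure.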

By adding information to the nodes and edges, they can be used to represent a surprising variety of problems, both spatial problems like navigating around a map and designing sewage systems and non-spatial problems like setting up people on dates and winning turn-based games.

Graph problems are well studied, and include both tractable and intractable problems. If you hear a computer scientist say this is a graph problem expect the next statement to be either and we can solve it easily or and it’s basically impossible.

Some of the most common graph algorithms are graph traversal (visiting every node by following edges), finding the shortest path between two nodes, and finding a spanning tree that connects every node as cheaply as possible.

Note that often even very similar-sounding graph problems (like finding the shortest path and finding the longest path) have very different complexity (one being easy, the other being intractable). This can seem frustrating, with seemingly minor tweaks to a problem statement resulting in huge changes to the algorithm’s feasibility.

5.3 Matrix algorithms

Mathematics has many topics and subfields, one of which has come to dominate computation: matrices and vectors. These are studied in the branch of mathematics called linear algebra and in the math and computing hybrid field of numerical methods.

A vector is a list of numbers and a matrix is a grid of numbers. They are combined with simple multiplication and addition operations, typically by lining up numbers in rows with numbers in columns. Because of the simple, regular structure of these computations they can be easily broken up into pieces with each piece worked on by a different piece of hardware, allowing very large matrix operations to be computed very quickly by using many copies of simple hardware.
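The row-times-column pattern can be sketched in a few lines; each output entry lines up one row of the matrix with the vector, multiplies pairwise, and adds:

```python
# Matrix-times-vector: each output entry is one row of the matrix lined up
# with the vector, multiplied pairwise, and summed.

def matrix_times_vector(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

matrix = [[1, 2],
          [3, 4]]
print(matrix_times_vector(matrix, [10, 100]))  # → [210, 430]
```

Because each row’s sum is independent of every other row’s, the rows can be handed to different pieces of hardware and computed at the same time, which is what makes very large matrix operations so fast on parallel hardware.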

Matrices are used as the core of most algorithms that simulate something complex or deal with a large number of similar inputs at once. Weather prediction, stock market analysis, robotic control systems, generative AI, and most statistical analysis and computer-aided science are among the many computations for which the most successful algorithms use matrix operations to do most of the computational work.