Processor Hardware
A light-weight high-level summary

The purpose of this page is to skim through key ideas in digital hardware at a level of detail that will hopefully communicate the idea that nothing is magic; everything is designed by regular people.

1 Electricity primer

Electrical current is the flow of electrons across a conductor. More electrons per second = more current.

Electrical voltage is the force gradient that causes electrons to want to move. It is similar to pressure, but for electrons instead of fluids.

Electrical resistance impedes current. Because resistance exists (except in superconductors, which are not yet common in computers), current requires some voltage, with more voltage needed if there is more resistance.

Electrical power is current × voltage.

Resistance converts power into heat. Heat in turn increases temperature. Computer components tend to fail if the temperature rises too high, so heat sinks (devices to move heat off of the computer) are key to running computers quickly.
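
For a concrete (if made-up) example, here’s a quick calculation in Python using Ohm’s law and the power relation above; the component values are invented for illustration:

# A minimal numeric sketch; the voltage and resistance values are made up.
voltage = 1.2                      # volts pushing electrons through a component
resistance = 0.5                   # ohms of resistance in that component
current = voltage / resistance     # Ohm's law: more voltage or less resistance = more current
power = current * voltage          # power converted to heat, in watts
print(current, power)              # about 2.4 amps and 2.9 watts of heat to remove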

2 Transistors

Transistors act like electrical valves. Each is connected to three wires. Applying voltage to one wire (which is a dead end and cannot accept current) changes whether the other two wires are connected (allowing current to flow) or not by changing the resistance of the connection between those two wires.

Transistors are designed to operate at a specific voltage: the voltage needed to reliably change connectivity. Lower voltages don’t operate the transistor and higher voltages can damage or break it. We commonly call a voltage high enough to engage a transistor high, 1, or voltage, and a voltage too low to engage a transistor low, 0, or ground.

There are two types of transistors. Some allow current unless there is voltage and others allow current only if there is voltage.

There are multiple drawings used for transistors. We’ll use a T-shaped wire for the voltage control wire almost touching an offset jig connecting the current wires, with a circle at the join of the T for transistors that allow current with no voltage and no circle for transistors that allow current with high voltage.

Drawing of a transistor where voltage on the control wire prevents current.
Drawing of a transistor where voltage on the control wire enables current.
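
As a very loose software analogy (this captures only the on/off behavior, not how a real transistor works), the two types can be thought of as functions from the control voltage to whether current can flow:

# A toy model of the two transistor types as voltage-controlled switches.
# 1 means high voltage on the control wire, 0 means low (ground).

def conducts_if_voltage(control):
    # the no-circle transistor: allows current only if there is voltage
    return control == 1

def conducts_if_no_voltage(control):
    # the circle transistor: allows current unless there is voltage
    return control == 0

print(conducts_if_voltage(1), conducts_if_no_voltage(1))   # True False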

3 Gates

Small numbers of transistors are commonly arranged between a voltage source and ground to have specific properties. These generally have a few input wires that need to be connected to voltage sources and a few output wires that receive voltage. We call these common arrangements gates.

A NAND gate, one of the most common gates in computer hardware. Output C is 0 if and only if both input A and input B are 1.

With a little design work, we can create gates from transistors that mimic any 1-, 2-, or 3-input truth table, including and, or, xor, not, implication, bi-implication, and so on. Chaining these gates together, we can create any Boolean expression with any number of arguments.
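
As a sketch of that claim, here is one way to build several common gates out of nothing but NAND in Python, treating 0 as low and 1 as high (the wiring used in real chips varies, but the logic is the same):

# Building common gates out of NAND alone; 0 = low, 1 = high.

def nand(a, b):
    return 0 if (a == 1 and b == 1) else 1

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_(a, b):
    return and_(or_(a, b), nand(a, b))

# Check xor against its truth table.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_(a, b))   # the xor column reads 0, 1, 1, 0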

4 Building with Gates

By attaching gates’ outputs to the inputs of other gates, we can build more complicated operations. Such connected gates are commonly called logic.

4.1 Arithmetic

Binary arithmetic is just a complicated sequence of Boolean expressions. To see this, let’s consider building the expression for a binary adder.

Suppose we have two binary numbers we wish to add, using only basic logical operations. Each number is represented by a sequence of bits; x0 is the 1s place of number x, x1 is the 2s place, x2 is the 4s place, x3 is the 8s place, and so on; similarly with y. We want to arrange a set of individual Boolean operations to compute all of the bits of z, where z = x + y.

We’ll proceed the same way we would by hand: with the least-significant digit first and a carry bit if the sum in one place is larger than 1. To be sure we catch all cases, let’s enumerate all four possible combinations of x0 and y0 and what the z0 and carry should be in each case.

x0 y0 z0 carry1
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1

Notice that the z0 column looks just like the XOR operation; and that the carry1 column looks just like the AND operation. Thus we can configure the following:

z0 = x0 ^ y0
c1 = x0 & y0
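
A quick way to check that pairing is to evaluate the two expressions over all four input combinations and compare against the table (a software check, not hardware):

# Check that xor and "and" reproduce the one-bit addition table above.
for x0 in (0, 1):
    for y0 in (0, 1):
        z0 = x0 ^ y0      # sum bit
        c1 = x0 & y0      # carry bit
        print(x0, y0, z0, c1)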

Now for z1. This is the sum of x1, y1, and the carry we just computed. Again for completeness, let’s enumerate all 8 combinations possible for these three inputs:

c1 x1 y1 z1 carry2
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1

The z1 column is the parity of c1, x1, and y1, which can be computed by a pair of xors:

z1 = c1 ^ x1 ^ y1

The carry2 is 1 if at least two of c1, x1, and y1 are 1. There are various ways to write that expression; one is

c2 = (x1 & y1) | (c1 & (x1 ^ y1))

Everything we did for z1 and c2 also applies to all later zs and cs:

z2 = c2 ^ x2 ^ y2
c3 = (x2 & y2) | (c2 & (x2 ^ y2))
z3 = c3 ^ x3 ^ y3
c4 = (x3 & y3) | (c3 & (x3 ^ y3))
z4 = c4 ^ x4 ^ y4
c5 = (x4 & y4) | (c4 & (x4 ^ y4))
...

Thus, we can wire together a bunch of and, or, and xor gates to create an adder.

Ripple-carry adder circuit
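
Here is the same wiring written as a small Python simulation (a sketch of the logic, not a hardware description); the carry ripples from each bit position to the next, which is where the name comes from:

# A ripple-carry adder built only from per-bit &, |, ^ operations.
def ripple_carry_add(x_bits, y_bits):
    # x_bits[0] and y_bits[0] are the 1s place, x_bits[1] the 2s place, etc.
    z_bits = []
    carry = 0
    for x, y in zip(x_bits, y_bits):
        z_bits.append(carry ^ x ^ y)             # z_i = c_i ^ x_i ^ y_i
        carry = (x & y) | (carry & (x ^ y))      # c_{i+1}
    z_bits.append(carry)                         # final carry-out bit
    return z_bits

# 6 (binary 110) + 3 (binary 11), with bits listed least-significant first
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 0, 0]))   # [1, 0, 0, 1, 0], i.e. 9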

There’s nothing special about an adder; with enough gates we can make anything you could write as a single expression of arithmetic and logic operations.

4.2 Multiplexer

A multiplexer works like an array indexing operation. We have n data inputs, 1 data output, and 1 selection input that is a number between 0 and n-1.

The simplest multiplexer has 2 data inputs and a 1-bit selection input. It could be implemented as

out = (selection * in1) + ((1-selection) * in0)

Real implementations use fewer gates than multiplication, but the net result is the same.

A 2n-input multiplexer can be made out of two n-input multiplexers with their outputs multiplexed by a 2-input multiplexer. This is one of several reasons why powers of two are common in hardware: circuitry that can pick one of 5 options is no simpler than circuitry that can pick one of 8.
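
A sketch of both ideas in Python, assuming a 2-input multiplexer as the building block (the function names here are invented for illustration):

# A 2-input multiplexer, and a 4-input multiplexer built from three 2-input ones.
def mux2(selection, in0, in1):
    return in1 if selection else in0

def mux4(selection, in0, in1, in2, in3):
    # selection is a 2-bit number; its low bit picks within each half,
    # its high bit picks which half's answer to use.
    low = selection & 1
    high = (selection >> 1) & 1
    return mux2(high, mux2(low, in0, in1), mux2(low, in2, in3))

print(mux4(2, 'a', 'b', 'c', 'd'))   # picks input number 2, i.e. 'c'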

The 2-input multiplexer is also an operator in most programming languages. It has 3 inputs (2 data, 1 selection), meaning it’s a ternary operator; and as it’s the only ternary operator in most languages it is often called simply the ternary operator. In most languages it is written as follows:

out = selection ? in1 : in0;

though in Python it is instead

out = in1 if selection else in0

Its behavior is similar to an if-statement, except it is an expression, not a statement.

4.3 Register

Storing information is an important part of building a processor. The fastest and most important storage technique is called a register and is made out of several gates linked up in specific ways with their outputs feeding into each other’s inputs. That feedback wiring creates little loops that store information; typically one of those loops is also used as an output, so a register is always outputting the value it is storing. Registers typically have two inputs: a data input which supplies a new value to store and a control input which decides if the data input should be stored or ignored.

A common register hardware design is the D flip-flop, also called the positive-edge-triggered D-style flip-flop, which can be made out of six NAND gates. (Understanding the operation of D flip-flops is not important for this course, but falstad.com has an interactive demo you can try if you are interested.) This register uses a clock as its control input, which allows a single instant in time to be used as the moment when the register stores its input.
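
The six-NAND wiring is beyond this page, but the behavior of such a register can be sketched in software as follows (a model of what it does, not how it does it):

# A behavioral model of a positive-edge-triggered register.
class Register:
    def __init__(self):
        self.value = 0        # the stored (and continuously output) value
        self.last_clock = 0

    def tick(self, clock, data):
        # The stored value changes only at the instant the clock goes 0 -> 1.
        if clock == 1 and self.last_clock == 0:
            self.value = data
        self.last_clock = clock
        return self.value     # the register always outputs what it stores

r = Register()
r.tick(1, 5)     # rising edge: stores 5
r.tick(1, 7)     # clock stays high: the 7 is ignored
print(r.value)   # 5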

Registers are the only part of common computers where information flow between gates is cyclical. As such, they are often excluded from the generic term logic.

4.4 Memory

The operation of memory is simple: give it an address and it gives back the data stored at that address. We could implement that using a register for each value in memory and a multiplexer to pick out the value at the given address.
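
In software terms, that register-plus-multiplexer design behaves something like this toy model (real memory hardware is organized very differently, as the next paragraph notes):

# A toy model of memory: one stored value per address, selected by the address.
class Memory:
    def __init__(self, size):
        self.cells = [0] * size        # stand-ins for one register per address

    def read(self, address):
        return self.cells[address]     # the multiplexer's job: pick one cell

    def write(self, address, value):
        self.cells[address] = value

m = Memory(8)
m.write(3, 42)
print(m.read(3))   # 42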

That design doesn’t scale well to the billions-of-bytes memories we use; registers take up too much space and need too much power. Various cheaper but slower designs are used instead that have the same behavior at lower cost.

5 Processor design

5.1 Conceptual design

Processors store instructions (think lines of code) in memory. Instructions are generally read and executed one after another, which we can implement by having a special register (called the program counter or PC) which stores the address of the next instruction to execute. We send that address to memory to fetch the next instruction: instruction = memory[PC]. We wire the output of the PC through an adder and back to its input to implement PC = PC + 1 and move through all the instructions.

Common instructions perform operations on the hardware equivalent of local variables, called program registers. Like memory, these are an array of stored values; unlike memory, they are few in number and made out of actual registers, so they are very fast to access. If you had x = y + z in your source code, the corresponding instruction might be register[0] = register[1] + register[2]: in other words, we used numbers instead of names for variables. The specific operation is done by picking one of several operation circuits using a multiplexer; thus the operation + is also encoded as an integer. This means the entire instruction is just four integers: 3 registers and one operation. Other operations like memory access or unary operations might have a different set of components, but we can always turn them into simple numbers.

Code also contains control constructs: if statements, loops, function calls, and so on. All of these change what code is executed next, meaning they all set PC = something. If they do so conditionally, that just adds a multiplexer, PC = condition ? address1 : address0.
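
Putting the conceptual design together, here is a toy software version of a processor; the instruction format and operation numbers are invented for illustration, not taken from any real machine:

# A toy processor.  Each instruction is four small integers:
# (operation, destination, source1, source2).  The operation numbers are made up.
ADD, SUB, JUMP_IF = 0, 1, 2

program = [
    (ADD, 0, 1, 2),      # register[0] = register[1] + register[2]
    (SUB, 3, 0, 1),      # register[3] = register[0] - register[1]
    (JUMP_IF, 4, 3, 0),  # if register[3] != 0, set PC = 4 (skip the next instruction)
    (ADD, 2, 2, 2),      # register[2] = register[2] + register[2]  (skipped above)
]

registers = [0, 5, 7, 0]
PC = 0
while PC < len(program):
    operation, dst, src1, src2 = program[PC]     # instruction = memory[PC]
    if operation == ADD:
        registers[dst] = registers[src1] + registers[src2]
        PC = PC + 1
    elif operation == SUB:
        registers[dst] = registers[src1] - registers[src2]
        PC = PC + 1
    elif operation == JUMP_IF:
        # the conditional "PC = condition ? address1 : address0" idea from above
        PC = dst if registers[src1] != 0 else PC + 1

print(registers)   # [12, 5, 7, 7]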

5.2 Modern design

While fully-functional computers can be built on the simple design outlined above, modern computers have many more complexities that help them run more quickly and with less power than a simple design would. A few example complexities are noted below.

  1. Fancy instruction encoding.

    The set of instructions a computer can handle tends to change with every new release of the computer. But we don’t want to have to change all our compiled code every time. Thus, instruction encodings tend to have deprecated instructions that are converted to a different set of internal instructions inside the processor and late-added instructions that violate the pattern established by the early instructions, making the instruction encoding confusing and not necessarily aligned with what the processor actually does.

  2. Pipelining.

    Pipelining is the processor version of an assembly line: single instructions are broken into multiple steps and passed from one part of the processor to another, each doing just one step.

    Division requires non-trivial work for each bit of the answer, so instead of making one big circuit for all of it we make a first-bit circuit that hands its work off to the second-bit circuit and so on. 64 steps later we’ve got the result. But when that result arrives we might have 63 other divisions in the pipeline in various stages of completion.

    As of 2024, it is common for every instruction to go through a 5–15 stage pipeline, with a few difficult instructions like division adding a few dozen more steps. This means the normal case is that the processor is working on a dozen instructions at the same time. (A toy simulation of this assembly-line idea appears after this list.)

  3. Speculation.

    Pipelining has a problem that assembly lines do not: interdependence of instructions. If I work on an assembly line adding steering wheels to cars, I never have to wait for one car to finish to know which steering wheel to put on the next car. But if I work in a processor doing multiplication, I may well have to wait for a previous instruction to finish to know what values I’m multiplying together.

    To avoid sitting idle, processors try to guess, or speculate, what the outcome of something they are waiting on might be and perform the operation with that guess in advance. If they guessed right, everything goes faster. If they guessed wrong, they simply discard the guessed work and start over with the correct value.

  4. Out-of-order execution.

    Sometimes the way code is written isn’t the fastest way to run it. Reordering instructions might let the processor start on a later instruction while it’s waiting for an earlier one to be ready.

    To facilitate this kind of optimization, processors convert the instructions we provide, with their register numbers and so on, into an information-flow dependency graph by adding hundreds of new internal register numbers, each one modified by only one instruction. These instructions are then dispatched to multiple internal queues, with division instructions waiting on the division logic while memory-read instructions wait on separate memory-read logic.

    Out-of-order execution dramatically increases the value of speculation; it is not uncommon to have a hundred instructions in various states of partial completion, more than half of which are speculatively executed and might be discarded if the speculation proves to be incorrect.
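
As promised in the pipelining item above, here is a toy simulation of the assembly-line idea (a cartoon only: it ignores instruction dependencies, speculation, and everything else a real pipeline must handle):

# A cartoon of a 4-stage pipeline: every cycle, each instruction in flight
# moves one stage further along, so several instructions overlap.
stages = ["fetch", "decode", "execute", "write back"]
program = ["i1", "i2", "i3", "i4", "i5", "i6"]

in_flight = []            # (instruction, stage index) pairs
next_to_fetch = 0
cycle = 0
while next_to_fetch < len(program) or in_flight:
    if next_to_fetch < len(program):             # fetch one new instruction per cycle
        in_flight.append((program[next_to_fetch], 0))
        next_to_fetch += 1
    cycle += 1
    print("cycle", cycle, "->",
          ", ".join(instr + ":" + stages[stage] for instr, stage in in_flight))
    # advance the pipeline; instructions past the last stage are finished
    in_flight = [(instr, stage + 1) for instr, stage in in_flight
                 if stage + 1 < len(stages)]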

One analogy to understand all of these complexities is to think of a processor like a company. A simple processor is like a single-employee company where that one person does each task in the simplest, most direct way. A modern processor is like a thousand-employee company with managers and inboxes and reports and departments. The work is the same in both, but the process is quite different, and the amount of work that can be completed per day is much higher in the more complicated system.