Negative integers

1 Inefficient: Sign Bit

You likely learned that negative numbers are written with a special symbol indicating that they are negative, followed by their distance from 0. Something like 340-340, for example.

If we put this in a binary number, we use a specific bit to be the sign bit with the rest being the integer it’s attached to. It’s simple enough to store, but it means that every arithmetic circuit we build has to have some extra logic to handle all possible combinations of positive and negative operands. Doable, but not ideal.

2 Integers as a ring

2.1 Base-10

Suppose we have a base-10 number like you’re used to using outside of computing, but we fix it to always have the same number of digits. Any operation that would create a number that needs more digits would simply have the excess digits discarded. For example, if we had 2-digit numbers then 99+0199 + 01 would be 0000, not 100100 because we only keep two digits.

This design changes the number line you may have seen when learning arithmetic into a number ring.

-4 -3 -2 -1 0 +1 +2 +3 +4
The number line. Adding nn moves nn steps to the right.
1 2 3 4 5 6 7 8 9 0
The 1-digit number ring. Adding nn moves nn steps clockwise around the ring.

Now, recall that the definition of negative xx is a value which, when added to xx, yields 0. On both the number line and the number ring, we can find that value by mirroring the number across 0: on the number line, negative +3+3 is 3-3 ((+3)+(3)=0(+3)+(-3) = 0) while on the 1-digit number ring negative 33 is 77 (3+7=03 + 7 = 0 when truncated to 1 digit).

One of the surprising details of the number ring is that arithmetic doesn’t need to pay any attention to sign. We could even argue that the number ring doesn’t have a notion of sign at all: in a 4-digit ring 0340-0340 and 96609660 are the same thing; there’s no difference between them.

In a 1-digit number ring, negative 3 is 7.

3 times 7 is 21, or 1 when truncated to one digit.

3 times negative 3 should be negative 9, which is 1.

A random bit of trivia: on a number line only one number is its own negative, 0; but on a number ring, two numbers are their own negative, such as 0 and 5 for a 1-digit ring or 000 and 500 for a 3-digit ring.

2.2 Base-2

The principle of a number ring works just as well in base-2 as it does in base-10. It also makes more sense in a computer because computers have to physcially store numbers in a finite number of wires, making the number of bits naturally fixed.

0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
The 4-bit number ring. Adding nn moves nn steps clockwise around the ring.

From the hardware perspective, having integers be a fix-bit ring makes everything simple: addition, subtraction, and multiplication simply work, no matter what sign the arguments have. But for comparison operations like < and for displaying numbers to the user, we need to know whether the 8-bit number 11001010 was intended to mean positive 202 or negative 54.

Most computer chips today are built with two different ways to make this determination. The unsigned operations treat all integers as positive: 11001010 = +202. The signed operations split the numbers so half are positive and half negative, using the highest-order bit to make this determination: 11001010 = ‒54 because its’ highest-order bit is 1, while 01001010 = +74 because its highest-order bit is 0.

This particular model of integers (treating them as a ring and using the highest-order bit as the sign) is called two’s complement, and is used in almost all computers today.

0000 +0 0001 +1 0010 +2 0011 +3 0100 +4 0101 +5 0110 +6 0111 +7 1000 ‒8 1001 ‒7 1010 ‒6 1011 ‒5 1100 ‒4 1101 ‒3 1110 ‒2 1111 ‒1
The 4-bit number ring, with the meaning of each number as a signed integer indicated. Adding nn moves nn steps clockwise around the ring.

Signed integers have some properties that are not always obvious to those new to them:

Mixing signed and unsigned integers.

Suppose I have the following code:

signed char x   =  -56; // 0b11001000
unsigned char y = +192; // 0b11000000
if (x < y) {
  printf("x < y");
} else if (x == y) {
  printf("x == y");
} else {
  printf("x > y");
}

What should this code do?

  • Cast both arguments to signed: 56>64-56 > -64 so x > y is printed.
  • Cast both arguments to unsigned: 200>192200 > 192 so x > y is printed.
  • Explicitly check if the signed is negative; 56-56 is smaller than any unsigned number so x < y is printed.

Different languages make different decisions on this. Some make comparing signed and unsigned values a compiler error. Some always cast both to signed; others both to unsigned. Some replace these comparisons with the more complicate check for sign followed by check of values if both are positive. And some leave this as an undefined or implementation-defined behavior.

Regardless of what your language chooses, your best bet in terms of both correctness and efficiency is to never mix signed and unsigned integers in the same expression.