You likely learned that negative numbers are written with a special symbol indicating that they are negative, followed by their distance from 0. Something like , for example.
If we put this in a binary number, we use a specific bit to be the sign bit
with the rest being the integer it’s attached to. It’s simple enough to store, but it means that every arithmetic circuit we build has to have some extra logic to handle all possible combinations of positive and negative operands. Doable, but not ideal.
ring
Suppose we have a base-10 number like you’re used to using outside of computing, but we fix it to always have the same number of digits. Any operation that would create a number that needs more digits would simply have the excess digits discarded. For example, if we had 2-digit numbers then would be , not because we only keep two digits.
This design changes the number line
you may have seen when learning arithmetic into a number ring.
Now, recall that the definition of negative
is a value which, when added to , yields 0
. On both the number line and the number ring, we can find that value by mirroring the number across 0: on the number line, negative is () while on the 1-digit number ring negative is
( when truncated to 1 digit).
One of the surprising details of the number ring is that arithmetic doesn’t need to pay any attention to sign. We could even argue that the number ring doesn’t have a notion of sign at all: in a 4-digit ring and are the same thing; there’s no difference between them.
In a 1-digit number ring, negative 3 is 7.
3 times 7 is 21, or 1 when truncated to one digit.
3 times negative 3 should be negative 9, which is 1.
A random bit of trivia: on a number line only one number is its own negative, 0; but on a number ring, two numbers are their own negative, such as 0 and 5 for a 1-digit ring or 000 and 500 for a 3-digit ring.
The principle of a number ring works just as well in base-2 as it does in base-10. It also makes more sense in a computer because computers have to physcially store numbers in a finite number of wires, making the number of bits naturally fixed.
From the hardware perspective, having integers be a fix-bit ring makes everything simple: addition, subtraction, and multiplication simply work, no matter what sign
the arguments have. But for comparison operations like <
and for displaying numbers to the user, we need to know whether the 8-bit number 11001010
was intended to mean positive 202
or negative 54
.
Most computer chips today are built with two different ways to make this determination. The unsigned operations treat all integers as positive: 11001010
= +202. The signed operations split the numbers so half are positive
and half negative
, using the highest-order bit to make this determination: 11001010
= ‒54 because its’ highest-order bit is 1, while 01001010
= +74 because its highest-order bit is 0.
This particular model of integers (treating them as a ring and using the highest-order bit as the sign) is called two’s complement, and is used in almost all computers today.
Signed integers have some properties that are not always obvious to those new to them:
Zero is represented by only 0 bits: 000…00
Zero is its own additive inverse: 000…00 + 000…00 = 000…00. Thus, for signed numbers, -(0) = 0
.
Negative one is represented by only 1 bits: 111…11
The most positive number is a 0 bit followed by all 1 bits: 011…11. This has the value where is the number of bits; for example, if then the most positive signed integer is .
The most negative number is a 1 bit followed by all 0 bits: 100…00. This has the value where is the number of bits; for example, if then the most negative signed integer is .
The most negative number is its own additive inverse: 100…00 + 100…00 = 000…00. Thus, for signed 8-bit numbers, -(-128) = -128
.
Mixing signed and unsigned integers.
Suppose I have the following code:
signed char x = -56; // 0b11001000
unsigned char y = +192; // 0b11000000
if (x < y) {
("x < y");
printf} else if (x == y) {
("x == y");
printf} else {
("x > y");
printf}
What should this code do?
x > y
is printed.x > y
is printed.x < y
is printed.Different languages make different decisions on this. Some make comparing signed and unsigned values a compiler error. Some always cast both to signed; others both to unsigned. Some replace these comparisons with the more complicate check for sign followed by check of values if both are positive. And some leave this as an undefined or implementation-defined behavior.
Regardless of what your language chooses, your best bet in terms of both correctness and efficiency is to never mix signed and unsigned integers in the same expression.