Light is delivered in photons, each of which has a wavelength. Visible light wavelengths are roughly from 380nm to 750nm.
Your eye has 4 types of color receptors: rods, L-cones, M-cones, and S-cones.1 All four operate on the same principle:
Rods react to most of the visible spectrum and react very easily, so except in very dim-light situations they effectively signal as soon as their reset time elapses, providing no useful information to the brain. They’re important for night vision, but not for color.
The most direct way to model color perception would thus be as three durations: the average time between two firings of an L-cone, of an M-cone, and of an S-cone. Thus, we might say the cone response is (0.1s, 2.4s, 0.4s) to mean the L-cone is firing almost immediately after reset, the M-cone is almost idle, and the S-cone is somewhere in between.
Seconds are awkward in this context, in part because they are unbounded, approaching infinity in complete darkness, and in part because the difference in light intensity needed to go from 0.11s to 0.10s is much larger than the difference needed to go from 11s to 10s. But the relationship between seconds and light intensity is a smooth, monotonic function, so there is some normalizing function that converts these seconds into a nicely behaved range where 0 means never firing and 1 means firing as often as possible. We’ll skip the math and assume we have access to those normalized cone signaling numbers.
Normalized cone signaling would form a cube, from (0,0,0) to (1,1,1). But we don’t perceive a cube of color. There are various ways that (0.1, 0.2, 0.3) can change into (0.2, 0.4, 0.6) with no change in the color of the object being observed, including pupil dilation, clouds moving away from the sun, etc. We can perceive this overall intensity of light, or luminosity, but our brain filters out overall luminosity and perceives only the relative luminance2 of adjacent colors. So a better model of color is a chromaticity vector (l,m,s) where l+m+s = 1, which leaves only two degrees of freedom and is therefore 2D, coupled with a separate luminance level in the form of a scalar that multiplies the whole vector.
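The split into a chromaticity and a separate scalar can be sketched in a few lines. This is only an illustration, not a perceptual model: the function name `split_cone_response` is made up for this example, and using the plain sum l+m+s as the scalar is an assumption, chosen because it is the simplest scale that makes the chromaticity components add up to 1.

```python
def split_cone_response(lms):
    """Split a normalized cone response into (scale, chromaticity).

    Assumption: the scalar is simply the sum of the three responses,
    so the chromaticity components add up to 1.
    """
    total = sum(lms)
    if total == 0:
        raise ValueError("chromaticity is undefined for (0, 0, 0)")
    chromaticity = tuple(v / total for v in lms)
    return total, chromaticity


# The two responses from the text: same object, different overall light level.
dim_scale, dim_chroma = split_cone_response((0.1, 0.2, 0.3))
bright_scale, bright_chroma = split_cone_response((0.2, 0.4, 0.6))
```

Both responses come out with the same chromaticity, roughly (1/6, 1/3, 1/2); only the scale differs, by a factor of two.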
Ignoring the luminance component for the time being, our cone-based chromaticity space is now a triangle:
In order for us to display the color represented by some point in this triangle, we have to find a combination of light of various wavelengths that triggers that ratio of response in the three cone types. The first step toward that goal is to find where each individual wavelength falls in this triangle. Fortunately, others have done this work for us and published the results; using the table from Stockman, McLeod and Johnson (1993)3 we get
By combining multiple wavelengths we can get any point inside that curve. Points outside the curve represent cone responses that cannot be triggered by any combination of light.5 In other words, only a subset of theoretical cone responses actually represent colors.
It is worth noting that not every eye is the same. The exact ratios of pigments inside the cones vary by individual, meaning the exact same wavelengths of light might cause a response of (0.1, 0.4, 0.5) in one individual and (0.1, 0.45, 0.45) in another. The variations are typically fairly small, and when larger are called color blindness (a term that also refers to more extreme variations such as having only two types of working cones).
Computer screens want to be able to show a lot of colors. But creating arbitrary sets of wavelengths is quite expensive, so we want to pick just a few wavelengths that we can combine to make most colors. Because the curve is roughly triangular, picking three such primary colors is a common choice. But producing a single wavelength is much harder than producing a narrow band of wavelengths, and the eye hardly perceives some wavelengths at all, so using them effectively in a display would take too much energy.
As a result, early color displays played with several different colors of light, but by 1993 had mostly settled on ITU-R Recommendation 709 which used colors made of several wavelengths, which can be plotted on our cone triangle as follows.
You’ll notice that there is space in the above image that is inside the visible-color curve but outside the colored-in triangle. Those colors exist in the real world, but cannot be replicated on this screen. Later ITU-R recommendations (Rec.2020 published in 2012 and Rec.2100 published in 2016) suggest single-wavelength lights to get a larger region, but while that makes the triangle bigger it still leaves out some colors. Most of the monitors I checked in 2021 still used Rec.709 colors instead, which is why I rendered the Rec.709 triangle above.
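Whether a chromaticity falls inside a display's triangle can be checked with barycentric coordinates. The sketch below makes an assumption worth flagging: it uses the CIE 1931 xy chromaticities of the Rec.709 primaries rather than coordinates in the cone triangle plotted here, since those xy values are the published ones. The function names and the test point beyond the green corner are illustrative only.

```python
# CIE 1931 xy chromaticities of the Rec.709 primaries (an assumption of
# this sketch: the text plots them in the cone triangle instead).
REC709_RED = (0.64, 0.33)
REC709_GREEN = (0.30, 0.60)
REC709_BLUE = (0.15, 0.06)


def barycentric(p, a, b, c):
    """Return weights (wa, wb, wc) with p = wa*a + wb*b + wc*c and wa+wb+wc = 1."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def in_gamut(p, a=REC709_RED, b=REC709_GREEN, c=REC709_BLUE):
    """True if chromaticity p lies inside (or on the edge of) triangle a, b, c."""
    return all(w >= 0.0 for w in barycentric(p, a, b, c))


d65_white = (0.3127, 0.3290)     # white point used by Rec.709
beyond_green = (0.10, 0.80)      # illustrative point past the green corner
```

A point is inside the triangle exactly when all three weights are non-negative, which matches the idea that a display can only add light from its primaries, never subtract it.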
Note that some colors can be represented on a monitor but are not shown in the diagram because they differ from the diagram’s colors only in luminance. For example, yellow is a higher-luminance version of the point halfway between the green and red corners, and forest green is a lower-luminance version of a point near the green corner of the triangle.
Let’s review where we stand:
The three primary colors a display combines are called Red, Green, and Blue, as by themselves they (roughly) correspond to colors with those names.
We’ve almost reached how color is actually stored in a computer; the missing component is gamma.
Let’s assume we have two colors that differ only in luminance. The formal term for this is having the same chromaticity. The eye roughly distinguishes the two by the relative luminance, so 1.0 and 0.9 are seen as much more similar than are 0.2 and 0.1. The formal term for this relative perception is lightness.
Given that we want to store a grid of millions of colors for every picture we keep, we want the storage to be small, so we have an incentive to represent colors on a logarithmic scale instead of a linear scale to handle this relative comparison. But a pure logarithmic scale is not ideal: while the eye is better at distinguishing dark colors than bright colors, that breaks down for the very darkest colors, making a logarithm not quite right.
In exploring efficient ways of encoding lightness, early efforts developed a simple-to-implement-in-analog-electronics system known as gamma, from the generic equation r = s^{\gamma} where s is the stored lightness and r is the real luminosity.
A gamma of \gamma = 2.2 was found to be
a good value, and many CRT displays were manufactured that applied that
equation automatically in their circuitry. Some also featured an
adjustable gamma knob so that, by moving physical wire coils and so on,
the \gamma in r = s^{\gamma} could be adjusted.
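As a quick illustration of the pure power-law form r = s^{\gamma}, here is a minimal sketch. The function names are made up for this example, and values are assumed to lie in the 0 to 1 range.

```python
GAMMA = 2.2  # the value quoted in the text


def gamma_decode(stored, gamma=GAMMA):
    """r = s ** gamma: stored lightness in [0, 1] -> real luminosity."""
    return stored ** gamma


def gamma_encode(luminosity, gamma=GAMMA):
    """Inverse of gamma_decode: s = r ** (1 / gamma)."""
    return luminosity ** (1.0 / gamma)


half_stored = gamma_decode(0.5)   # luminosity produced by a stored value of 0.5
```

A stored value of 0.5 comes out at roughly 0.22 of full luminosity, which shows how much of the stored range is devoted to the darker part of the output.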
As digital display technologies became prevalent and a simple formula was no longer needed, the name gamma stuck, as did most of the implementation. However, because the eye is not good at distinguishing very dim colors, people found that a raw gamma scale resulted in 3–5% of all color signals being perceived as indistinguishably black. So modern gamma correction is generally a piecewise equation, linear for dim colors and a power function for brighter colors. The most common version is the sRGB gamma function, which is defined for a color range of 0–1 (not the 0–255 range sometimes used to encode in bytes):
L_{\text{linear}} = \begin{cases} L_{\text{sRGB}}/12.92 &\text{if }L_{\text{sRGB}} \le 0.04045 \\ \displaystyle \left(\frac{L_{\text{sRGB}}+0.055}{1.055}\right)^{2.4} &\text{if }L_{\text{sRGB}} > 0.04045 \end{cases}

L_{\text{sRGB}} = \begin{cases} 12.92 L_{\text{linear}} &\text{if }L_{\text{linear}} \le 0.0031308 \\ 1.055{L_{\text{linear}}}^{1/2.4}-0.055 &\text{if }L_{\text{linear}} > 0.0031308 \end{cases}
The above functions assume each color channel is stored between 0 (black) and 1 (as bright as the screen can manage). To store that channel in b bits, we multiply the sRGB-encoded value by 2^b-1 and round to an integer.
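The two sRGB equations and the b-bit scaling translate fairly directly into code. The sketch below is one possible transcription; the function names, and the choice to round to the nearest integer when quantizing, are assumptions of this example rather than part of any standard API.

```python
def srgb_to_linear(s):
    """Decode an sRGB-encoded channel value in [0, 1] to linear light."""
    if s <= 0.04045:
        return s / 12.92
    return ((s + 0.055) / 1.055) ** 2.4


def linear_to_srgb(lin):
    """Encode a linear-light channel value in [0, 1] as sRGB."""
    if lin <= 0.0031308:
        return 12.92 * lin
    return 1.055 * lin ** (1 / 2.4) - 0.055


def quantize(value, bits=8):
    """Scale a [0, 1] value by 2**bits - 1 and round to the nearest integer."""
    return round(value * (2 ** bits - 1))


def dequantize(code, bits=8):
    """Map an integer code back to the [0, 1] range."""
    return code / (2 ** bits - 1)


mid_gray_code = quantize(linear_to_srgb(0.5))   # linear 0.5 stored in 8 bits
```

Linear 0.5 encodes to about 0.735 in sRGB, so half of the maximum light intensity is stored well above the middle of the 8-bit range.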
There are many names and uses of each of the two encodings gamma creates:
In the linear encoding, light intensity (i.e. luminosity or watts of light energy) is directly proportional to the stored numbers, making it what we want our display output to be. That makes it a linear space, which is what we need for any type of interpolation (such as by DDA/Bresenham) or lighting computation (such as we do often in 3D graphics). As such it is the space assumed in most 3D graphics APIs like WebGL.
Because 3D standards like WebGL use linear coordinates, the WebGL color 0.5 is given on the vertical axis and hence 0.735 on the horizontal axis of the above figure.
In the sRGB (gamma-encoded) space, binary encodings are optimally efficient (assuming 1 is the brightest light in view: sRGB is calibrated assuming it is used in a dark room, not in a bright space). That makes it a nonlinear space good for storage in image files, for over-the-wire transmission, and as display input sent to screens and other display devices. Because it is what screens process and files store, it is used in standards that intend to enumerate all representable colors, such as CSS and other non-3D web standards.
Because web standards like CSS use sRGB, the CSS color #7F7F7F is around 0.5 on the horizontal axis and hence 0.212 on the vertical axis of the above figure.
While higher-range images are starting to gain popularity, most displays and image formats use 8 bits each for Red, Green, and Blue. Web standards have popularized representing the result as three hexadecimal bytes, RGB, each written with two hex digits in a row with a hash sign in front, like #e3b021. Unless otherwise noted, it is safe to assume these are in the sRGB color space and stored post-gamma-correction, which means that most monitors can display them as-is but that arithmetic on the colors requires conversion to linear display intensities before it is accurate.
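To make the "convert before doing arithmetic" point concrete, here is a small sketch that parses hex colors and blends them in linear light. All function names are invented for this example, and it assumes plain 6-digit `#rrggbb` input in the sRGB color space.

```python
def srgb_to_linear(s):
    """Decode an sRGB-encoded channel value in [0, 1] to linear light."""
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def linear_to_srgb(lin):
    """Encode a linear-light channel value in [0, 1] as sRGB."""
    return 12.92 * lin if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055


def parse_hex(color):
    """Split '#rrggbb' into three 8-bit sRGB channel values."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a color like #rrggbb")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def to_hex(rgb):
    """Format three 8-bit channel values as '#rrggbb'."""
    return "#" + "".join(f"{channel:02x}" for channel in rgb)


def mix(color_a, color_b, t=0.5):
    """Blend two hex colors in linear light, then re-encode as sRGB."""
    mixed = []
    for a, b in zip(parse_hex(color_a), parse_hex(color_b)):
        lin = (1 - t) * srgb_to_linear(a / 255) + t * srgb_to_linear(b / 255)
        mixed.append(round(linear_to_srgb(lin) * 255))
    return to_hex(mixed)


example = parse_hex("#e3b021")        # (227, 176, 33)
gray = mix("#000000", "#ffffff")      # equal parts black and white light
```

Averaging the raw bytes of black and white would land near the middle of the byte range, whereas the linear-light blend lands noticeably higher, because half of the light energy is encoded well above the midpoint.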
Systems interested in optimal compression of information, ranging from the earliest color television broadcast signals to the most recent compression/decompression algorithms (also known as codecs), have observed that the eye perceives chromaticity at a coarser resolution than it perceives luminance. It is thus advantageous to decompose color using a different set of axes than RGB, using one axis for some form of brightness, lightness, luminance, or luminosity, so that we can easily transmit more bits of luminance than bits of chromaticity.
HSL dates back to 1938, was designed for color television, and was the first separate-lightness model to be widely implemented in hardware. In it, Hue is an angle around the color wheel and Saturation is intensity of color (i.e., non-gray-ness).
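For a feel of what Hue and Saturation look like numerically, the sketch below leans on the HLS conversion in Python's standard `colorsys` module. Treat it as an approximation of the idea: `colorsys` implements the common computer-graphics formulation, which may differ in detail from the original analog design, and the wrapper name `rgb_to_hsl` is made up for this example.

```python
import colorsys


def rgb_to_hsl(r, g, b):
    """Return (hue in degrees, saturation, lightness) for RGB values in [0, 1]."""
    # colorsys orders its result as (hue, lightness, saturation), hue in [0, 1).
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return h * 360.0, s, l


red = rgb_to_hsl(1.0, 0.0, 0.0)     # fully saturated, hue at 0 degrees
green = rgb_to_hsl(0.0, 1.0, 0.0)   # fully saturated, a third of the way around
gray = rgb_to_hsl(0.5, 0.5, 0.5)    # no saturation at all
```

Gray has zero saturation regardless of its hue value, which is the "non-gray-ness" reading of Saturation described above.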
HSL has an angular coordinate (H) which makes it awkward for use in digital computation. As digital replaced analog in displays, YCbCr became more popular than HSL. In YCbCr, Y represents luminance (or Y’ represents luma, a gamma-corrected version of luminance; both YCbCr and Y’CbCr are used), Cb is how much of that luminosity comes from blue and Cr is how much of that luminosity comes from red. Along with a few variants like YUV and YcCbcCrc that replace Cb and Cr with similar numbers computed in slightly different ways, YCbCr is the dominant way color is stored for lossy compression of digital media, including JPEG, H264, HEVC, VP9, and so on.
YCbCr and its relatives are capable of expressing nonsensical colors: ones with negative or more-than-100% levels of some primary colors. Thus, YCbCr encodings never contain those particular color expressions6.
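The sketch below shows one YCbCr variant and an example of a triple that decodes to an impossible color. It assumes the full-range 8-bit conversion with BT.601 weights as used by JPEG/JFIF; other standards use different coefficients and ranges, and the function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """8-bit gamma-encoded RGB -> full-range Y'CbCr (BT.601 weights, as in JFIF)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr; the result is not clamped to [0, 255]."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


def is_displayable(rgb, low=0.0, high=255.0):
    """True if every channel is within the range a display can show."""
    return all(low <= channel <= high for channel in rgb)


white = rgb_to_ycbcr(255, 255, 255)                       # about (255, 128, 128)
round_trip = ycbcr_to_rgb(*rgb_to_ycbcr(227, 176, 33))    # back to about (227, 176, 33)
nonsense = ycbcr_to_rgb(255, 255, 255)                    # maximum Y, Cb, and Cr at once
```

The last triple asks for full brightness while also claiming a large surplus of red and blue, so it decodes to red and blue levels far above 255, which is the kind of expression a valid encoding avoids.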
By design, HSL and YCbCr represent the same chromaticities as RGB. Sometimes it is desirable to use a model that allows representing all chromaticities, not just those a computer can display. CIE 1931 and CIELUV were designed by international standards bodies, and both roughly approximate a warped version of the LMS triangle discussed earlier, warped so that the curve comes close to filling a square and stretched so that distance in the warped space roughly approximates the human-perceived difference between colors. While not designed for digital display, both CIE models (and several related variants) are used to describe calibrations of display components and the interrelationship of different color models and different kinds of displays.
The above discussion applies to light-emitting displays: laptops, desktops, phones, etc. Print media has additional constraints.
The basic function of colored ink is to absorb some wavelengths and not others. Cyan ink, for example, absorbs most wavelengths that we’d see as being orange or red. Picking the best inks is tricky: if they absorb too many wavelengths then they overlap and some bright colors become unrepresentable; if they absorb too few then some wavelengths cannot be absorbed and some dark colors become unrepresentable.
The most common compromise is to have a CMYK printer: Cyan, Magenta, and Yellow ink that each remove a bit shy of a third of the visible spectrum and blacK ink that absorbs the entire visible spectrum and can help approximate those darker colors that the gaps between the CMY absorptions make hard by themselves. Common CMYK inks together cover a smaller and somewhat different region of possible colors than RGB.
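For orientation only, here is the naive arithmetic often used to turn RGB into CMYK amounts. It is emphatically a simplification: it treats each ink as a perfect filter for one primary and ignores the real ink spectra, overlaps, and paper described above, so actual print workflows rely on measured color profiles instead. The function name is made up for this sketch, and inputs are assumed to be in the 0 to 1 range.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB -> (c, m, y, k), all in [0, 1]; ignores real ink behavior."""
    k = 1.0 - max(r, g, b)          # black ink covers the shared darkness
    if k == 1.0:                    # pure black: no colored ink needed
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)   # cyan removes red
    m = (1.0 - g - k) / (1.0 - k)   # magenta removes green
    y = (1.0 - b - k) / (1.0 - k)   # yellow removes blue
    return c, m, y, k


red_ink = rgb_to_cmyk(1.0, 0.0, 0.0)     # magenta plus yellow, no cyan or black
gray_ink = rgb_to_cmyk(0.5, 0.5, 0.5)    # black ink only
```

Even in this idealized form, the role of the K channel is visible: anything the three colored inks would all have to absorb equally is handed to black ink instead.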
Higher-end printing systems use many more inks covering many narrow bands of photons to give a higher degree of color fidelity. The Pantone system is probably the best known with 14 base pigments instead of CMYK’s 4.
Even with many pigments, print and light colors have different coverage from one another. In an RGB display, white is (1,1,1) meaning that roughly ⅓ of light energy is coming from wavelengths that look purely red. But on paper, white is a mix of all the visible spectrum, and to filter out all but the red light will result in a darker red (relative to white) than would result if we turned off the G and B components of an RGB display. With enough pigments the chromaticities could be made to match, but not the luminances. Unless, that is, you print with fluorescent pigments that absorb photons of one wavelength and release that energy as photons of a different wavelength, or are viewing the printed medium under RGB-based lights, which some fluorescent and LED lights are, or…
Beyond noting that these complexities exist, we’ll mostly ignore pigment-based color in this class.