The following list provides a 2–3-level-deep outline of the interactive graphics pipeline. It defaults to just an overview of the big steps; clicking on a step will show you more about that step, possibly including additional sub-steps.
Specify 3D geometry
This is done in the CPU, with the final results sent to the GPU for later drawing.
Older systems use what’s called immediate mode, where each bit of geometry is rendered as soon as it is provided. Newer systems use buffers to store geometry data in video memory for much faster rendering. The buffer workflow works as follows:
Define triangulated data
Often this will come from artist-defined data we load from a file. Sometimes we’ll construct it in the code either based on geometry (e.g. to make a sphere) or some kind of visual simulation (e.g. to make a mountain range).
This data is usually specified in model space or object space, with (0,0,0) either in the middle of or at a corner of the data.
Convert to bytes and send to the GPU
We want the GPU to be able to organize data in a way that facilitates fast drawing, so we have two copies: one we control in the CPU, and one we draw from in the GPU.
Convert our model into one or more arrays of bytes.
Allocate a buffer, a region of memory on the GPU.
Send the bytes from the CPU to the GPU.
Tell the GPU how to interpret those bytes
Bytes only have meaning if we know how to group and parse them. We need to tell the GPU
What the bytes are for (vertex positions? triangle connectivity? other?)
How they are encoded (bytes per element, in what format)
Whether there are any bytes to skip (e.g. because we have interleaved several kinds of data in one byte stream)
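As a concrete (and hypothetical) illustration, here is a minimal WebGL2 sketch of the last two steps; the names gl and positionLoc, and the triangle data, are placeholders for whatever your program actually uses.

```ts
// A hypothetical setup; assumes a compiled shader program from which
// `positionLoc` (the attribute's location) was already queried.
declare const gl: WebGL2RenderingContext;
declare const positionLoc: number;

// Triangulated data in model space: one triangle, 3 vertices of (x, y, z).
const positions = new Float32Array([
   0.0,  0.5, 0.0,
  -0.5, -0.5, 0.0,
   0.5, -0.5, 0.0,
]);

// A vertex array object remembers the buffer/attribute wiring below.
const vao = gl.createVertexArray();
gl.bindVertexArray(vao);

// Allocate a buffer (a region of GPU memory) and send the bytes to it.
const positionBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);

// Tell the GPU how to interpret those bytes: 3 floats per vertex,
// not normalized, tightly packed (stride 0), starting at byte offset 0.
gl.enableVertexAttribArray(positionLoc);
gl.vertexAttribPointer(positionLoc, 3, gl.FLOAT, false, 0, 0);
```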
Each frame, ask the GPU to render the geometry
There are multiple ways to do this, but the most common sequence (sketched in code after this list) is
Clear the screen
Pick a shader program to run
Send the GPU the current value of global values called uniforms
For each model, bind its data and issue a draw call
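A sketch of that per-frame sequence in WebGL2, assuming the buffers, shader program, and uniform locations were already set up; the model list, matrix, and names are illustrative.

```ts
declare const gl: WebGL2RenderingContext;
declare const program: WebGLProgram;
declare const viewMatrixLoc: WebGLUniformLocation;
declare const viewMatrix: Float32Array; // 16 floats, column-major
declare const models: { vao: WebGLVertexArrayObject; vertexCount: number }[];

function drawFrame(): void {
  // Clear the screen (color and depth).
  gl.clearColor(0, 0, 0, 1);
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

  // Pick a shader program to run.
  gl.useProgram(program);

  // Send the GPU the current values of the uniforms (here, just a view matrix).
  gl.uniformMatrix4fv(viewMatrixLoc, false, viewMatrix);

  // For each model, bind its data and issue a draw call.
  for (const model of models) {
    gl.bindVertexArray(model.vao);
    gl.drawArrays(gl.TRIANGLES, 0, model.vertexCount);
  }
}
```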
Move vertices to view scene
The GPU manages how data flows between steps here, but allows us to provide code for what happens at each step.
Optionally, tessellation and geometry shaders make high-res geometry out of low-res input
Neither is supported by WebGL or by most embedded systems, so if we want to do this we either need to use a hardware-specific API (we won’t in this course) or do it on the CPU as part of how we specify geometry.
Implement a vertex shader
This step is fully programmable so we can do whatever we wish here, but we almost always do the following (a shader sketch appears after this list):
Apply a model transformation to position and size the object in the scene
This converts from object coordinates to world coordinates.
Apply a view transformation to position the scene in front of the camera
We view the scene by assuming a fixed camera location: always at
(0,0,0) and pointing along the z axis. To move
the camera somewhere
else we instead move the entire scene so that our desired
camera location is at that fixed location and everything else located
around it.
This converts from world coordinates to view coordinates.
Compute aspect ratios and divisors for perspective projection
Aspect ratio
The GPU always calls the left edge of the screen x=-1 and the right edge x=1; the bottom edge y=-1 and the top edge y=1. If the screen is non-square this will stretch or squish things. To counter that, we preemptively squish or stretch them so that the GPU’s subsequent stretch or squish undoes ours.
Divisors
Things close to you look larger than things far from you. The GPU achieves that by allowing you to specify a w, representing depth into the scene, and dividing everything by w.
Depth
The GPU clips off things that are too close or too far from the camera. The nearest visible point has \frac{z}{w}=-1 and the farthest \frac{z}{w}=1. We almost always have to change z to make this work out.
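Putting those pieces together, a typical WebGL2 (GLSL ES 3.00) vertex shader looks roughly like the sketch below. The uniform and attribute names are illustrative, and the projection matrix is assumed to encode the aspect-ratio, divisor, and depth adjustments just described.

```ts
const vertexShaderSource = `#version 300 es
uniform mat4 model;       // object space -> world space (position and size the object)
uniform mat4 view;        // world space -> view space (moves the scene, not the camera)
uniform mat4 projection;  // view space -> clip space (aspect ratio, w divisor, depth)

in vec3 position;         // per-vertex position in object space
in vec3 normal;           // per-vertex surface normal in object space
out vec3 worldPosition;   // varyings interpolated to each fragment
out vec3 worldNormal;

void main() {
  vec4 world = model * vec4(position, 1.0);
  worldPosition = world.xyz;
  worldNormal = mat3(model) * normal;       // fine for rotations and uniform scaling
  gl_Position = projection * view * world;  // clip-space position; the GPU divides by its w later
}`;
```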
Convert shapes to the pixels they cover
All parts of this step are built in to the GPU hardware with just a few small areas we can influence via parameters.
Primitive assembly
When drawing triangles, each triangle is made up of three vertices. WebGL provides several ways to specify which three vertices make up each triangle, and also allows pairs of vertices to form lines and single vertices to be rendered as points.
Frustum clipping
Both for efficiency and to prevent division-by-zero errors in the next step, each primitive is clipped to just the part that lies inside the frustum, which is a truncated pyramid shape.
Frustum clipping can be implemented as clipping against six independent planes.
Clipping a triangle against a plane can leave it unchanged or discard it entirely (when all three vertices lie on the same side of the plane), replace it with a smaller triangle (when one vertex is inside and two are outside the plane), or replace it with two smaller triangles (when one vertex is outside and two are inside the plane).
Division by w
Linear perspective projection is achieved by having a depth-based divisor for each vertex, provided by the vertex shader. Dividing x and y by this w creates perspective projection. Dividing z by w creates a useful discretization of depth. Dividing everything else by w helps interpolate values across triangles correctly.
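In symbols, each clip-space vertex produced by the vertex shader is divided through by its own w, and the \frac{1}{w} is kept for the later interpolation step:
(x, y, z, w) \mapsto \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}, \frac{1}{w}\right)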
Culling
If the three vertices of a triangle are in counter-clockwise order it is considered to be front-facing; otherwise it is considered to be back-facing. Culling is disabled by default, but if enabled it can be configured to discard either front or back faces.
It is possible to pose the back/front property in homogeneous coordinates and do culling immediately after primitive assembly, resulting in faster overall computation. This is usually seen as an optimization, not as the definition of culling.
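In WebGL, culling is controlled by a few pieces of state; the settings below (counter-clockwise front faces, discard back faces) match the defaults that take effect once culling is enabled.

```ts
declare const gl: WebGL2RenderingContext;

gl.enable(gl.CULL_FACE);  // culling is disabled until explicitly enabled
gl.frontFace(gl.CCW);     // counter-clockwise triangles are front-facing (the default)
gl.cullFace(gl.BACK);     // discard back-facing triangles (the default once enabled)
```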
Viewport transformation
At this point x, y, and z are all between −1 and 1. We change that here, adjusting x to be between 0 and raster width, y to be between 0 and raster height, and z to be between 0 and 1.
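Assuming the viewport covers the full raster and the default depth range, this works out to:
x_{\text{raster}} = \frac{x + 1}{2}\cdot\text{width} \qquad y_{\text{raster}} = \frac{y + 1}{2}\cdot\text{height} \qquad z_{\text{raster}} = \frac{z + 1}{2}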
Rasterization and interpolation
The scanline algorithm applies either the Bresenham or the DDA algorithm in two dimensions to efficiently find the exact set of pixels that each triangle covers. We call the bit of a triangle that covers one pixel a fragment. Scanline also interpolates every other per-vertex datum we provide to each fragment.
Division by \frac{1}{w}
The x, y, and z coordinates come out of the scanline algorithm already in the form we want, but the other interpolated data still has an unresolved division. We take care of that division here.
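Concretely, if the scanline algorithm gives a fragment screen-space weights \lambda_1, \lambda_2, \lambda_3 for the triangle’s three vertices, each attribute a (with per-vertex values a_i and divisors w_i) is recovered as:
a = \frac{\lambda_1 a_1 / w_1 + \lambda_2 a_2 / w_2 + \lambda_3 a_3 / w_3}{\lambda_1 / w_1 + \lambda_2 / w_2 + \lambda_3 / w_3}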
Color each pixel
Broadly speaking, setting pixel colors is done by setting fragment colors (which is done by code we control) and then combining all the fragments (which is mostly built-in with a few parameters we control).
Fragment discarding
If it is known that some fragments will never be seen, they are discarded here.
If rendering directly to a display, the operating system that owns the display can discard fragments.
This is relatively uncommon.
If a scissor region was set up, fragments are discarded based on that.
This is relatively uncommon.
Stencil test
A stencil buffer is a raster the size of the frame buffer, typically set by special rendering calls. The stencil test compares each fragment with the corresponding pixel of the stencil buffer to decide if the fragment should be discarded.
There are a small number of customizable comparisons that can be used for this test.
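A sketch of how the scissor and stencil tests above are configured in WebGL; the rectangle and the reference value 1 are placeholders.

```ts
declare const gl: WebGL2RenderingContext;

// Scissor test: discard fragments outside a rectangle (x, y, width, height in pixels).
gl.enable(gl.SCISSOR_TEST);
gl.scissor(0, 0, 256, 256);

// Stencil test: keep a fragment only if the stencil buffer holds 1 at its pixel.
gl.enable(gl.STENCIL_TEST);
gl.stencilFunc(gl.EQUAL, 1, 0xff);        // comparison, reference value, mask
gl.stencilOp(gl.KEEP, gl.KEEP, gl.KEEP);  // leave the stencil buffer unchanged
```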
Fragment Shader
This step is fully programmable so we can do whatever we wish here, but the most common parts are (a shader sketch follows this list):
Interpolate some parameters
The vertex shader can provide any number of varying values: out variables written by the vertex shader that are interpolated to each fragment and available as in variables in the fragment shader.
Look up some parameters
Often a few of the varyings are used to look up other values in a large array, most often provided as an image called a texture.
Compute some parameters
Often the provided varyings are combined with some uniform or constant values to compute other values.
A common example is using a varying position of the fragment in the scene and a uniform position of a light in the scene to compute a direction to the light.
Evaluate a BSDF
The appearance of materials in light is modeled by a family of functions called Bidirectional Scattering Distribution Functions, or BSDFs. A simpler but still versatile subset of BSDFs is the Bidirectional Reflectance Distribution Functions, or BRDFs.
Fragment shaders use the various parameters available to them to compute the color of the fragment based on the BSDF.
Other adjustments
Fragment shaders can modify fragment depth, discard unwanted fragments, and compute raster data other than colors.
None of these is very common.
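A representative WebGL2 (GLSL ES 3.00) fragment shader doing the common parts above: it reads two interpolated varyings, computes a light direction, and evaluates a simple Lambertian (diffuse) BRDF. The names and the specific lighting model are illustrative choices, not the only option.

```ts
const fragmentShaderSource = `#version 300 es
precision highp float;

in vec3 worldPosition;       // varyings interpolated from the vertex shader
in vec3 worldNormal;

uniform vec3 lightPosition;  // a point light's position in the scene
uniform vec3 baseColor;      // the material's diffuse color

out vec4 fragColor;

void main() {
  // Compute a parameter: the direction from this fragment to the light.
  vec3 toLight = normalize(lightPosition - worldPosition);
  // Evaluate a very simple BRDF: Lambertian (diffuse) reflection.
  float diffuse = max(dot(normalize(worldNormal), toLight), 0.0);
  fragColor = vec4(baseColor * diffuse, 1.0);
}`;
```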
Depth test
A depth buffer is a raster the size of the frame buffer that stores the z value of each pixel. Fragments farther from the camera than that z are discarded.
There are a small number of customizable options for how the depth buffer is accessed and updated.
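In WebGL the depth test is off by default; enabling it with the usual settings looks like this sketch.

```ts
declare const gl: WebGL2RenderingContext;

gl.enable(gl.DEPTH_TEST);  // off by default
gl.depthFunc(gl.LESS);     // keep fragments closer than the stored z (the default comparison)
gl.depthMask(true);        // let surviving fragments update the depth buffer
```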
Blending
If a fragment was not discarded, it is used to change the color at
its pixel in the frame buffer. By default this is a simple replacement,
but there are other options whereby the existing color and the new
fragment’s color are combined in various ways. Collectively, all of
these are called blending
.
If blending occurs, it often uses a alpha
channel to model the
opacity of the fragment and the pixel. Because of that, the entire
blending process is sometimes called alpha blending
or alpha
compositing.
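A sketch of enabling the most common form of alpha blending in WebGL, where the fragment’s alpha controls how much it covers the existing pixel color.

```ts
declare const gl: WebGL2RenderingContext;

gl.enable(gl.BLEND);  // off by default; fragments simply replace pixel colors otherwise
// result = fragment.rgb * fragment.a + existing.rgb * (1 - fragment.a)
gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);
```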
Write frame buffer
Once all the pixels are fully computed, the resulting values are placed as bytes in an image.
Multisample
Full-screen anti-aliasing (FSAA) improves image quality by rendering at a much higher resolution than the final image, then averaging clusters of pixels. That averaging is taken care of here.
Gamma
The eye does not perceive light linearly, and every bit counts when it comes to images. Gamma encoding is a way of attempting to distribute the bits of an encoding in a way that matches visual acuity. In the ideal gamma encoding, encoded light intensities i and i + \epsilon should look equally far apart regardless of i.
Ideal gamma encoding depends on many things, including the specific viewer’s eye and the background light in the room when viewing the screen. An accepted good gamma is the sRGB gamma function, which uses roughly 50% of the available bit patterns to encode the darkest 20% of light.
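For reference, the standard sRGB encoding of a linear intensity L \in [0, 1] is
V = \begin{cases} 12.92\,L & \text{if } L \le 0.0031308 \\ 1.055\,L^{1/2.4} - 0.055 & \text{otherwise} \end{cases}
which rises steeply near 0, spending most of its encoded values on dark intensities.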
Dither
We have a finite number of bits to store color with, but want more color detail in the image. Dithering solves this problem by ensuring that if we blur the image we’ll get more accurate colors. There are several ways to do this; two simple ones are
Stochastic
Stochastic dithering is randomized rounding. If a pixel’s red channel is computed as 121.85, dithering will randomly pick either 121 (15% chance) or 122 (85% chance) as the number to store.
Error diffusion
Error diffusion detects the error introduced by quantizing one pixel and distributes it to the neighboring pixels. If 121.85 is approximated as 122, we reduce the target brightness of the neighboring pixels by a cumulative amount equal to the 0.15 we over-used; for example, we might reduce three neighbors each by 0.05.
Dithering is used to avoid the eye noticing transitions between the finite set of available colors. For very limited color palettes it also simulates more colors at the cost of making the scene look noisy.
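A minimal sketch of stochastic dithering for one channel value (the 121.85 example above); error diffusion would instead carry the rounding error forward to neighboring pixels.

```ts
// Stochastic dithering: round up with probability equal to the fractional part,
// so that blurring (averaging) the stored values recovers the intended one.
function ditherStochastic(value: number): number {
  const low = Math.floor(value);
  const fraction = value - low;                     // e.g. 121.85 -> 0.85
  return Math.random() < fraction ? low + 1 : low;  // 85% chance of 122, 15% of 121
}
```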
Another view shows how the GPU-run parts operate