ECE 417 Lecture 7: Image Upsampling and Downsampling

Mark Hasegawa-Johnson, September 16, 2021

This file is distributed under a CC-BY license. You may freely re-use or re-distribute the whole or any part. If you re-distribute a non-trivial portion of it, give me credit.

Outline of Today's lecture

Preliminaries

First let's load some libraries, and some data.

Image as a signal

Notice that, in this case, the size of the image is $(240,480,3)$, meaning that it has 240 rows, 480 columns, and 3 colors. imageio creates a numpy array indexed in that order (rows, columns, colors); other packages index in other orders, so you have to be careful each time you use a new package. In any case, we can index this image as
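The (rows, columns, colors) indexing convention can be sketched with a small synthetic array standing in for the cat image (the notebook itself loads the real image with imageio; the array and pixel values below are made up for illustration):

```python
import numpy as np

# Hypothetical stand-in for the image: imageio returns a numpy array
# indexed as (rows, columns, colors), i.e. x[m, n, k].
x = np.zeros((240, 480, 3), dtype=np.uint8)
x[100, 200, 0] = 255  # row m=100, column n=200, red channel k=0

print(x.shape)  # (240, 480, 3): 240 rows, 480 columns, 3 colors
```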

$$x[m,n,k]$$

where we'll use $m$ as the row index, $n$ as the column index, and $k$ as the color index.

Decimation

Decimation just means we remove $D-1$ out of every $D$ samples, thus $$y[m,n,k]=x[mD,nD,k]$$
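In numpy, decimation is just strided slicing. A minimal sketch, assuming $D=4$ and a synthetic image in place of the cat:

```python
import numpy as np

D = 4  # decimation factor (assumed value for illustration)
x = np.arange(240 * 480 * 3).reshape(240, 480, 3)

# Keep every D-th row and every D-th column: y[m,n,k] = x[m*D, n*D, k]
y = x[::D, ::D, :]

print(y.shape)  # (60, 120, 3)
```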

Fourier transforms of the high-resolution and downsampled images

Remember that the multi-dimensional Fourier transform is $$X(\omega_1,\omega_2,\omega_3)=\sum_{n_1}\sum_{n_2}\sum_{n_3}x[n_1,n_2,n_3]e^{-j(\omega_1n_1+\omega_2n_2+\omega_3n_3)}$$

Since this is a separable operation, we can learn a lot by just focusing on one dimension at a time. For example, let's define the DTFT of a row to be:

$$X(m,\omega,k)=\sum_n x[m,n,k]e^{-j\omega n}$$

$$Y(m,\omega,k)=\sum_n y[m,n,k]e^{-j\omega n}$$

But what is the relationship between $Y(\omega)$ and $X(\omega)$? Hint: it involves aliasing.

This is easiest to analyze in two steps. First, multiply $x[n]$ by $$p[n]=\sum_{m=-\infty}^\infty \delta[n-mD]$$ in order to get the signal $$v[n]=p[n]x[n]$$
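The first step is easy to sketch in one dimension (the signal and the factor $D$ below are assumed values for illustration):

```python
import numpy as np

D, N = 4, 32
x = np.random.default_rng(0).standard_normal(N)

# p[n] = sum_m delta[n - mD]: 1 at multiples of D, 0 elsewhere
p = np.zeros(N)
p[::D] = 1.0

# v[n] = p[n] x[n]: keeps x's samples at multiples of D, zeros the rest
v = p * x

# Decimating v therefore gives the same samples as decimating x.
print(np.allclose(v[::D], x[::D]))  # True
```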

What's the Fourier transform of a pulse train?

The Fourier transform of a pulse train is a pulse train:

$$p[n]=\sum_{m=-\infty}^\infty \delta[n-mD]$$

which has a transform of $$P(\omega)=\frac{2\pi}{D}\sum_{k=0}^{D-1} \delta\left(\omega-\frac{2\pi k}{D}\right)$$

Multiplying in the time domain is convolving in the frequency domain

Remember that $v[n]=p[n]x[n]$. Remember, also, that multiplying two signals in the time domain is the same as convolving them in the frequency domain:

$$v[n]=x[n]p[n] \Leftrightarrow V(\omega)=\frac{1}{2\pi} P(\omega)\ast X(\omega)$$

Since convolving any signal with an impulse is the same as shifting the signal, we get that

$$P(\omega)=\frac{2\pi}{D}\sum_{k=0}^{D-1} \delta\left(\omega-\frac{2\pi k}{D}\right)$$

implies that

$$V(\omega) = \frac{1}{D} \sum_{k=0}^{D-1}X\left(\omega-\frac{2\pi k}{D}\right)$$

Notice that $v[n]$ has all of the same samples as the downsampled image $y[n]$ -- they're just spread out, with $D-1$ zeros between each pair of nonzero samples. Getting rid of the zeros doesn't change the spectrum, so we can compute it as follows:

$$y[n] = v[nD]$$

$$Y(\omega)=\sum_n y[n]e^{-j\omega n}=\sum_n v[nD]e^{-j\omega n}$$

Substituting $m=nD$ (the sum can then run over all integers $m$, since $v[m]=0$ whenever $m$ is not a multiple of $D$):

$$Y(\omega)=\sum_m v[m]e^{-j\omega m/D}=V\left(\frac{\omega}{D}\right)$$

In other words, $Y(\omega)=V(\omega/D)$, which basically means that we just relabel the frequency axis -- anything that happens at $\omega=2\pi/D$ on the frequency axis for $V(\omega)$ happens at $\omega=2\pi$ on the frequency axis for $Y(\omega)$. In particular, both $V(\omega)$ and $Y(\omega)$ have the same aliasing:

$$Y(\omega)= \left(\frac{1}{D}\right)\sum_{d=0}^{D-1} X\left(\frac{\omega-2\pi d}{D}\right)$$
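We can check this aliasing formula numerically with the DFT. For a length-$N$ signal with $D$ dividing $N$, the formula becomes $Y[k]=\frac{1}{D}\sum_{d=0}^{D-1}X[k+dN/D]$ (the signal and sizes below are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 64, 4
x = rng.standard_normal(N)

y = x[::D]               # decimated signal, length N/D
X = np.fft.fft(x)
Y = np.fft.fft(y)

# DFT version of Y(omega) = (1/D) sum_d X((omega - 2 pi d)/D):
# bin k of Y collects the D aliases of X that fold onto it.
Y_from_X = sum(X[np.arange(N // D) + d * (N // D)] for d in range(D)) / D

print(np.allclose(Y, Y_from_X))  # True
```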

Downsampling

We can avoid aliasing by lowpass filtering the signal prior to decimation.

The term downsampling, in general, just means "reducing the sampling rate." The right way to downsample is to lowpass filter, and then to decimate.

An ideal lowpass filter with cutoff frequency $\omega_c$ is given by

$$h[n]=\frac{\omega_c}{\pi}\mbox{sinc}(\omega_c n)=\frac{\sin(\omega_c n)}{\pi n}$$

When we create a lowpass filter by hand, we have to be careful with the $n=0$ sample:

$$h[0]=\frac{\omega_c}{\pi}$$

In order to avoid aliasing, we need

$$\omega_c=\frac{\pi}{D}$$

We can approximate an ideal lowpass filter by creating a reasonably long, rectangular-windowed FIR filter (a Hamming window would be better). Then we can lowpass filter by convolving each row and each column:

$$x_{lpf}[n]=h[n]\ast x[n]$$

$$y_{lpf}[n] = x_{lpf}[nD]$$
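A minimal sketch of the truncated-sinc filter, with the $n=0$ sample handled separately (the half-length $L$ is an assumed value):

```python
import numpy as np

D = 4
L = 16                    # half-length; the filter has 2L+1 taps (assumed)
omega_c = np.pi / D       # cutoff that avoids aliasing

n = np.arange(-L, L + 1)
h = np.zeros(2 * L + 1)
nz = n != 0
h[nz] = np.sin(omega_c * n[nz]) / (np.pi * n[nz])
h[~nz] = omega_c / np.pi  # the n = 0 sample: h[0] = omega_c / pi

# The DC gain of an ideal lowpass filter is 1; truncation makes it close.
print(h.sum())
```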

Separable Filter

The scipy.signal.convolve2d function implements convolution in two dimensions, but direct 2D convolution is slow: convolving an $M\times N$ image with an $L\times L$ kernel costs O($MNL^2$) operations.

A much faster method, available whenever the kernel is separable ($h[m,n]=h_1[m]h_2[n]$, as a product of sincs is), is to first convolve each column with $h_1$ (complexity: O($MNL$)), then convolve each row with $h_2$ (complexity: O($MNL$)), so the total complexity is only O($MNL$) rather than O($MNL^2$). This is called "separable filtering".

Here, we use mode='nearest' in order to repeat the outermost edge outward as padding for the convolution.
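Separable filtering can be sketched with scipy.ndimage.convolve1d, applied once along each spatial axis (the random array below stands in for the image; the filter length is an assumed value):

```python
import numpy as np
from scipy.ndimage import convolve1d

# Truncated-sinc lowpass filter, as in the previous section.
D, L = 4, 16
omega_c = np.pi / D
n = np.arange(-L, L + 1)
h = np.where(n == 0, omega_c / np.pi,
             np.sin(omega_c * n) / (np.pi * np.where(n == 0, 1, n)))

x = np.random.default_rng(0).standard_normal((240, 480, 3))

# Separable lowpass: filter the columns (axis 0), then the rows (axis 1).
# mode='nearest' repeats the outermost edge outward as padding.
x_lpf = convolve1d(x, h, axis=0, mode='nearest')
x_lpf = convolve1d(x_lpf, h, axis=1, mode='nearest')

print(x_lpf.shape)  # (240, 480, 3)
```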

Now let's take its Fourier transform, to make sure that it has been lowpass filtered.

Downsampling = Filtering followed by decimation

Now let's decimate the filtered image. This should give a spectrum with much less aliasing.

Let's compare the Fourier transforms of the cat that's just been downsampled, versus the one that's been lowpass filtered before downsampling, versus the Fourier transform of the original high-resolution cat image.

Upsampling

Upsampling is the process of creating a larger image, from a smaller image, by just inserting zeros: $$z[m,n,k] = \left\{\begin{array}{ll} y[m/D,n/D,k] & m/D,~n/D~\mbox{are integers}\\ 0 & \mbox{otherwise} \end{array}\right.$$
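Zero-insertion upsampling is one slicing assignment in numpy (sizes below are assumed for illustration):

```python
import numpy as np

D = 4
y = np.arange(60 * 120 * 3, dtype=float).reshape(60, 120, 3)

# Insert D-1 zeros between samples along rows and columns:
# z[m,n,k] = y[m/D, n/D, k] when m/D and n/D are integers, else 0.
z = np.zeros((60 * D, 120 * D, 3))
z[::D, ::D, :] = y

print(z.shape)  # (240, 480, 3)
```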

Notice: $z[n]$ is exactly the same as the signal $v[n]$ that we saw before!! So it has aliasing.

Again, the problem is aliasing: $$Z(\omega) = V(\omega) = Y(D\omega)$$

This time, though, the aliasing is much more visible in the image. In fact, the image is mostly black, with just one colored pixel per $D\times D$ square.

Piece-wise constant interpolation

The solution is obvious: rather than just filling zeros between the upsampled samples, we need to fill in some meaningful value. The first solution to consider is piece-wise constant interpolation, sometimes called zero-order hold (ZOH).

$$z[m,n,k]=y[\mbox{int}(m/D),\mbox{int}(n/D),k]$$
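Piece-wise constant interpolation can be sketched with np.repeat, which copies every pixel into a $D\times D$ block (the small array below is an assumed stand-in for the downsampled image):

```python
import numpy as np

D = 4
y = np.arange(6 * 8 * 3, dtype=float).reshape(6, 8, 3)

# Zero-order hold: z[m,n,k] = y[int(m/D), int(n/D), k].
z = np.repeat(np.repeat(y, D, axis=0), D, axis=1)

print(z.shape)  # (24, 32, 3)
```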

This results in some aliasing, but not as bad as the upsampled cat.

Piece-wise linear (bilinear) interpolation

Piece-wise linear interpolation, in two dimensions at the same time, is called bilinear interpolation.

We can accomplish it by literally doing a piece-wise linear interpolation of the signal, as in the following code:

Notice that piece-wise-linear interpolation is just like upsampling the cat, and then convolving with the piece-wise linear interpolation filter:

$$h_{PWL}[n] = \left\{\begin{array}{ll} \frac{D-|n|}{D} & -D\le n\le D\\ 0 & \mbox{otherwise}\end{array}\right. $$
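This equivalence is easy to check in one dimension: zero-insertion followed by convolution with the triangle filter matches direct piece-wise linear interpolation (the short signal below is an assumed example):

```python
import numpy as np

D = 4
y = np.array([0.0, 4.0, 2.0, 6.0])  # a short 1-D signal (assumed)
M = len(y)

# Step 1: upsample by inserting D-1 zeros between samples.
z = np.zeros(M * D)
z[::D] = y

# Step 2: convolve with the triangle filter h[n] = (D-|n|)/D, -D <= n <= D.
n = np.arange(-D, D + 1)
h = (D - np.abs(n)) / D
pwl = np.convolve(z, h, mode='full')[D : D + M * D]

# Compare against direct piece-wise linear interpolation.
m = np.arange((M - 1) * D + 1)
direct = np.interp(m / D, np.arange(M), y)

print(np.allclose(pwl[: len(m)], direct))  # True
```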

Similarly, piece-wise constant interpolation is just like upsampling the cat, and then convolving with the piece-wise constant interpolation filter:

$$h_{PWC}[n] = \left\{\begin{array}{ll} 1 & 0\le n<D\\ 0 & \mbox{otherwise}\end{array}\right.$$

Just to prove it, let's recompute the bilinear-interpolated cat using the triangle filter above, and show that it's the same image.

Sinc Interpolation

PWC interpolation suffers from obvious blocky artifacts. PWL interpolation smooths away most of those, but not all. We can get rid of all of them, and get the lowpass-filtered cat back again exactly, by filtering the upsampled cat using an ideal sinc function.

$$z[n]=D^2 h_{LPF}[n]\ast y[n]$$

where $y[n]$ here denotes the zero-inserted (upsampled) image, and the filtering is applied along each row and each column as before.

Multiplying by a factor of $D^2$ is necessary because we're trying to construct $D^2$ output samples from every one input sample.
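A one-dimensional sketch of sinc interpolation, using a Hamming-windowed sinc (which the notes mention works better than a rectangular window); the test signal, filter length, and sizes are assumed values. In 1D the gain is $D$; for a 2D image, applying the gain-$D$ filter along both axes gives the overall factor of $D^2$:

```python
import numpy as np

D, L = 4, 32
omega_c = np.pi / D

# Hamming-windowed sinc lowpass filter with cutoff pi/D.
n = np.arange(-L, L + 1)
h = np.where(n == 0, omega_c / np.pi,
             np.sin(omega_c * np.where(n == 0, 1, n))
             / (np.pi * np.where(n == 0, 1, n)))
h = h * np.hamming(2 * L + 1)

# A bandlimited test signal, decimated and then zero-inserted.
N = 256
omega0 = 0.4 * omega_c            # well below the cutoff
x = np.cos(omega0 * np.arange(N))
z = np.zeros(N)
z[::D] = x[::D]                   # upsampled (zero-inserted) signal

# Sinc interpolation: gain D per dimension (D**2 for a 2-D image).
x_hat = D * np.convolve(z, h, mode='full')[L : L + N]

# Away from the filter's edge effects, the original signal is recovered.
err = np.max(np.abs((x_hat - x)[L : N - L]))
print(err)
```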

Summary of Visible Artifacts