MP2: Optical Flow

In this lab, you'll calculate optical flow from a low-res video, then use the optical flow field to interpolate high-res images to make a high-res video.

In order to make sure everything works, you might want to go to the command line, and run

pip install -r requirements.txt

This will install the modules that are used on the autograder, including numpy, h5py, and the Gradescope utilities.


Part 1: Loading the video and image files

First, let's load the low-res video. This video was posted by NairobiPapel(Kamaa) at https://commons.wikimedia.org/wiki/File:Cat_Play.webm under a Creative Commons Attribution-ShareAlike license.

The high-res images are provided once per 30 frames (once per second). There are four of them, corresponding to frames $30s$ for $s\in\{0,\ldots,3\}$. Let's load them all as grayscale (by summing the three color channels).

You need ffmpeg in order to extract frames from the video yourself, so you should probably install it. But just in case you haven't, the extracted frames are also provided in the lowres directory.
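
For example, a loading sketch might look like the following (the filename patterns and frame count here are assumptions; check the actual contents of the lowres directory):

import numpy as np
from PIL import Image

# Hypothetical filename patterns -- adjust to match the actual files.
lowres = np.stack([
    np.asarray(Image.open('lowres/cat%04d.jpg' % t), dtype=float)
    for t in range(91)
])

# Load each high-res keyframe as grayscale by summing its three color
# channels, as described above.
highres_frames = {
    30 * s: np.asarray(Image.open('highres/cat%04d.jpg' % (30 * s)),
                       dtype=float).sum(axis=2)
    for s in range(4)
}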


Part 2: Further Smooth the Low-Res Image

First, load submitted.py.

Next, in order to make the gradient estimation smoother, we'll smooth all of the low-res images.

The Gaussian smoothing kernel is: $$h[n] = \left\{\begin{array}{ll} \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{n-(L-1)/2}{\sigma}\right)^2} & 0\le n\le L-1\\0 & \mbox{otherwise}\end{array}\right.$$

You should implement this as a separable filter, i.e., convolve in both the row and column directions: $$z[r,c] = h[r]\ast_r x[r,c]$$ $$y[r,c] = h[c]\ast_c z[r,c]$$ where $\ast_r$ means convolution across rows (in the $r$ direction), and $\ast_c$ means convolution across columns.
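
Here is a minimal sketch of the smoother, assuming the kernel length $L$ and standard deviation $\sigma$ are passed in as parameters (the real signature is specified in submitted.py):

import numpy as np

def gaussian_kernel(sigma, L):
    # Truncated Gaussian, exactly as in the formula above.
    n = np.arange(L)
    return (np.exp(-0.5 * ((n - (L - 1) / 2) / sigma) ** 2)
            / np.sqrt(2 * np.pi * sigma ** 2))

def smooth_frame(x, sigma, L):
    # Separable smoothing: convolve in the row direction, then in the
    # column direction.
    h = gaussian_kernel(sigma, L)
    z = np.apply_along_axis(np.convolve, 0, x, h, mode='same')
    return np.apply_along_axis(np.convolve, 1, z, h, mode='same')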


Part 3: Calculating the Image Gradient

Now that we have the smoothed images, let's find their gradient. Use a central difference filter: $$h[n] = 0.5\delta[n+1]-0.5\delta[n-1]$$

You will need to compute three different gradients: the column gradient $g_c$, row gradient $g_r$, and frame gradient $g_t$, defined as: $$g_t[t,r,c] = h[t] \ast_t x[t,r,c]$$ $$g_r[t,r,c] = h[r] \ast_r x[t,r,c]$$ $$g_c[t,r,c] = h[c] \ast_c x[t,r,c]$$ where $x[t,r,c]$ should be the smoothed video.
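
A sketch of all three gradients (with mode='same', the center of the kernel array sits at $n=0$, so [0.5, 0, -0.5] places $+0.5$ at $n=-1$ and $-0.5$ at $n=+1$, matching $h[n]$ above):

import numpy as np

def video_gradients(x):
    # Central differences along each axis of the smoothed video x[t,r,c].
    h = np.array([0.5, 0.0, -0.5])
    gt = np.apply_along_axis(np.convolve, 0, x, h, mode='same')
    gr = np.apply_along_axis(np.convolve, 1, x, h, mode='same')
    gc = np.apply_along_axis(np.convolve, 2, x, h, mode='same')
    return gt, gr, gc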

All of the samples of $x[t,r,c]$ are positive, of course, but the three gradient images have roughly equal parts positive and negative values. Matplotlib will automatically normalize the color scale for us, but it's useful to put a colorbar on each image, so we can see which values of the gradient are mapped to which colors.
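
For instance, assuming gt, gr, and gc were computed as in the sketch above:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, g, name in zip(axes, (gt, gr, gc), ('$g_t$', '$g_r$', '$g_c$')):
    im = ax.imshow(g[1])      # frame 1 is an arbitrary choice
    ax.set_title(name)
    fig.colorbar(im, ax=ax)   # shows which values map to which colors
plt.show()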


Part 4: Estimate Optical Flow using the Lucas-Kanade Algorithm

Wikipedia has good summaries of both optical flow (https://en.wikipedia.org/wiki/Optical_flow) and the Lucas-Kanade algorithm (https://en.wikipedia.org/wiki/Lucas%E2%80%93Kanade_method).

We will calculate optical flow in $6\times 6$ blocks: within each block, the Lucas-Kanade algorithm finds the single velocity $(v_r, v_c)$ that best matches the spatial gradients to the temporal gradient in the least-squares sense.
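
A sketch of the per-block least-squares solution (the block size and function signature are assumptions; submitted.py is authoritative):

import numpy as np

def lucas_kanade(gt, gr, gc, blocksize=6):
    # One velocity estimate per block per frame, minimizing
    # sum((gr*vr + gc*vc + gt)**2) over the pixels of the block.
    T, R, C = gt.shape
    vr = np.zeros((T, R // blocksize, C // blocksize))
    vc = np.zeros_like(vr)
    for t in range(T):
        for i in range(vr.shape[1]):
            for j in range(vr.shape[2]):
                rows = slice(i * blocksize, (i + 1) * blocksize)
                cols = slice(j * blocksize, (j + 1) * blocksize)
                A = np.stack((gr[t, rows, cols].ravel(),
                              gc[t, rows, cols].ravel()), axis=1)
                b = -gt[t, rows, cols].ravel()
                vr[t, i, j], vc[t, i, j] = np.linalg.lstsq(A, b, rcond=None)[0]
    return vr, vc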


Part 5: Median Filtering

The pixel velocity fields look pretty good, except for a few single-pixel outliers. Those can be eliminated by using median filtering.

We will use a 2D median filtering algorithm. This follows exactly the pseudocode at https://en.wikipedia.org/wiki/Median_filter#Two-dimensional_median_filter_pseudo_code, except at the edges of the image. The Wikipedia code deals with the edges of the image by simply not computing them; in our case, we'll deal with the edges by computing the median over a window of reduced size. Thus:
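
A minimal sketch, assuming a square window whose size is a parameter (the required window size is specified in submitted.py):

import numpy as np

def median_filter(x, window=3):
    half = window // 2
    R, C = x.shape
    y = np.empty_like(x)
    for r in range(R):
        for c in range(C):
            # Near the borders, clip the window to the part that lies
            # inside the image, so the median is over fewer samples.
            rlo, rhi = max(0, r - half), min(R, r + half + 1)
            clo, chi = max(0, c - half), min(C, c + half + 1)
            y[r, c] = np.median(x[rlo:rhi, clo:chi])
    return y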


Part 6: Upsampling the Velocity Fields

Now, in order to use the velocity fields to synthesize a high-res video, we need to upsample the velocity fields to the same size as the high-res images, i.e., $270\times 480$.

The low-res video is $135\times 240$, half the size of the high-res images in each dimension. The Lucas-Kanade algorithm further downsampled the velocity fields by a factor of $6\times 6$, so the total upsampling factor is $12\times 12$.

We will use bilinear interpolation (linear in both the row and column dimensions) to upsample the velocity fields. This has two parts: (1) expand to the desired size, then (2) filter in both the row and column directions with a linear interpolation kernel.
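
A sketch with upsampling factor $U=12$, using a triangular kernel for the linear interpolation (the edge handling here is an assumption):

import numpy as np

def bilinear_upsample(v, U=12):
    R, C = v.shape
    # Step 1: expand to the desired size by inserting zeros.
    up = np.zeros((R * U, C * U))
    up[::U, ::U] = v
    # Step 2: convolve rows and columns with a triangular kernel,
    # which linearly interpolates between the nonzero samples.
    h = 1.0 - np.abs(np.arange(-(U - 1), U)) / U
    up = np.apply_along_axis(np.convolve, 0, up, h, mode='same')
    up = np.apply_along_axis(np.convolve, 1, up, h, mode='same')
    return up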


Part 7: Quantizing the Velocity Vectors

The upsampled velocity vectors have two remaining problems:

  1. They express velocities in pixels of the low-resolution image, not the high-resolution image. For example, a movement of $-1.5$ pixels in the low-res image corresponds to a movement of $-3$ pixels in the high-res image.
  2. They are real-valued. In order to move a pixel, we want the velocity vectors to be quantized to integers.

The function scale_velocities scales and then quantizes the velocity vectors. Use int(np.round(...)) to do the quantization.
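
A sketch (np.round(...).astype(int) is the vectorized equivalent of int(np.round(...)) applied per element; the scale factor of 2 is the low-res-to-high-res resolution ratio described above):

import numpy as np

def scale_velocities(v, scale=2):
    # Convert low-res pixel velocities to high-res pixel units, then
    # quantize to integers by rounding.
    return np.round(scale * v).astype(int)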


Part 8: Creating the High-Resolution Video Using Velocity Fill

At this point, we already have the high-resolution video highres[t,:,:] for the frames $t\in\left\{0,30,60,90\right\}$. For all other frames, the high-resolution video is currently zero.

Let's use the velocity vector to fill in the missing frames:

$$\mbox{highres}[t,r,c]= \left\{\begin{array}{ll} \mbox{highres}[t,r,c] & t\in\left\{0,30,60,90\right\}\\ \mbox{highres}[t-1,r,c] & r\ge 264\\ \mbox{highres}[t-1,r-v_r[t-1,r,c],c-v_c[t-1,r,c]] & \mbox{otherwise} \end{array}\right.$$

where $v_r[t,r,c]$ and $v_c[t,r,c]$ refer to the ones that have been interpolated, scaled, and quantized.

scaled_vr and scaled_vc have shape (264,480), but each frame of highres has shape (270,480), so the last six rows of every frame are simply copied from the preceding frame.
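
A direct (slow, but readable) sketch of that recursion, assuming the scaled velocity fields are stacked along a time axis with shape (T,264,480); clipping the moved coordinates to the image bounds is an added assumption:

import numpy as np

def velocity_fill(highres, scaled_vr, scaled_vc, keyframes=(0, 30, 60, 90)):
    T, R, C = highres.shape
    for t in range(T):
        if t in keyframes:
            continue  # these frames are already filled
        for r in range(R):
            for c in range(C):
                if r >= 264:
                    # No velocity estimate here: copy from previous frame.
                    highres[t, r, c] = highres[t - 1, r, c]
                else:
                    rr = np.clip(r - scaled_vr[t - 1, r, c], 0, R - 1)
                    cc = np.clip(c - scaled_vc[t - 1, r, c], 0, C - 1)
                    highres[t, r, c] = highres[t - 1, rr, cc]
    return highres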

In this example, you can see that optical flow is working, but not very well. It stretches the cat's head out to the left, but doesn't start turning it, so the turn at frame 30 is sudden. Apparently our optical flow algorithm is missing the turn.

In a real application, we would try to connect the velocity vectors both forward and backward in time, in order to make a smooth trajectory for every pixel.

If you'd like to watch the resulting video, you will need to install ffmpeg first. Then you can use the following block of code to save the frames as images in the generated directory, and then run the following command to make the video:
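
Something like this should work (a sketch; it assumes highres holds grayscale values, rescaled here to 0-255 before saving):

import os
import numpy as np
from PIL import Image

os.makedirs('generated', exist_ok=True)
for t in range(highres.shape[0]):
    # The summed-grayscale values may exceed 255, so rescale globally.
    frame = np.uint8(255 * highres[t] / highres.max())
    Image.fromarray(frame).save('generated/cat%04d.jpg' % t)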

ffmpeg -r 30 -i generated/cat%04d.jpg generated.webm


Part 9: How to Debug!!

If you reached this point in the notebook, then your code is probably working well, but before you run the autograder on the server, you should first run it on your own machine.

You can do that by going to a terminal, and running the following command line:

python grade.py

If you get any error messages, we recommend that you use the provided solutions.hdf5 in order to debug. That can be done as follows:
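
For example (the dataset names in solutions.hdf5 are not listed here, so print the keys first; the commented comparison is a hypothetical pattern):

import h5py
import numpy as np

with h5py.File('solutions.hdf5', 'r') as f:
    print(list(f.keys()))  # see which reference arrays are provided
    # Then compare each reference array to your own function's output:
    # ref = f['smoothed'][:]                  # hypothetical dataset name
    # print(np.amax(np.abs(ref - my_smoothed)))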


Extra Credit

You can earn up to 10% extra credit on this MP by finishing the file called extra.py, and submitting it to the autograder.

When you unpack the file mp2_extra.zip, it will give you several files, including extra.py and extra_solutions.hdf5.

The extra credit assignment is actually pretty simple this time: given a low-resolution video and a small set of high-resolution frames, try to reconstruct the high-resolution video.

The function that the grader will call is animate; see its stub and docstring in extra.py.

As noted in the docstring, your answer doesn't need to have anything to do with optical flow; it can be any answer you like. There are some remarkably simple answers that give remarkably good results, so please feel free to be creative (exception: don't just download the ground truth from the web. That won't work well, anyway, because you'll fail the hidden tests).
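
As one illustration (not the provided stub; the real signature and docstring are in extra.py), a cross-fade baseline that ignores the low-res video entirely might look like this:

import numpy as np

def animate(lowres, highres_frames):
    # highres_frames is assumed to be a dict mapping keyframe indices
    # (0, 30, 60, 90) to high-res grayscale images.
    T = lowres.shape[0]
    keys = sorted(highres_frames.keys())
    out = np.zeros((T,) + highres_frames[keys[0]].shape)
    for t in range(T):
        prev = max(k for k in keys if k <= t)
        nxt = min((k for k in keys if k >= t), default=prev)
        if prev == nxt:
            out[t] = highres_frames[prev]
        else:
            # Linear cross-fade between the surrounding keyframes.
            w = (t - prev) / (nxt - prev)
            out[t] = (1 - w) * highres_frames[prev] + w * highres_frames[nxt]
    return out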

Solutions for the cat video are provided in extra_solutions.hdf5. This file contains three objects, including f['highres_ref'], the reference high-resolution video.

We do not know of any solution to this problem that will exactly reconstruct f['highres_ref']. Instead, you will receive more or less extra credit depending on the SNR with which you're able to estimate f['highres_ref']. SNR is defined in tests/text_extra.py, using the same definition as the following line:
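
That line is not reproduced here, but a conventional SNR definition (in decibels) looks like this; the formula in tests/text_extra.py is authoritative:

import numpy as np

def snr(ref, est):
    # Signal-to-noise ratio of the estimate, in dB.
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))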

If you want to see where your reconstructed image differs from the original image, you can do it in the following way.
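
For example, assuming the animate function and the loaded data from the sketches above:

import h5py
import matplotlib.pyplot as plt

with h5py.File('extra_solutions.hdf5', 'r') as f:
    ref = f['highres_ref'][:]

est = animate(lowres, highres_frames)
plt.imshow(est[15] - ref[15])   # frame 15 is an arbitrary choice
plt.colorbar()
plt.show()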