In this lab, you'll calculate optical flow from a low-res video, then use the optical flow field to interpolate high-res images to make a high-res video.
In order to make sure everything works, you might want to go to the command line, and run
pip install -r requirements.txt
This will install the modules that are used on the autograder, including numpy, h5py, and the gradescope utilities.
First, let's load the low-res video. This video was posted by NairobiPapel(Kamaa) at https://commons.wikimedia.org/wiki/File:Cat_Play.webm under a Creative Commons Attribution-ShareAlike license.
from IPython.display import Video
Video("cat.webm")
The high-res images are provided once per 30 frames (once per second). There are four of them, corresponding to frames $30s$ for $s\in\{0,\ldots,3\}$. Let's load them all as grayscale (by summing the three color channels).
import matplotlib.image
import numpy as np
highres = np.zeros((91,270,480),dtype=float)
for s in range(4):
    highres[30*s,:,:] = np.sum(matplotlib.image.imread('highres/cat%4.4d.jpg'%(30*s)).astype(float), axis=2)
print(highres.dtype)
print(highres.shape)
float64
(91, 270, 480)
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 5))
plt.imshow(highres[0,:,:], cmap='gray')
You would need ffmpeg in order to extract frames from the video yourself, so you should probably install it. But just in case you haven't, the extracted frames are also provided in the lowres directory.
import numpy as np
lowres = np.zeros((91,135,240),dtype=float)
for t in range(91):
    lowres[t,:,:] = np.sum(matplotlib.image.imread('lowres/cat%4.4d.jpg'%(t)).astype(float), axis=2)
print(lowres.dtype)
print(lowres.shape)
float64
(91, 135, 240)
plt.figure(figsize=(14, 5))
plt.imshow(lowres[0,:,:], cmap='gray')
First, load submitted.py.
import submitted
import importlib
importlib.reload(submitted)
print(submitted.__doc__)
This is the module you'll submit to the autograder. There are several function definitions, here, that raise RuntimeErrors. You should replace each "raise RuntimeError" line with a line that performs the function specified in the function's docstring.
First, in order to make gradient estimation less noisy, we'll smooth all of the low-res images.
help(submitted.smooth_video)
Help on function smooth_video in module submitted:

smooth_video(x, sigma, L)
    y = smooth_video(x, sigma, L)
    Smooth the video using a sampled-Gaussian smoothing kernel.

    x (TxRxC) - a video with T frames, R rows, C columns
    sigma (scalar) - standard deviation of the Gaussian smoothing kernel
    L (scalar) - length of the Gaussian smoothing kernel
    y (TxRxC) - the same video, smoothed in the row and column directions.
The Gaussian smoothing kernel is: $$h[n] = \left\{\begin{array}{ll} \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{n-(L-1)/2}{\sigma}\right)^2} & 0\le n\le L-1\\0 & \mbox{otherwise}\end{array}\right.$$
You should implement this as a separable filter, i.e., convolve in both the row and column directions: $$z[r,c] = h[r]\ast_r x[r,c]$$ $$y[r,c] = h[c]\ast_c z[r,c]$$ where $\ast_r$ means convolution across rows (in the $r$ direction), and $\ast_c$ means convolution across columns.
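For concreteness, here's a minimal sketch of one way to build the sampled-Gaussian kernel and apply it separably, using only numpy. The helper names gaussian_kernel and smooth_frame are illustrative, not part of submitted.py:

import numpy as np

def gaussian_kernel(sigma, L):
    # Sampled Gaussian centered at (L-1)/2, per the formula above.
    n = np.arange(L)
    return np.exp(-0.5*((n-(L-1)/2)/sigma)**2) / np.sqrt(2*np.pi*sigma**2)

def smooth_frame(frame, h):
    # Separable smoothing: convolve down the rows, then across the columns.
    z = np.apply_along_axis(lambda col: np.convolve(col, h, mode='same'), 0, frame)
    return np.apply_along_axis(lambda row: np.convolve(row, h, mode='same'), 1, z)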
importlib.reload(submitted)
smoothed = submitted.smooth_video(lowres, sigma=1.5, L=7)
print(smoothed.shape)
(91, 135, 240)
plt.figure(figsize=(14, 5))
plt.imshow(smoothed[0,:,:], cmap='gray')
Now that we have the smoothed images, let's find their gradient. Use a central difference filter: $$h[n] = 0.5\delta[n+1]-0.5\delta[n-1]$$
You will need to compute three different gradients: the column gradient $g_c$, row gradient $g_r$, and frame gradient $g_t$, defined as: $$g_t[t,r,c] = h[t] \ast_t x[t,r,c]$$ $$g_r[t,r,c] = h[r] \ast_r x[t,r,c]$$ $$g_c[t,r,c] = h[c] \ast_c x[t,r,c]$$ where $x[t,r,c]$ should be the smoothed video.
importlib.reload(submitted)
help(submitted.gradients)
Help on function gradients in module submitted:

gradients(x)
    gt, gr, gc = gradients(x)
    Compute gradients using a first-order central finite difference.

    x (TxRxC) - a video with T frames, R rows, C columns
    gt (TxRxC) - gradient in the time direction
    gr (TxRxC) - gradient in the vertical direction
    gc (TxRxC) - gradient in the horizontal direction

    In order to avoid weird numerical problems, please set
    gt[0,:,:]=0, gt[-1,:,:]=0, gr[:,0,:]=0, gr[:,-1,:]=0,
    gc[:,:,0]=0, and gc[:,:,-1]=0.
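Since the central difference only touches the samples on either side, the whole computation can be written with array slicing. Here's a hedged sketch (gradients_sketch is an illustrative name, not the graded function):

def gradients_sketch(x):
    # Equivalent to convolving with h[n] = 0.5*delta[n+1] - 0.5*delta[n-1],
    # with the boundary samples left at zero as the docstring requests.
    gt = np.zeros_like(x); gr = np.zeros_like(x); gc = np.zeros_like(x)
    gt[1:-1,:,:] = 0.5*(x[2:,:,:] - x[:-2,:,:])   # time direction
    gr[:,1:-1,:] = 0.5*(x[:,2:,:] - x[:,:-2,:])   # vertical (rows)
    gc[:,:,1:-1] = 0.5*(x[:,:,2:] - x[:,:,:-2])   # horizontal (columns)
    return gt, gr, gc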
All of the samples of $x[t,r,c]$ are positive, of course, but the three gradient images have roughly equal parts positive and negative values. Matplotlib will automatically normalize the color scale for us, but it's useful to put a colorbar on each image, so we can see which values of the gradient are matched to each color in the image.
importlib.reload(submitted)
gt, gr, gc = submitted.gradients(smoothed)
print(gt.shape)
(91, 135, 240)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,figsize=(14, 10))
plt1 = ax1.imshow(gt[1,:,:])
plt.colorbar(plt1, ax=ax1)
ax1.set_title('Time Gradient')
plt2 = ax2.imshow(gr[1,:,:])
plt.colorbar(plt2, ax=ax2)
ax2.set_title('Vertical Gradient')
plt3 = ax3.imshow(gc[1,:,:])
plt.colorbar(plt3, ax=ax3)
ax3.set_title('Horizontal Gradient')
ax4.imshow(lowres[1,:,:],cmap='gray')
Wikipedia has good summaries of optical flow (https://en.wikipedia.org/wiki/Optical_flow) and the Lucas-Kanade algorithm (https://en.wikipedia.org/wiki/Lucas%E2%80%93Kanade_method).
We will calculate optical flow in each $H\times W$ block of each frame:
importlib.reload(submitted)
help(submitted.lucas_kanade)
Help on function lucas_kanade in module submitted:

lucas_kanade(gt, gr, gc, H, W)
    vr, vc = lucas_kanade(gt, gr, gc, H, W)

    gt (TxRxC) - gradient in the time direction
    gr (TxRxC) - gradient in the vertical direction
    gc (TxRxC) - gradient in the horizontal direction
    H (scalar) - height (in rows) of each optical flow block
    W (scalar) - width (in columns) of each optical flow block

    Within each HxW block of each frame, you should create:
    - b vector, of size (H*W,1)
    - A matrix, of size (H*W,2)
    - calculate v = pinv(A)*b
    - assign vr and vc as the two elements of the v vector

    vr (T x int(R/H) x int(C/W)) - pixel velocity in vertical direction
    vc (T x int(R/H) x int(C/W)) - pixel velocity in horizontal direction
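To make the block structure concrete, here's a sketch of the per-block least-squares solve. It assumes b = -gt, which follows from the standard optical-flow constraint $g_t + v_r g_r + v_c g_c = 0$; treat it as one possible reading of the docstring, not the reference solution:

def lucas_kanade_sketch(gt, gr, gc, H, W):
    T, R, C = gt.shape
    vr = np.zeros((T, R//H, C//W))
    vc = np.zeros((T, R//H, C//W))
    for t in range(T):
        for i in range(R//H):
            for j in range(C//W):
                blk = (t, slice(i*H,(i+1)*H), slice(j*W,(j+1)*W))
                A = np.stack([gr[blk].ravel(), gc[blk].ravel()], axis=1)  # (H*W,2)
                b = -gt[blk].ravel()                                      # (H*W,)
                v = np.linalg.pinv(A) @ b
                vr[t,i,j], vc[t,i,j] = v[0], v[1]
    return vr, vc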
importlib.reload(submitted)
vr, vc = submitted.lucas_kanade(gt,gr,gc,6,6)
print(vr.shape)
(91, 22, 40)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,figsize=(14, 7))
plt1 = ax1.imshow(vr[1,:,:])
plt.colorbar(plt1, ax=ax1)
ax1.set_title('Vertical Velocity')
plt2 = ax2.imshow(vc[1,:,:])
plt.colorbar(plt2, ax=ax2)
ax2.set_title('Horizontal Velocity')
ax3.imshow(lowres[1,:,:],cmap='gray')
from IPython.display import Video
Video("cat.webm")
The pixel velocity fields look pretty good, except for a few single-pixel outliers. Those can be eliminated by using median filtering.
We will use a 2D median filtering algorithm. This follows exactly the code at https://en.wikipedia.org/wiki/Median_filter#Two-dimensional_median_filter_pseudo_code, except for the edges of the image. The Wikipedia code deals with the edges of the image by simply not computing them. In our case, we'll deal with the edges by computing the median in a window of reduced size. Thus:
importlib.reload(submitted)
help(submitted.medianfilt)
Help on function medianfilt in module submitted:

medianfilt(x, H, W)
    y = medianfilt(x, H, W)
    Median-filter the video, x, in HxW blocks.

    x (TxRxC) - a video with T frames, R rows, C columns
    H (scalar) - the height of median-filtering blocks
    W (scalar) - the width of median-filtering blocks
    y (TxRxC) - y[t,r,c] is the median of the pixels
        x[t,rmin:rmax,cmin:cmax], where
        rmin = max(0,r-int((H-1)/2))
        rmax = min(R,r+int((H-1)/2)+1)
        cmin = max(0,c-int((W-1)/2))
        cmax = min(C,c+int((W-1)/2)+1)
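A straightforward (if slow) way to implement the reduced-size windows is to clip the window bounds at the image edges, exactly as in the docstring. A sketch, with medianfilt_sketch as an illustrative name:

def medianfilt_sketch(x, H, W):
    T, R, C = x.shape
    y = np.zeros_like(x)
    for r in range(R):
        rmin, rmax = max(0, r-(H-1)//2), min(R, r+(H-1)//2+1)
        for c in range(C):
            cmin, cmax = max(0, c-(W-1)//2), min(C, c+(W-1)//2+1)
            # Median over the (possibly truncated) window, all frames at once.
            y[:,r,c] = np.median(x[:,rmin:rmax,cmin:cmax], axis=(1,2))
    return y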
importlib.reload(submitted)
smooth_vr = submitted.medianfilt(vr,3,3)
smooth_vc = submitted.medianfilt(vc,3,3)
print(smooth_vr.shape)
(91, 22, 40)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,figsize=(14, 7))
plt1 = ax1.imshow(vr[1,:,:])
plt.colorbar(plt1, ax=ax1)
ax1.set_title('Vertical Velocity')
plt2 = ax2.imshow(vc[1,:,:])
plt.colorbar(plt2, ax=ax2)
ax2.set_title('Horizontal Velocity')
plt3 = ax3.imshow(smooth_vr[1,:,:])
plt.colorbar(plt3, ax=ax3)
ax3.set_title('Vertical Velocity, Median')
plt4 = ax4.imshow(smooth_vc[1,:,:])
plt.colorbar(plt4, ax=ax4)
ax4.set_title('Horizontal Velocity, Median')
Now, in order to use the velocity fields to synthesize a high-res video, we need to upsample the velocity fields to the same size as the high-res images, i.e., $270\times 480$.
The low-res image is $135\times 240$, which is $2$ times smaller than the high-res image. The Lucas-Kanade algorithm further downsampled by $6\times 6$, so our total upsampling factor is $12\times 12$.
We will use bilinear interpolation (linear in both row and column dimensions) in order to upsample the image. This has two parts: (1) upsample to the desired image size, and then (2) filter in both row and column directions by a linear interpolation kernel.
importlib.reload(submitted)
help(submitted.interpolate)
Help on function interpolate in module submitted:

interpolate(x, U)
    y = interpolate(x, U)
    Upsample and interpolate an image using bilinear interpolation.

    x (TxRxC) - a video with T frames, R rows, C columns
    U (scalar) - upsampling factor
    y (Tx(U*R)x(U*C)) - interpolated image
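One common construction, sketched below under the assumption that step (1) means zero-insertion: write each sample to every U-th position, then filter rows and columns with a length-$(2U-1)$ triangle kernel (interpolate_sketch is an illustrative name):

def interpolate_sketch(x, U):
    T, R, C = x.shape
    up = np.zeros((T, U*R, U*C))
    up[:, ::U, ::U] = x                        # step 1: zero-insertion upsampling
    tri = 1 - np.abs(np.arange(1-U, U))/U      # triangle (linear interpolation) kernel
    y = np.zeros_like(up)
    for t in range(T):
        z = np.apply_along_axis(lambda col: np.convolve(col, tri, mode='same'), 0, up[t])
        y[t] = np.apply_along_axis(lambda row: np.convolve(row, tri, mode='same'), 1, z)
    return y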
importlib.reload(submitted)
highres_vc = submitted.interpolate(smooth_vc, 12)
highres_vr = submitted.interpolate(smooth_vr, 12)
print(highres_vc.shape)
(91, 264, 480)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,figsize=(14, 7))
plt1 = ax1.imshow(smooth_vr[1,:,:])
plt.colorbar(plt1, ax=ax1)
ax1.set_title('Smoothed Vertical Velocity')
plt2 = ax2.imshow(smooth_vc[1,:,:])
plt.colorbar(plt2, ax=ax2)
ax2.set_title('Smoothed Horizontal Velocity')
plt3 = ax3.imshow(highres_vr[1,:,:])
plt.colorbar(plt3, ax=ax3)
ax3.set_title('Highres Vertical Velocity')
plt4 = ax4.imshow(highres_vc[1,:,:])
plt.colorbar(plt4, ax=ax4)
ax4.set_title('Highres Horizontal Velocity')
print(highres_vc.shape)
print(highres_vr.shape)
(91, 264, 480)
(91, 264, 480)
The upsampled velocity vectors have two remaining problems: they measure velocity in units of low-res pixels, which are a factor of $2$ larger than high-res pixels, and they are real-valued, whereas we need integer pixel offsets. The function scale_velocities addresses both: it scales and then quantizes the velocity vectors. Use int(np.round(...)) to do the quantization.
importlib.reload(submitted)
help(submitted.scale_velocities)
Help on function scale_velocities in module submitted:

scale_velocities(v, U)
    delta = scale_velocities(v, U)
    Scale the velocities in v by a factor of U, then quantize them
    to the nearest integer.

    v (TxRxC) - T frames, each is an RxC velocity image
    U (scalar) - an upsampling factor
    delta (TxRxC) - integers closest to v*U
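For reference, a vectorized sketch: np.round(...).astype(int) is the array analogue of the int(np.round(...)) suggested above (scale_velocities_sketch is an illustrative name):

def scale_velocities_sketch(v, U):
    return np.round(v*U).astype(int)   # integers closest to v*U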
importlib.reload(submitted)
scaled_vr = submitted.scale_velocities(highres_vr,2)
scaled_vc = submitted.scale_velocities(highres_vc,2)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,figsize=(14, 7))
plt1 = ax1.imshow(highres_vr[1,:,:])
plt.colorbar(plt1, ax=ax1)
ax1.set_title('Highres Vertical Velocity')
plt2 = ax2.imshow(highres_vc[1,:,:])
plt.colorbar(plt2, ax=ax2)
ax2.set_title('Highres Horizontal Velocity')
plt3 = ax3.imshow(scaled_vr[1,:,:])
plt.colorbar(plt3, ax=ax3)
ax3.set_title('Scaled Vertical Velocity')
plt4 = ax4.imshow(scaled_vc[1,:,:])
plt.colorbar(plt4, ax=ax4)
ax4.set_title('Scaled Horizontal Velocity')
So, now, we already have the high-resolution video highres[t,:,:] for the frames $t\in\left\{0,30,60,90\right\}$. For all other frames, the high-resolution video is currently zero.
Let's use the velocity vector to fill in the missing frames:
$$\mbox{highres}[t,r,c]= \left\{\begin{array}{ll} \mbox{highres}[t,r,c] & t\in\left\{0,30,60,90\right\}\\ \mbox{highres}[t-1,r,c] & r\ge 264\\ \mbox{highres}[t-1,r-v_r[t-1,r,c],c-v_c[t-1,r,c]] & \mbox{otherwise} \end{array}\right.$$where $v_r[t,r,c]$ and $v_c[t,r,c]$ refer to the ones that have been interpolated, scaled, and quantized.
Each frame of scaled_vr and scaled_vc has shape $(264,480)$, but each frame of highres has shape $(270,480)$, so the last six rows of every image are just copied from the preceding frame.
importlib.reload(submitted)
print(scaled_vr.shape)
help(submitted.velocity_fill)
(91, 264, 480)
Help on function velocity_fill in module submitted:

velocity_fill(x, vr, vc, keep)
    y = velocity_fill(x, vr, vc, keep)
    Fill in missing frames by copying samples with a shift given by
    the velocity vector.

    x (T,R,C) - a video signal in which most frames are zero
    vr (T,Ra,Cb) - the vertical velocity field, integer-valued
    vc (T,Ra,Cb) - the horizontal velocity field, integer-valued
        Notice that Ra and Cb might be less than R and C. If they are,
        the remaining samples of y should just be copied from y[t-1,r,c].
    keep (array) - a list of frames that should be kept. Every frame
        not in this list is replaced by samples copied from the
        preceding frame.
    y (T,R,C) - a copy of x, with the missing frames filled in.
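Here's a sketch of the fill rule defined by the equation above. Clipping the shifted indices to stay inside the frame is an assumption added here to avoid out-of-range lookups; the reference solution may handle edges differently:

def velocity_fill_sketch(x, vr, vc, keep):
    T, R, C = x.shape
    _, Ra, Cb = vr.shape
    y = x.copy()
    for t in range(1, T):
        if t in keep:
            continue          # keep the provided high-res frames unchanged
        for r in range(R):
            for c in range(C):
                if r < Ra and c < Cb:
                    # Shift by the (integer) velocity; clip is an added assumption.
                    rs = int(np.clip(r - vr[t-1,r,c], 0, R-1))
                    cs = int(np.clip(c - vc[t-1,r,c], 0, C-1))
                    y[t,r,c] = y[t-1,rs,cs]
                else:
                    y[t,r,c] = y[t-1,r,c]
    return y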
importlib.reload(submitted)
highres_filled = submitted.velocity_fill(highres, scaled_vr, scaled_vc, [0,30,60,90])
print(highres_filled.shape)
(91, 270, 480)
fig, ((ax1, ax2),(ax3,ax4)) = plt.subplots(2,2,figsize=(14, 12))
ax1.imshow(highres_filled[0,:,:],cmap='gray')
ax1.set_title('Frame 0 (Unchanged)')
ax2.imshow(highres_filled[10,:,:], cmap='gray')
ax2.set_title('Frame 10 (Interpolated)')
ax3.imshow(highres_filled[20,:,:], cmap='gray')
ax3.set_title('Frame 20 (Interpolated)')
ax4.imshow(highres_filled[30,:,:], cmap='gray')
ax4.set_title('Frame 30 (Unchanged)')
In this example, you can see that optical flow is working, but not very well. It stretches the cat's head out to the left, but doesn't start turning it, so the turn in frame 30 is sudden. Apparently our optical flow algorithm misses the turn.
In a real application, we would try to connect the velocity vectors both forward and backward in time, in order to make a smooth trajectory for every pixel.
If you'd like to watch the resulting video, you will need to first install ffmpeg. Then you can use the following block of code to save the images to the directory generated, and then use the following command to make the video:
ffmpeg -r 30 -i generated/cat%04d.jpg generated.webm
import os, matplotlib.image
os.makedirs('generated',exist_ok=True)
for t in range(91):
    filename = 'cat%4.4d.jpg'%(t)
    matplotlib.image.imsave(os.path.join('generated',filename),highres_filled[t,:,:],cmap='gray')
If you reached this point in the notebook, then probably your code is working well, but before you run the autograder on the server, you should first run it on your own machine.
You can do that by going to a terminal, and running the following command line:
python grade.py
If you get any error messages, we recommend that you use the provided solutions.hdf5 in order to debug. That can be done as follows:
import h5py
with h5py.File('solutions.hdf5','r') as f:
    print(list(f.keys()))
['gc', 'gr', 'gt', 'highres', 'highres_filled', 'highres_vc', 'highres_vr', 'keep', 'lowres', 'scaled_vc', 'scaled_vr', 'smooth_vc', 'smooth_vr', 'smoothed', 'vc', 'vr']
importlib.reload(submitted)
with h5py.File('solutions.hdf5','r') as f:
    highres_filled_ref = np.array(f['highres_filled'][:])
    highres_ref = np.array(f['highres'][:])
    scaled_vr_ref = np.array(f['scaled_vr'][:])
    scaled_vc_ref = np.array(f['scaled_vc'][:])
    keep_ref = np.array(f['keep'][:])
highres_filled_hyp = submitted.velocity_fill(highres_ref, scaled_vr_ref, scaled_vc_ref, keep_ref)
fig = plt.figure(figsize=(14, 14))
ax = fig.add_subplot(3,2,1)
ax.imshow(highres_filled_ref[10,:,:],cmap='gray')
ax.set_title('Frame 10 (Ref)')
ax = fig.add_subplot(3,2,2)
ax.imshow(highres_filled_hyp[10,:,:], cmap='gray')
ax.set_title('Frame 10 (Hyp)')
ax = fig.add_subplot(3,2,(3,6), projection='3d')
T,R,C = highres_filled_hyp.shape
rows = np.repeat(np.arange(R).reshape((R,1)), C, axis=1)
cols = np.repeat(np.arange(C).reshape((1,C)), R, axis=0)
ax.plot_wireframe(rows,cols,highres_filled_hyp[10,:,:]-highres_filled_ref[10,:,:])
ax.set_title('Difference between the two')
You can earn up to 10% extra credit on this MP by finishing the file called extra.py, and submitting it to the autograder. When you unpack the file mp2_extra.zip, it will give you the following files: extra.py and tests/test_extra.py.
The extra credit assignment is actually pretty simple this time: given a low-resolution video, and a small set of high-resolution frames, try to reconstruct the high-resolution video. The function that the autograder will call is animate:
import extra
importlib.reload(extra)
help(extra.animate)
Help on function animate in module extra:

animate(highres_frames, lowres_video)
    highres_video = animate(highres_frames, lowres_video)
    Use a low-res video, and a few high-res frames, to animate a high-res video.

    highres_frames (shape=(Th,Rh,Ch)) - high-res frames (Rh>Rl, Ch>Cl) at a low temporal rate (Th<Tl)
    lowres_video (shape=(Tl,Rl,Cl)) - low-res frames at a high temporal rate
    highres_video (shape=(Tl,Rh,Ch)) - high-res video at a high temporal rate

    In order to compute highres_video, you may use any method you wish.
    It doesn't need to be related to optical flow.
As noted in the docstring, your answer doesn't need to have anything to do with optical flow; it can be any answer you like. There are some remarkably simple answers that give remarkably good results, so please feel free to be creative (exception: don't just download the ground truth from the web. That won't work well, anyway, because you'll fail the hidden tests).
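As one example of a simple baseline, here's a hedged sketch (definitely not the only answer, and note that it ignores the low-res video entirely): cross-fade linearly in time between consecutive high-res frames, assuming they are evenly spaced across the video.

def animate_baseline(highres_frames, lowres_video):
    Th, Rh, Ch = highres_frames.shape
    Tl = lowres_video.shape[0]
    rate = (Tl-1)/(Th-1)                 # low-res frames per high-res frame
    y = np.zeros((Tl, Rh, Ch))
    for t in range(Tl):
        s = t/rate                       # position in high-res frame units
        s0 = min(int(s), Th-2)
        a = s - s0                       # fractional distance to the next frame
        y[t] = (1-a)*highres_frames[s0] + a*highres_frames[s0+1]
    return y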
Solutions for the cat video are provided in extra_solutions.hdf5. This file contains three objects:
with h5py.File('extra_solutions.hdf5','r') as f:
    print(list(f.keys()))
['highres_frames', 'highres_ref', 'lowres_video']
We do not know of any solution to this problem that will exactly reconstruct f['highres_ref']. Instead, you will receive more or less extra credit depending on the SNR with which you're able to estimate f['highres_ref']. SNR is defined in tests/test_extra.py, with the same definition as the following line:
importlib.reload(extra)
with h5py.File('extra_solutions.hdf5','r') as f:
    highres_video = extra.animate(f['highres_frames'][:], f['lowres_video'][:])
    SNR = 10*np.log10(np.sum(np.square(f['highres_ref'][:]))/np.sum(np.square(highres_video-f['highres_ref'][:])))
print('The SNR of this solution is %g dB'%(SNR))
The SNR of this solution is 18.1645 dB
If you want to see where your reconstructed image differs from the original image, you can do it in the following way.
with h5py.File('extra_solutions.hdf5','r') as f:
    fig = plt.figure(figsize=(14, 14))
    ax = fig.add_subplot(3,2,1)
    ax.imshow(f['highres_ref'][10,:,:],cmap='gray')
    ax.set_title('Frame 10 (Ref)')
    ax = fig.add_subplot(3,2,2)
    ax.imshow(highres_video[10,:,:], cmap='gray')
    ax.set_title('Frame 10 (Hyp)')
    ax = fig.add_subplot(3,2,(3,6), projection='3d')
    T,R,C = highres_video.shape
    rows = np.repeat(np.arange(R).reshape((R,1)), C, axis=1)
    cols = np.repeat(np.arange(C).reshape((1,C)), R, axis=0)
    ax.plot_wireframe(rows,cols,highres_video[10,:,:]-f['highres_ref'][10,:,:])
    ax.set_title('Difference between the two')