MP 07: Where Is Waldo? - Feature Detection & Learning

Files to submit: submitted.py
Allowed libraries: numpy. No torch or cv2.
The template package contains starter code, a synthetic data generator, and autograder-compatible tests. ECE448_MP07_Instructions.ipynb contains step-by-step instructions for completing the assignment.
Run python grade.py to evaluate your code locally.

In this assignment, you will build a computer vision pipeline from scratch. Similar to the well-known puzzle "Where is Waldo?", we will search for specific patterns within an image. For this simplified version, we will focus on identifying blobs and bars rather than Waldo's canonical red-and-white stripes.

You will start by implementing the fundamental mathematics of convolution, apply those operations to detect geometric patterns, explore gradient-based edge detection, and finally "train" a filter to recognize patterns automatically using gradient descent.

1. Fundamentals: Correlation vs. Convolution

To measure how well a pattern (kernel) matches each region of an image, we use two closely related operations: cross-correlation slides the kernel over the image as-is, while convolution first rotates the kernel 180°. For symmetric kernels the two produce identical results, but the distinction is crucial in formal signal processing.

Understanding "Valid" Padding

In this assignment, we use "valid" padding: the kernel is applied only at positions where it fits entirely inside the image, so no border padding is added and the output is smaller than the input.

The output dimensions are calculated as:

$$ H_{out} = H_{in} - K_h + 1 $$

$$ W_{out} = W_{in} - K_w + 1 $$
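As a quick sanity check of these formulas (the 28x28 image and 5x5 kernel sizes here are illustrative, not taken from the assignment data):

```python
# With "valid" padding, a 28x28 image and a 5x5 kernel yield a 24x24 output.
H_in, W_in = 28, 28
Kh, Kw = 5, 5
H_out, W_out = H_in - Kh + 1, W_in - Kw + 1
print(H_out, W_out)  # 24 24
```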

Required Functions

correlation2d(image, kernel)

convolve2d(image, kernel)
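A minimal sketch of both functions, assuming they take 2D numpy arrays and return the "valid"-padding response (the exact signatures and docstrings are defined by the starter code):

```python
import numpy as np

def correlation2d(image, kernel):
    """Slide the kernel over the image without flipping it ("valid" padding)."""
    H, W = image.shape
    Kh, Kw = kernel.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the kernel and the patch it covers
            out[i, j] = np.sum(image[i:i+Kh, j:j+Kw] * kernel)
    return out

def convolve2d(image, kernel):
    """Convolution = correlation with the kernel rotated 180 degrees."""
    return correlation2d(image, np.flip(kernel))
```

Note that `np.flip` with no axis argument reverses both axes, which is exactly the 180° rotation that distinguishes convolution from correlation.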

2. Synthetic Pattern Detection

Computers "see" features by scanning an image for regions that produce a strong response to a specific kernel.

detect_blob(image, k=5)

detect_bar(image, k=7)
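One possible sketch of both detectors: the kernels here (a uniform square for blobs, a bright center row for bars) and the return value (the location of the peak response) are assumptions; the actual specification is in the starter code.

```python
import numpy as np

def _peak(image, kernel):
    """Return the (row, col) of the strongest correlation response."""
    H, W = image.shape
    k = kernel.shape[0]
    resp = np.zeros((H - k + 1, W - k + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)
    return np.unravel_index(np.argmax(resp), resp.shape)

def detect_blob(image, k=5):
    # Assumed kernel: a uniform k x k square of ones.
    return _peak(image, np.ones((k, k)))

def detect_bar(image, k=7):
    # Assumed kernel: bright center row, negative everywhere else,
    # so a thin horizontal bar produces the strongest response.
    kernel = np.full((k, k), -1.0)
    kernel[k // 2, :] = 1.0
    return _peak(image, kernel)
```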

3. Gradient-Based Edge Detection

Gradient filters measure the rate of change in pixel intensities to find edges.

detect_bar_gradient(image, scale=2)

Implement a 3x3 horizontal gradient filter. The two most popular filters in this category are Sobel and Prewitt. The main difference between them is the weight assigned to the central row or column.
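For reference, the two kernels (shown in the x-direction; the y-direction versions are their transposes). The mapping from the `scale` parameter to the center weight below is a guess, since the actual convention is defined by the starter code:

```python
import numpy as np

# Prewitt weights the center row the same as the outer rows...
prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]])

# ...while Sobel doubles it, giving slightly smoother edge responses.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

def gradient_kernel(scale=2):
    # Hypothetical parameterization: scale=1 -> Prewitt, scale=2 -> Sobel.
    return np.array([[-1, 0, 1],
                     [-scale, 0, scale],
                     [-1, 0, 1]])
```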

4. Learning Filters for Recognition

Modern computer-vision systems rarely rely on hand-designed filters. Instead, kernels are "learned" from data. You will implement a simplified training loop to find a \(3 \times 3\) kernel that can distinguish a Blob (label 1) from a Bar (label 0).

Stochastic Gradient Descent (SGD)

SGD is an optimization algorithm that iteratively updates model parameters (weights) by calculating the gradient of the loss function for a single training example at a time.

Implementation: train_classifier

  1. Initialize: Start with a \(3 \times 3\) kernel of small random values.
  2. Forward Pass: Convolve the image and find the peak response (max).
  3. Loss: Use Mean Squared Error (MSE): \( L = (m - y)^2 \), where \(m\) is the peak response and \(y\) is the label.
  4. Backpropagation: Calculate the gradient of the loss with respect to the kernel weights.
  5. Update: \( K = K - (lr \times gradient) \).
Critical Implementation Detail:
Since convolve2d flips the kernel 180° in the forward pass, your backpropagation must account for this: the gradient of the peak response with respect to the kernel is the image patch under the maximum, rotated 180°.

Multi-Max Handling: If multiple locations share the exact same maximum response, calculate the gradient for ALL of them and average the result.
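Putting steps 1-5 together with both implementation details, the training loop might be sketched as follows (the exact signature of train_classifier, the hyperparameters, and the data format are assumptions; consult the starter code for the required interface):

```python
import numpy as np

def correlation2d(image, kernel):
    H, W = image.shape
    Kh, Kw = kernel.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+Kh, j:j+Kw] * kernel)
    return out

def convolve2d(image, kernel):
    return correlation2d(image, np.flip(kernel))

def train_classifier(images, labels, lr=0.01, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    K = rng.normal(scale=0.1, size=(3, 3))     # 1. small random init
    for _ in range(epochs):
        for img, y in zip(images, labels):
            resp = convolve2d(img, K)          # 2. forward pass
            m = resp.max()
            dL_dm = 2.0 * (m - y)              # 3. MSE loss gradient dL/dm
            # 4. Gradient of the peak w.r.t. K is the image patch under
            # each maximizing location, rotated 180 degrees to undo the
            # kernel flip performed by convolve2d in the forward pass.
            locs = np.argwhere(resp == m)      # all tied maxima
            grad = np.zeros_like(K)
            for (i, j) in locs:
                grad += np.flip(img[i:i+3, j:j+3])
            grad /= len(locs)                  # average over ties
            K -= lr * dL_dm * grad             # 5. SGD update
    return K
```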