MP 07: Where Is Waldo? - Feature Detection & Learning

Files to submit: submitted.py
Allowed libraries: numpy. No torch or cv2.
The template package contains starter code, a synthetic data generator, and autograder-compatible tests. ECE448_MP07_Instructions.ipynb contains step-by-step instructions for completing the assignment.
Run python grade.py to evaluate your code locally.

In this assignment, you will build a computer vision pipeline from scratch. Similar to the well-known puzzle "Where is Waldo?", we will search for specific patterns within an image. For this simplified version, we will focus on identifying blobs and bars rather than Waldo's canonical red-and-white stripes.

You will start by implementing the fundamental mathematics of convolution, apply those operations to detect geometric patterns, explore gradient-based edge detection, and finally "train" a filter to recognize patterns automatically using gradient descent.

1. Fundamentals: Correlation vs. Convolution

To measure how well a pattern (kernel) matches each region of an image, we use two closely related operations: cross-correlation slides the kernel over the image as-is, while convolution first rotates the kernel 180°. For symmetric kernels the two produce identical results, but the distinction is crucial in formal signal processing.

Understanding "Valid" Padding

In this assignment, we use "valid" padding: the kernel is applied only at positions where it fits entirely inside the image, so no border padding is added and the output is smaller than the input.

The output dimensions are calculated as:

$$ H_{out} = H_{in} - K_h + 1 $$

$$ W_{out} = W_{in} - K_w + 1 $$
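As a quick sanity check of these formulas (the 28x28 image and 5x5 kernel sizes here are illustrative, not taken from the assignment data):

```python
# With "valid" padding, a 28x28 image and a 5x5 kernel yield a 24x24 output.
H_in, W_in = 28, 28
Kh, Kw = 5, 5
H_out, W_out = H_in - Kh + 1, W_in - Kw + 1
print(H_out, W_out)  # 24 24
```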

Required Functions

correlation2d(image, kernel)

convolve2d(image, kernel)
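A minimal sketch of both functions, assuming they take 2D numpy arrays and return the "valid"-padding response (the exact signatures and docstrings are defined by the starter code):

```python
import numpy as np

def correlation2d(image, kernel):
    """Slide the kernel over the image without flipping it ("valid" padding)."""
    H, W = image.shape
    Kh, Kw = kernel.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the kernel and the patch it covers
            out[i, j] = np.sum(image[i:i+Kh, j:j+Kw] * kernel)
    return out

def convolve2d(image, kernel):
    """Convolution = correlation with the kernel rotated 180 degrees."""
    return correlation2d(image, np.flip(kernel))
```

Note that `np.flip` with no axis argument reverses both axes, which is exactly the 180° rotation that distinguishes convolution from correlation.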

2. Synthetic Pattern Detection

Computers "see" features by scanning an image for regions that produce a strong response to a specific kernel.

detect_blob(image, k=5)

detect_bar(image, k=7)
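One possible sketch of both detectors: the kernels here (a uniform square for blobs, a bright center row for bars) and the return value (the location of the peak response) are assumptions; the actual specification is in the starter code.

```python
import numpy as np

def _peak(image, kernel):
    """Return the (row, col) of the strongest correlation response."""
    H, W = image.shape
    k = kernel.shape[0]
    resp = np.zeros((H - k + 1, W - k + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)
    return np.unravel_index(np.argmax(resp), resp.shape)

def detect_blob(image, k=5):
    # Assumed kernel: a uniform k x k square of ones.
    return _peak(image, np.ones((k, k)))

def detect_bar(image, k=7):
    # Assumed kernel: bright center row, negative everywhere else,
    # so a thin horizontal bar produces the strongest response.
    kernel = np.full((k, k), -1.0)
    kernel[k // 2, :] = 1.0
    return _peak(image, kernel)
```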

3. Gradient-Based Edge Detection

Gradient filters measure the rate of change in pixel intensities to find edges.

detect_bar_gradient(image, scale=2)

Implement a 3x3 horizontal gradient filter. The two most popular filters in this category are Sobel and Prewitt. The main difference between them is the weight assigned to the central row or column.
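For reference, the two kernels (shown in the x-direction; the y-direction versions are their transposes). The mapping from the `scale` parameter to the center weight below is a guess, since the actual convention is defined by the starter code:

```python
import numpy as np

# Prewitt weights the center row the same as the outer rows...
prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]])

# ...while Sobel doubles it, giving slightly smoother edge responses.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

def gradient_kernel(scale=2):
    # Hypothetical parameterization: scale=1 -> Prewitt, scale=2 -> Sobel.
    return np.array([[-1, 0, 1],
                     [-scale, 0, scale],
                     [-1, 0, 1]])
```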

4. Learning Filters for Recognition

Modern computer-vision systems rarely rely on hand-designed filters. Instead, kernels are "learned" from data. You will implement a simplified training loop to find a \(3 \times 3\) kernel that can distinguish a Blob (label 1) from a Bar (label 0).

Stochastic Gradient Descent (SGD)

SGD is an optimization algorithm that iteratively updates model parameters (weights) by calculating the gradient of the loss function for a single training example at a time.

Implementation: train_classifier

  1. Initialize: Start with a \(3 \times 3\) kernel of small random values.
  2. Forward Pass: Convolve the image and find the peak response (max).
  3. Loss: Use Mean Squared Error (MSE): \( L = (m - y)^2 \), where \(m\) is the peak response and \(y\) is the label.
  4. Backpropagation: Calculate the gradient of the loss with respect to the kernel weights.
  5. Update: \( K = K - (lr \times gradient) \).
Critical Implementation Detail:
Since convolve2d flips the kernel 180° in the forward pass, your backpropagation must account for this: the gradient of the peak response with respect to the kernel is the image patch under the maximum, rotated 180°.

Multi-Max Handling: If multiple locations share the exact same maximum response, calculate the gradient for ALL of them and average the result.
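Putting steps 1-5 together with both implementation details, the training loop might be sketched as follows (the exact signature of train_classifier, the hyperparameters, and the data format are assumptions; consult the starter code for the required interface):

```python
import numpy as np

def correlation2d(image, kernel):
    H, W = image.shape
    Kh, Kw = kernel.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+Kh, j:j+Kw] * kernel)
    return out

def convolve2d(image, kernel):
    return correlation2d(image, np.flip(kernel))

def train_classifier(images, labels, lr=0.01, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    K = rng.normal(scale=0.1, size=(3, 3))     # 1. small random init
    for _ in range(epochs):
        for img, y in zip(images, labels):
            resp = convolve2d(img, K)          # 2. forward pass
            m = resp.max()
            dL_dm = 2.0 * (m - y)              # 3. MSE loss gradient dL/dm
            # 4. Gradient of the peak w.r.t. K is the image patch under
            # each maximizing location, rotated 180 degrees to undo the
            # kernel flip performed by convolve2d in the forward pass.
            locs = np.argwhere(resp == m)      # all tied maxima
            grad = np.zeros_like(K)
            for (i, j) in locs:
                grad += np.flip(img[i:i+3, j:j+3])
            grad /= len(locs)                  # average over ties
            K -= lr * dL_dm * grad             # 5. SGD update
    return K
```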