mp_sketching
Supreme sketching
MM Class Reference

The MinHash Matrix (MM) class converts PNGs into an indexable matrix of MinHash sketches It is built using BWPixels already implemented for you (treat pixels as luminance). More...

#include <sketching.h>

Public Member Functions

 MM (const cs225::PNG &input, unsigned numTiles, unsigned k, hashFunction h)
 Constructor converts a PNG into a collection of MinHash sketches. More...
 
std::vector< uint64_t > getMinHash (unsigned width, unsigned height) const
 Accessor function that returns the MinHash sketch for a specific tile using (width, height) coordinates. More...
 
int countMatchTiles (const MM &other, float threshold) const
 Comparison function that returns the number of paired tiles that are above a provided threshold of similarity. More...
 

Detailed Description

The MinHash Matrix (MM) class converts PNGs into an indexable matrix of MinHash sketches It is built using BWPixels already implemented for you (treat pixels as luminance).

For this implementation, use ONLY the bottom-k MinHash approach.

Constructor & Destructor Documentation

◆ MM()

MM::MM ( const cs225::PNG input,
unsigned  numTiles,
unsigned  k,
hashFunction  h 
)

Constructor converts a PNG into a collection of MinHash sketches.

It is up to you to determine how your collection is stored. The autograder will only check that every MinHash sketch (when accessed) is correct

Parameters
inputThe PNG image to be sketched
numTilesA vector containing a list of hash functions to use in the BF
kAn unsigned integer describing the number of minima that need to be tracked (and returned)
hAn hash function that takes in an integer and returns an uint64_t

Member Function Documentation

◆ countMatchTiles()

int MM::countMatchTiles ( const MM other,
float  threshold 
) const

Comparison function that returns the number of paired tiles that are above a provided threshold of similarity.

Each tile should be compared using MinHash Jaccard with its direct opposite (matching width, height in tile space) You may assume that this function is only run on MM that have the same total numTiles.

So I have two MM each with four tiles, this will output a number from 0 to 4 corresponding to how many coordinate matching pairs are at or above the threshold.

Given:

A | B E | F


C | D G | H

Return, with (X ~ Y) implying a MinHash Jaccard comparison above threshold:

(A ~ E) + (B ~ F) + (C ~ G) + (D ~ H)

OBSERVE: One of the powerful things about this approach is the scale of the image doesnt matter! The number of pixels in each tile is lost when the transformation occurs so theoretically the same image resized will have a near perfect or identical match.

Parameters
otherThe MinHash Matrix (with the same number of tiles) being compared against
thresholdA float value between 0 and 1. A 'match' between tiles must be equal to or greater than this value.

◆ getMinHash()

std::vector< uint64_t > MM::getMinHash ( unsigned  width,
unsigned  height 
) const

Accessor function that returns the MinHash sketch for a specific tile using (width, height) coordinates.

REMEMBER: (0, 0) is the TOP LEFT coordinate in a PNG. Width increases from left to right. Height increases from top to bottom.

The tests here will require that both your constructor and your accessor are correct. You are encouraged to test them separately by creating your own tests based on your implementation strategy.

Parameters
widthAn unsigned integer corresponding to the x (or width) dimension
heightAn unsigned integer corresponding to the y (or height) dimension

The documentation for this class was generated from the following file: