mp_sketching
Supreme sketching
|
The MinHash Matrix (MM) class converts PNGs into an indexable matrix of MinHash sketches It is built using BWPixels already implemented for you (treat pixels as luminance). More...
#include <sketching.h>
Public Member Functions | |
MM (const cs225::PNG &input, unsigned numTiles, unsigned k, hashFunction h) | |
Constructor converts a PNG into a collection of MinHash sketches. More... | |
std::vector< uint64_t > | getMinHash (unsigned width, unsigned height) const |
Accessor function that returns the MinHash sketch for a specific tile using (width, height) coordinates. More... | |
int | countMatchTiles (const MM &other, float threshold) const |
Comparison function that returns the number of paired tiles that are above a provided threshold of similarity. More... | |
The MinHash Matrix (MM) class converts PNGs into an indexable matrix of MinHash sketches It is built using BWPixels already implemented for you (treat pixels as luminance).
For this implementation, use ONLY the bottom-k MinHash approach.
MM::MM | ( | const cs225::PNG & | input, |
unsigned | numTiles, | ||
unsigned | k, | ||
hashFunction | h | ||
) |
Constructor converts a PNG into a collection of MinHash sketches.
It is up to you to determine how your collection is stored. The autograder will only check that every MinHash sketch (when accessed) is correct
input | The PNG image to be sketched |
numTiles | A vector containing a list of hash functions to use in the BF |
k | An unsigned integer describing the number of minima that need to be tracked (and returned) |
h | An hash function that takes in an integer and returns an uint64_t |
int MM::countMatchTiles | ( | const MM & | other, |
float | threshold | ||
) | const |
Comparison function that returns the number of paired tiles that are above a provided threshold of similarity.
Each tile should be compared using MinHash Jaccard with its direct opposite (matching width, height in tile space) You may assume that this function is only run on MM that have the same total numTiles.
So I have two MM each with four tiles, this will output a number from 0 to 4 corresponding to how many coordinate matching pairs are at or above the threshold.
Given:
A | B E | F
C | D G | H
Return, with (X ~ Y) implying a MinHash Jaccard comparison above threshold:
(A ~ E) + (B ~ F) + (C ~ G) + (D ~ H)
OBSERVE: One of the powerful things about this approach is the scale of the image doesnt matter! The number of pixels in each tile is lost when the transformation occurs so theoretically the same image resized will have a near perfect or identical match.
other | The MinHash Matrix (with the same number of tiles) being compared against |
threshold | A float value between 0 and 1. A 'match' between tiles must be equal to or greater than this value. |
std::vector< uint64_t > MM::getMinHash | ( | unsigned | width, |
unsigned | height | ||
) | const |
Accessor function that returns the MinHash sketch for a specific tile using (width, height) coordinates.
REMEMBER: (0, 0) is the TOP LEFT coordinate in a PNG. Width increases from left to right. Height increases from top to bottom.
The tests here will require that both your constructor and your accessor are correct. You are encouraged to test them separately by creating your own tests based on your implementation strategy.
width | An unsigned integer corresponding to the x (or width) dimension |
height | An unsigned integer corresponding to the y (or height) dimension |