In this lab, you'll use the Viola-Jones algorithm to detect faces in images.
In order to make sure everything works, you might want to go to the command line, and run
pip install -r requirements.txt
This will install the modules that are used on the autograder, including numpy, h5py, and the gradescope utilities.
The data this time is provided in a set of image files, and in a JSON file called data.json
. This file describes 8 rectangles per image.
class 0
.class 1
.import numpy as np
import json
with open('data.json') as f:
data = json.load(f)
print('There are',len(data['test']),'testing data')
print('There are',len(data['train']),'training data')
print('The first one looks like this:\n',data['train'][0])
There are 32 testing data There are 124 training data The first one looks like this: {'filename': 'data/AF1_35D12460.jpg', 'rectangles': [[[54, 10, 151, 167], [459, 12, 159, 176], [55, 258, 151, 178], [424, 262, 168, 169]], [[165, 110, 88, 97], [38, 306, 83, 66], [528, 32, 184, 114], [545, 312, 135, 81]]]}
A rectangle
is a 4-list, [x,y,width,height]
. To make it easier to plot a rectangle on an axis, we can define a function:
def plotrect(rect,ax,color):
xcoords = rect[0] + np.array([0,0,1,1,0])*rect[2]
ycoords = rect[1] + np.array([0,1,1,0,0])*rect[3]
ax.plot(xcoords,ycoords,color)
The JPG images are all in color, but we won't use the colors for this MP, so we'll use Pillow's convert('L') function to convert each image to grayscale as it is loaded.
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
img = np.array(Image.open(data['train'][0]['filename']).convert('L'))
fig, ax = plt.subplots(1,1,figsize=(10,10))
ax.imshow(img,cmap='gray')
for y,color in enumerate(['c--','m-']):
for r in range(4):
plotrect(data['train'][0]['rectangles'][y][r],ax,color)
The Viola-Jones face detector is computationally efficient because it takes advantage of an intermediate representation called an integral image
. If the grayscale image is $I[y,x]$, then the integral image is
In order to make the rest of project computationally efficient, the first function you should write is submitted.integral_images
:
import importlib, submitted
importlib.reload(submitted)
help(submitted.integral_images)
Before we run integral_images
, let's read in all of the training image files, and convert them all into a single grayscale numpy array called images
:
npix = len(data['train'])
m,n = img.shape
images = np.empty((npix,m,n))
for i in range(len(data['train'])):
filename = data['train'][i]['filename']
images[i,:,:] = np.array(Image.open(filename).convert('L'))
fig, ax = plt.subplots(1,1,figsize=(10,10))
ax.imshow(images[0,:,:],cmap='gray')
for y,color in enumerate(['c--','m-']):
for r in range(4):
plotrect(data['train'][0]['rectangles'][y][r],ax,color)
Now we can try running integral_images
:
importlib.reload(submitted)
ii = submitted.integral_images(images)
fig, ax = plt.subplots(1,1,figsize=(10,10))
ax.imshow(ii[0,:,:],cmap='gray')
for y,color in enumerate(['c--','m-']):
for r in range(4):
plotrect(data['train'][0]['rectangles'][y][r],ax,color)
To make sure your code is working correctly, check that:
You can also check your answer against my answer, for the first 10 rows and columns or so:
print('Integral image has shape',ii.shape)
print('and its first 10x10 subimage is:\n')
print(ii[0,:10,:10])
Integral image has shape (124, 481, 721) and its first 10x10 subimage is: [[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [ 0. 9. 17. 26. 38. 63. 131. 274. 481. 711.] [ 0. 14. 28. 47. 76. 136. 287. 592. 1027. 1488.] [ 0. 15. 30. 54. 95. 185. 414. 876. 1534. 2226.] [ 0. 18. 35. 61. 105. 209. 494. 1086. 1938. 2862.] [ 0. 30. 56. 87. 133. 246. 578. 1288. 2319. 3475.] [ 0. 38. 70. 105. 154. 279. 663. 1498. 2716. 4105.] [ 0. 39. 72. 108. 160. 302. 747. 1718. 3137. 4759.] [ 0. 40. 74. 112. 171. 335. 848. 1964. 3592. 5447.] [ 0. 43. 79. 119. 182. 362. 933. 2182. 4007. 6097.]]
The Viola-Jones detector uses Adaboost to learn a kind of two-layer neural network classifier. The input of the classifier is a selected vector of Haar-like features, which are simple convolutions that be computed very efficiently, for the entire dataset, from the stack of integral images.
Basically, a Haar-like feature is the sum of all pixel intensities in a rectangle, or the difference of several such features. If $I[y,x]$ is an image, $r=[r_1,r_2,r_3,r_4]$ is a rectangle, $\phi=[\phi_1,\phi_2,\phi_3,\phi_4,o_1,o_2]$ are fractions ($0\le \phi_1,\phi_2,\phi_3,\phi_4\le 1$), and filter orders ($1\le o_1,1\le o_2,o_1+o_2\le 4$), then the corresponding Haar-like feature is defined as
$$f=\sum_{y=\phi_2r_4}^{(\phi_2+\phi_4)r_4-1}\sum_{x=\phi_1r_3}^{(\phi_1+\phi_3)r_3-1}I[r_2+y,r_1+x]\left(-1\right)^{\text{int}\left(o_2\frac{y-\phi_2r_4}{\phi_4r_4}\right)}\left(-1\right)^{\text{int}\left(o_1\frac{x-\phi_1r_3}{\phi_3r_3}\right)}$$The definition is not very obvious from that equation. It's easier to understand it if I show you a picture (CC-SA-3.0). In the following figure, the sum of the pixels which lie within the white blocks is subtracted from the sum of the pixels which lie within the black blocks. Subfigure (1) has order $o_1=2,o_2=1$, subfigure (2) has $o_1=1,o_2=2$, subfigure (3) has $o_1=3,o_2=1$, and subfigure (4) has $o_1=2,o_2=2$:
In order to make the homework gradeable, let's set a standard that matches the equation but is slightly different from this image: let's say that the upper left block (block $(0,0)$) is always multiplied by +1, and other blocks are multiplied by +1 or -1, respectively.
If a Haar-like feature is computed from the original image, each feature requires ${\mathcal O}\left\{r_3r_4\right\}$ summations to compute. If we're computing too many features, that becomes a lot of computation --- remember that $r_3$ and $r_4$ are of the same order as the size of the image, i.e., hundreds of pixels.
Viola and Jones showed that the same features can be very efficiently computed from the integral image. For example, the summation of all pixels within the block shown below can be computed as $J(A)+J(C)-J(B)-J(D)$ in the following image, where $J(\cdot)$ is the integral image at point $\cdot$ (Public Domain image):
If we use $b[k,l]$ to denote the $(k,l)^{\text{th}}$ block within a subrectangle, then the total feature can be computed as:
$$s_1=r_1+\phi_1r_3$$$$s_2=r_2+\phi_2r_4$$$$s_3=\phi_3r_3/o_1$$$$s_4=\phi_4r_4/o_2$$$$b[k,l]=J\left[\lfloor s_2+ls_4\rfloor,\lfloor s_1+ks_3\rfloor\right]+ J\left[\lfloor s_2+(l+1)s_4\rfloor,\lfloor s_1+(k+1)s_3\rfloor\right]- J\left[\lfloor s_2+ls_4\rfloor,\lfloor s_1+(k+1)s_3\rfloor\right]- J\left[\lfloor s_2+(l+1)s_4\rfloor,\lfloor s_1+ks_3\rfloor\right]$$$$f=\sum_{k=0}^{o_x-1}\sum_{l=0}^{o_y-1} \left(-1\right)^{k+l}b[k,l]$$In order to make Adaboost computationally efficient, you will want to compute a single feature (for a single specified value of the parameters ($\phi=[\phi_1,\phi_2,\phi_3,\phi_4,o_1,o_2]$) for all of the rectangles in all of the images, all at once, as specified in the docstring for haarlike_features
:
importlib.reload(submitted)
help(submitted.haarlike_features)
Help on function haarlike_features in module submitted: haarlike_features(ii, rects, xfrac, yfrac, wfrac, hfrac, xorder, yorder) Compute Haar-like features with given params for all rects. Feature is computed as follows: (1) In each rectangle, find the subrectangle that starts at pixel (y+yfrac*h,x+xfrac*w), and ends at pixel (y+(yfrac+hfrac)*h-1,x+(xfrac+wfrac)*w-1). (2) Divide the subrectangle into (yorder by xorder) blocks. (3) Add the pixels in positive blocks, and subtract the pixels in negative blocks, where the top-left block is always positive. (But you shouldn't actually add pixels. Instead, use the integral image (ii) in a smart way). @param: ii (npix,m+1,n+1): a stack of npix integral images rects (2,npix,4,4): for each class, for each image, 4 rects xfrac (real): subrectangle start (fraction of rects[:,:,:,2]) yfrac (real): subrectangle start (fraction of rects[:,:,:,3]) wfrac (real): subrectangle width (fraction of rects[:,:,:,2]) hfrac (real): subrectangle height (fraction of rects[:,:,:,3]) xorder (int): subrectangle contains this many blocks across yorder (int): subrectangle contains this many blocks down @return: features (2,npix,4): features[y,i,r] is feature for class y, for the i'th image, for the r'th rectangle in the image
First, let's debug by using the top $6\times 6$ rectangle in imagefiles[0]
.
print(ii[0,:7,:7])
[[ 0. 0. 0. 0. 0. 0. 0.] [ 0. 9. 17. 26. 38. 63. 131.] [ 0. 14. 28. 47. 76. 136. 287.] [ 0. 15. 30. 54. 95. 185. 414.] [ 0. 18. 35. 61. 105. 209. 494.] [ 0. 30. 56. 87. 133. 246. 578.] [ 0. 38. 70. 105. 154. 279. 663.]]
We'll use np.tile to create 8 copies of that rectangle for every image in the dataset:
dummyrects = np.tile(np.array([0,0,6,6]),(2,len(images),4,1))
print('dummyrects has shape',dummyrects.shape)
dummyrects has shape (2, 124, 4, 4)
First, let's use $(\phi_1,\phi_2,\phi_3,\phi_4)$ to compute the order $(o_1=1,o_2=1)$ features for all of the various sub-blocks:
importlib.reload(submitted)
allblocks=[(0,1),(0,0.5),(0.5,0.5)]
for (xfrac,wfrac) in allblocks:
for (yfrac,hfrac) in allblocks:
f = submitted.haarlike_features(ii,dummyrects,xfrac,yfrac,wfrac,hfrac,1,1)
print('(%g,%g,%g,%g): feature=%g'%(xfrac,yfrac,wfrac,hfrac,f[0,0,0]))
(0,0,1,1): feature=663 (0,0,1,0.5): feature=414 (0,0.5,1,0.5): feature=249 (0,0,0.5,1): feature=105 (0,0,0.5,0.5): feature=54 (0,0.5,0.5,0.5): feature=51 (0.5,0,0.5,1): feature=558 (0.5,0,0.5,0.5): feature=360 (0.5,0.5,0.5,0.5): feature=198
Now let's compute the feature for every possible feature order, and check by hand to make sure it makes sense:
importlib.reload(submitted)
for xorder in range(1,4):
for yorder in range(1,5-xorder):
features=submitted.haarlike_features(ii,dummyrects,0,0,1,1,xorder,yorder)
print('xorder=%d,yorder=%d,feature=%d'%(xorder,yorder,features[0,0,0]))
xorder=1,yorder=1,feature=663 xorder=1,yorder=2,feature=165 xorder=1,yorder=3,feature=249 xorder=2,yorder=1,feature=-453 xorder=2,yorder=2,feature=-159 xorder=3,yorder=1,feature=495
Let's convert data['train'][:]['rectangles']
to an array called rects
, and test to make sure we can compute a feature from it.
npix = ii.shape[0]
rects = np.empty((2,npix,4,4))
for i in range(npix):
for y in range(2):
for r in range(4):
rects[y,i,r,:]=np.array(data['train'][i]['rectangles'][y][r])
importlib.reload(submitted)
features=submitted.haarlike_features(ii,rects,0,0,1,1,1,1)
print('features have shape',features.shape)
print('features of class 0 first image:',features[0,0,:])
print('features of class 1 first image:',features[1,0,:])
features have shape (2, 124, 4) features of class 0 first image: [2859162. 3622397. 3276614. 4199616.] features of class 1 first image: [ 333009. 896903. 3715836. 2166603.]
Adaboost creates a strong classifier based on the weighted voting of a large number of weak classifiers. The $j^{\text{th}}$ weak classifier is defined as:
$$h_j(x)= \left\{\begin{array}{ll}1&p_jf_j(x)<p_j\theta_j\\0&\mbox{otherwise}\end{array}\right.,$$where $p_j\in\{-1,+1\}$ is a sign, and $\theta_j\in\Re$ is a threshold.
In Adaboost, each training token (each tuple of $x=(y,i,r)$ where $y$ is a class label, $i$ is an image index, and $r$ is a rectangle) has a weight $w_j(x)$, and the weights all add to 1. The error rate can be computed by just adding up the weights of the misclassified tokens. Thus the error rate of the $j^{\text{th}}$ weak classifier is:
$$\epsilon_j=\sum_x w_j(x)\vert y(x)-h_j(x)\vert$$If the sign is positive ($p_j=+1$), then
$$\epsilon_j= \sum_{f_j(x)=-\infty}^{<\theta_j} w_j(x)(1-y(x))+ \sum_{f_j(x)=\theta_j}^{\infty} w_j(x)y(x) $$If the sign is negative ($p_j=-1$), then
$$\epsilon_j= \sum_{f_j(x)=-\infty}^{\theta_j} w_j(x)y(x)+ \sum_{f_j(x)>\theta_j}^{\infty} w_j(x)(1-y(x)) $$Notice that the optimum value of $\theta_j$ is always equal to the feature value of one of the training tokens. We can find the optimum threshold in ${\mathcal O}\left\{\text{npix}\right\}$ time, therefore, as follows:
importlib.reload(submitted)
help(submitted.weak_classifier)
Help on function weak_classifier in module submitted: weak_classifier(features, weights, ind) Find the best weak classifier for a set of features. The weak classifier is: h(a,b) = 1 if sign*f < sign*threshold, otherwise h(a,b)=0. @param: features (2,npix,4): features features[0,:,:] are the features of class 0 (faces) features[1,:,:] are the features of class 1 (nonfaces) weights (2,npix,4): importance weight of each feature such that np.sum(weights)=1 ind (tuple): the result of running np.unravel_index(np.argsort(features,axis=None),features.shape) ind[0][k] is the class label, ind[1][k] is the image index, ind[2][k] is the rectangle index of the k'th smallest feature. features[ind] gives features sorted in ascending order, weights[ind] gives weights sorted in the same order. @return: epsilon (2,8*npix): epsilon[0,k]=error if sign=-1 and threshold=features[ind][k] epsilon[1,k]=error if sign=1 and threshold=features[ind][k]
First, let's test np.argsort
, to see how it sorts the features.
ind=np.unravel_index(np.argsort(features,axis=None),features.shape)
import matplotlib.pyplot as plt
fig, ax = plt.subplots(2,1,figsize=(14,12))
ax[0].plot(features[ind])
ax[0].set_title('Sorted features',fontsize=18)
ax[1].plot(ind[0])
ax[1].set_title('Class labels of sorted features',fontsize=18)
fig.tight_layout()
We can see that the features are, indeed, sorted. We can see also that the small-value features tend to be of class 1 (nonface), and the high-value features tend to be of class 0 (face). Remember that this is feature $(0,0,1,1,1,1)$, i.e., the sum of all pixels in the rectangle. Apparently, faces tend to have larger pixel-sums than nonfaces. There are two possible reasons for this:
Anyway, let's go ahead and run weak_classifier
to find out the training error of this classifier. We'll start out by giving equal weight to every training token.
weights = np.ones(features.shape)/np.prod(features.shape)
importlib.reload(submitted)
epsilon=submitted.weak_classifier(features,weights,ind)
print('p=-1, theta=%d, eps=%5.5g'%(features[ind][0],epsilon[0,0]))
print('p=+1, theta=%d, eps=%5.5g'%(features[ind][0],epsilon[1,0]))
print('p=-1, theta=%d, eps=%5.5g'%(features[ind][-1],epsilon[0,-1]))
print('p=+1, theta=%d, eps=%5.5g'%(features[ind][-1],epsilon[1,-1]))
p=-1, theta=28583, eps=0.50101 p=+1, theta=28583, eps= 0.5 p=-1, theta=7801254, eps= 0.5 p=+1, theta=7801254, eps=0.50101
Notice that:
The opposite reasoning holds true for the largest threshold.
Now let's plot the error as a function of sort-index.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1,1,figsize=(14,7))
line0,=ax.plot(epsilon[0,:])
line1,=ax.plot(epsilon[1,:])
ax.legend([line0,line1],['sign=-1','sign=+1'],fontsize=18)
fig.tight_layout()
The +1 classifier achieves pretty good error, because it classifies small features as $h=1$. Error of the -1 classifier is pretty bad, because it classifies small features as $h=0$. Let's find the best threshold:
p,k = np.unravel_index(np.argmin(epsilon,axis=None),epsilon.shape)
theta = features[ind][k]
print('Best sign is',p,'and best threshold is',theta)
Best sign is 1 and best threshold is 1743865.0
AdaBoost's strong classifier is a weighted vote among many weak classifiers. The weak classifiers are chosen one by one, so that each weak classifier can be targeted to try to fix the errors made by the previous weak classifier. Specifically, AdaBoost training repeats the following steps:
The result is a sequence of weak classifiers, $[h_1,\ldots,h_T]$, and their corresponding weights $[\beta_1,\ldots,\beta_T]$, where $h_t=[\phi_{t},p_t,\theta_t]$ is a set of feature parameters ($\phi_t=[\phi_{1,t},\ldots,\phi_{4,t},o_{1,t},o_{2,t}]$), and the corresponding optimum sign, $p_t$, and threshold, $\theta_t$.
importlib.reload(submitted)
help(submitted.train_adaboost)
Help on function train_adaboost in module submitted: train_adaboost(ii, rects, weights) Perform one iteration of adaboost training. 1. Normalize weights so they sum to one 2. Find the best weak classifier, computed by exhaustive search among all classifiers in the set: xfrac in {0,1/6,...,5/6} yfrac in {0,1/6,...,5/6} wfrac in {0,1/6,...,1-xfrac} hfrac in {0,1/6,...,1-yfrac} xorder in {1,...,3} yorder in {1,...,4-yorder} sign in {-1,+1} threshold in haarlike_features(ii,rects,xfrac,...,yorder) 3. Downweight correctly classified tokens by beta @param: ii (npix,m+1,n+1): a stack of npix integral images rects (2,npix,4,4): for each class, for each image, 4 rects weights (2,npix,4): importance weight of each feature. May not sum to 1. @return: h (list): xfrac,yfrac,wfrac,hfrac,xorder,yorder,sign,threshold of the best weak classifier beta (real): eps/(1-eps), where eps is the weighted error newweights (2,npix,4): new weights, with weights of correctly classified tokens downweighted by beta. Need not sum to one.
The only way to find the best weak classifier is with an exhaustive search over every possible combination of parameters. That may take a while, so, if you wish, you can have your function print status updates every now and then. I recommend making this conditional on the parameter verbose
, because you may not want it to be verbose during the longer training run that comes next. Mine prints a status update for each new combination of xfrac
and wfrac
, if verbose==True
.
Notice that, since the first step of train_adaboost
is to normalize the weights, we can just initialize it by setting weights to an all-ones matrix.
importlib.reload(submitted)
weights = np.ones((2,ii.shape[0],4))
h,beta,newweights=submitted.train_adaboost(ii,rects,weights,True)
print('best classifier is',h,', and beta is',beta)
Searching xfrac=0/6, wfrac=1/6 Searching xfrac=0/6, wfrac=2/6 Searching xfrac=0/6, wfrac=3/6 Searching xfrac=0/6, wfrac=4/6 Searching xfrac=0/6, wfrac=5/6 Searching xfrac=0/6, wfrac=6/6 Searching xfrac=1/6, wfrac=1/6 Searching xfrac=1/6, wfrac=2/6 Searching xfrac=1/6, wfrac=3/6 Searching xfrac=1/6, wfrac=4/6 Searching xfrac=1/6, wfrac=5/6 Searching xfrac=2/6, wfrac=1/6 Searching xfrac=2/6, wfrac=2/6 Searching xfrac=2/6, wfrac=3/6 Searching xfrac=2/6, wfrac=4/6 Searching xfrac=3/6, wfrac=1/6 Searching xfrac=3/6, wfrac=2/6 Searching xfrac=3/6, wfrac=3/6 Searching xfrac=4/6, wfrac=1/6 Searching xfrac=4/6, wfrac=2/6 Searching xfrac=5/6, wfrac=1/6 best classifier is [0.0, 0.0, 1.0, 0.6666666666666666, 1, 2, -1, -111920.0] , and beta is 0.1481481481481506
Now let's make sure that some of the weights were changed. The newweights
should have two values now: one value for tokens that were incorrectly classified (about $\epsilon=\beta/(1+\beta)=0.13$ of the tokens), and a lower value for tokens that were correctly classified (the remaining 87%). The weights don't need to sum to one.
fig, ax = plt.subplots(1,1,figsize=(14,4))
line0,=ax.plot(newweights[0,:,:].flatten())
line1,=ax.plot(newweights[1,:,:].flatten())
ax.legend([line0,line1],['class 0','class 1'],fontsize=18)
fig.tight_layout()
print('Sum of newweights is',np.sum(newweights))
Sum of newweights is 0.25806451612903447
It looks like most of the tokens were downweighted, because they were correctly classified.
Now that train_adaboost
seems to be working, let's try running it 40 times, to get 40 weak classifiers. Each weak classifier will focus its attention on tokens that were misclassified by previous weak classifiers, so that, over time, a weighted vote of these classifiers gets stronger and stronger.
Warning: this will take a lot of time.
import json
weights = np.ones((2,ii.shape[0],4))
betalist = []
hlist = []
for t in range(40):
h,beta,newweights=submitted.train_adaboost(ii,rects,weights)
print('Finished iteration',t)
print('Its error rate was %4.4g'%(beta/(1+beta)))
print('newweights sum to %4.4g'%(np.sum(newweights)))
hlist.append([float(x) for x in h])
betalist.append(float(beta))
with open('trained_model.json','w') as f:
json.dump({'hlist':hlist,'betalist':betalist},f)
print('Saved %d weak classifiers'%(len(hlist)))
weights = newweights
Finished iteration 0 Its error rate was 0.129 newweights sum to 0.2581 Saved 1 weak classifiers Finished iteration 1 Its error rate was 0.1959 newweights sum to 0.3918 Saved 2 weak classifiers Finished iteration 2 Its error rate was 0.215 newweights sum to 0.4301 Saved 3 weak classifiers Finished iteration 3 Its error rate was 0.2211 newweights sum to 0.4422 Saved 4 weak classifiers Finished iteration 4 Its error rate was 0.2429 newweights sum to 0.4857 Saved 5 weak classifiers Finished iteration 5 Its error rate was 0.2404 newweights sum to 0.4808 Saved 6 weak classifiers Finished iteration 6 Its error rate was 0.2913 newweights sum to 0.5825 Saved 7 weak classifiers Finished iteration 7 Its error rate was 0.2728 newweights sum to 0.5457 Saved 8 weak classifiers Finished iteration 8 Its error rate was 0.2924 newweights sum to 0.5847 Saved 9 weak classifiers Finished iteration 9 Its error rate was 0.2969 newweights sum to 0.5938 Saved 10 weak classifiers Finished iteration 10 Its error rate was 0.3031 newweights sum to 0.6063 Saved 11 weak classifiers Finished iteration 11 Its error rate was 0.2824 newweights sum to 0.5647 Saved 12 weak classifiers Finished iteration 12 Its error rate was 0.3034 newweights sum to 0.6067 Saved 13 weak classifiers Finished iteration 13 Its error rate was 0.3267 newweights sum to 0.6535 Saved 14 weak classifiers Finished iteration 14 Its error rate was 0.3177 newweights sum to 0.6354 Saved 15 weak classifiers Finished iteration 15 Its error rate was 0.3219 newweights sum to 0.6438 Saved 16 weak classifiers Finished iteration 16 Its error rate was 0.3041 newweights sum to 0.6082 Saved 17 weak classifiers Finished iteration 17 Its error rate was 0.2939 newweights sum to 0.5878 Saved 18 weak classifiers Finished iteration 18 Its error rate was 0.2861 newweights sum to 0.5722 Saved 19 weak classifiers Finished iteration 19 Its error rate was 0.2997 newweights sum to 0.5995 Saved 20 weak classifiers Finished iteration 20 Its error rate was 0.3206 newweights sum to 0.6413 Saved 21 weak classifiers Finished iteration 21 Its error rate was 0.3276 newweights sum to 0.6553 Saved 22 weak classifiers Finished iteration 22 Its error rate was 0.2937 newweights sum to 0.5874 Saved 23 weak classifiers Finished iteration 23 Its error rate was 0.3194 newweights sum to 0.6388 Saved 24 weak classifiers Finished iteration 24 Its error rate was 0.3067 newweights sum to 0.6134 Saved 25 weak classifiers Finished iteration 25 Its error rate was 0.3277 newweights sum to 0.6554 Saved 26 weak classifiers Finished iteration 26 Its error rate was 0.2974 newweights sum to 0.5949 Saved 27 weak classifiers Finished iteration 27 Its error rate was 0.3253 newweights sum to 0.6506 Saved 28 weak classifiers Finished iteration 28 Its error rate was 0.3286 newweights sum to 0.6572 Saved 29 weak classifiers Finished iteration 29 Its error rate was 0.337 newweights sum to 0.674 Saved 30 weak classifiers Finished iteration 30 Its error rate was 0.3161 newweights sum to 0.6322 Saved 31 weak classifiers Finished iteration 31 Its error rate was 0.306 newweights sum to 0.612 Saved 32 weak classifiers Finished iteration 32 Its error rate was 0.3045 newweights sum to 0.609 Saved 33 weak classifiers Finished iteration 33 Its error rate was 0.3122 newweights sum to 0.6243 Saved 34 weak classifiers Finished iteration 34 Its error rate was 0.3346 newweights sum to 0.6691 Saved 35 weak classifiers Finished iteration 35 Its error rate was 0.3304 newweights sum to 0.6607 Saved 36 weak classifiers Finished iteration 36 Its error rate was 0.3304 newweights sum to 0.6608 Saved 37 weak classifiers Finished iteration 37 Its error rate was 0.328 newweights sum to 0.656 Saved 38 weak classifiers Finished iteration 38 Its error rate was 0.3263 newweights sum to 0.6526 Saved 39 weak classifiers Finished iteration 39 Its error rate was 0.3322 newweights sum to 0.6644 Saved 40 weak classifiers
Generally speaking, as the hardest tokens get a larger and larger fraction of the overall weight, the error rates get larger. The error rates should always remain less than 0.5, however.
epsilon = [ beta/(1+beta) for beta in betalist ]
fig, ax = plt.subplots(1,1,figsize=(14,4))
ax.plot(epsilon)
ax.set_title('weighted error vs. iteration number',fontsize=18)
Text(0.5, 1.0, 'weighted error vs. iteration number')
In order to test an adaboost classifier, you need to take a weighted average of the decisions of all of the weak classifiers. The decision of the strong classifier is
$$h(x)=\left\{\begin{array}{ll} 1 & \sum_t\alpha_th_t(x)\ge\frac{1}{2}\sum_t\alpha_t\\ 0 & \mbox{otherwise} \end{array}\right.$$where $\alpha_t=\log\frac{1}{\beta_t}$.
importlib.reload(submitted)
help(submitted.test_adaboost)
Help on function test_adaboost in module submitted: test_adaboost(ii, rects, hlist, betalist) Test a trained adaboost classifier. @param: ii (ntest,m+1,n+1): a stack of npix integral images rects (2,ntest,4,4): for each class, for each image, 4 rects hlist (list of lists): [[xfrac,yfrac,wfrac,hfrac,xorder,yorder,sign,threshold] for each weak classifier.] betalist (list of reals): [weighted_error/(1-weighted_error) for each weak classifier] @return: alpha (len(betalist)): the alpha weights hweak (2,ntest,4,len(betalist)): predictions of weak classifiers hstrong (2,ntest,4): predictions of strong classifier hstrong = (np.dot(hweak,alpha)>=np.sum(alpha)/2).astype('int')
OK, so first let's load the test images, compute their integral images, and convert data['test'][:]['rectangles']
to a numpy array:
ntest = len(data['test'])
m,n = img.shape
testim = np.empty((ntest,m,n))
testii = np.empty((ntest,m+1,n+1))
for i in range(ntest):
filename = data['test'][i]['filename']
testim[i,:,:]=np.array(Image.open(filename).convert('L'))
testii = submitted.integral_images(testim)
testre = np.empty((2,ntest,4,4))
for i in range(ntest):
for y in range(2):
for r in range(4):
testre[y,i,r,:]=np.array(data['test'][i]['rectangles'][y][r])
Now we can try running test_adaboost
:
with open('trained_model.json') as f:
trained_model = json.load(f)
hlist = trained_model['hlist']
betalist = trained_model['betalist']
# Convert xorder, yorder, and sign back to integers!
for t in range(len(hlist)):
hlist[t][4] = int(hlist[t][4])
hlist[t][5] = int(hlist[t][5])
hlist[t][6] = int(hlist[t][6])
importlib.reload(submitted)
alpha,hwk,hstr=submitted.test_adaboost(testii,testre,hlist,betalist)
errorcount = np.sum(hstr[0,:,:]!=0)+np.sum(hstr[1,:,:]!=1)
errorrate = errorcount / np.prod(hstr.shape)
print('Error rate of strong classifier is',errorrate)
Error rate of strong classifier is 0.0703125
Thus the strong classifier is not perfect, but it is better than any one of the weak classifiers alone!
For extra credit, figure out how to run your trained face classifier as a face detector. In each input image, test every possible rectangle within that image, to see which one has the highest face detection score, where you can define the score to be a weighted sum of the binary decisions from weak classifiers:
$$s(x)= \sum_t\alpha_th_t(x)$$To avoid outrageous computational complexity, you will not search the entire image, but only the part specified in the function call. Notice that, even with only 10 possibilities in each range, you are still searching 10,000 possible rectangles.
import extra, importlib
importlib.reload(extra)
help(extra.detect_faces)
Help on function detect_faces in module extra: detect_faces(ii, xrange, yrange, wrange, hrange, hlist, betalist) Use a trained Adaboost face detector to find a face in an image. You should look only in the set of rectangles (x,y,w,h) for x in xrange for y in yrange for w in wrange for h in hrange. @param: ii (m+1,n+1): integral image for just one input image xrange (iterable): possible upper-left corners of the rectangle yrange (iterable): possible upper-left corners of the rectangle wrange (iterable): possible widths of the rectangle hrange (iterable): possible heights of the rectangle hlist (list of lists): [[xfrac,yfrac,wfrac,hfrac,xorder,yorder,sign,threshold] for each weak classifier.] betalist (list of reals): [weighted_error/(1-weighted_error) for each weak classifier] @return: rect (4): top-rated rectangle in the image
importlib.reload(extra)
xrange = range(160,180,2)
yrange = range(20,40,2)
wrange = range(130,150,2)
hrange = range(170,190,2)
score = extra.detect_faces(testii[0,:,:],xrange,yrange,wrange,hrange,hlist,betalist)
best_ind = np.unravel_index(np.argmax(score),score.shape)
bestx = xrange[best_ind[0]]
besty = yrange[best_ind[1]]
bestw = wrange[best_ind[2]]
besth = hrange[best_ind[3]]
bestscore = score[best_ind]
print('Best rectangle is',bestx,besty,bestw,besth)
print('with score',bestscore)
Best rectangle is 170 38 130 178 with score 21.53589260657187
fig, ax = plt.subplots(1,1,figsize=(10,10))
ax.imshow(testim[0,:,:],cmap='gray')
for y,color in enumerate(['c--','m-']):
for r in range(4):
plotrect(testre[y,0,r,:],ax,color)
plotrect([bestx,besty,bestw,besth],ax,'y-')
ax.set_title('Cyan: true, Magenta: false, Yellow: detected')
Text(0.5, 1.0, 'Cyan: true, Magenta: false, Yellow: detected')