CSC 370

Viola-Jones Face Detection

 

The goal of this lab is to give you some appreciation for the components that go into the Viola-Jones face detection algorithm. We won't implement the full algorithm here, but we will work with some of the Haar-like features that it is based on, and build a crude filter based on the first level of the Viola-Jones cascade.

For fun, we'll work with a picture that contains lots of faces. Use the photograph below featuring Paul Robeson.

Paul Robeson singing the Star Spangled Banner

Integral Image

Recall that the speed of Viola-Jones is based in part on its use of the integral image data structure. Briefly, each location in an integral image stores the sum of the pixel at that location and all pixels above it and to its left, that is, every pixel whose row and column indices are both no greater than its own.
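To make the definition concrete, here is a minimal sketch that builds an integral image straight from the definition, using slow nested loops on a toy 3x3 matrix. The matrix A and the name ii_ref are made up for illustration; this is a reference to check against, not the fast method the lab asks for.

```matlab
% Toy 3x3 "image" (hypothetical values, chosen so the sums are easy to check).
A = [1 2 3; 4 5 6; 7 8 9];

% Build the integral image directly from the definition:
% ii_ref(r,c) = sum of A over rows 1..r and columns 1..c.
ii_ref = zeros(size(A));
for r = 1:size(A,1)
    for c = 1:size(A,2)
        ii_ref(r,c) = sum(sum(A(1:r,1:c)));
    end
end

disp(ii_ref)
% Displays:
%      1     3     6
%      5    12    21
%     12    27    45
```

Whatever fast expression you come up with for the integral image should give the same matrix when applied to A, so this makes a handy sanity check.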

The integral image can be computed quite quickly in Matlab through two nested calls to the cumsum function. Read the help information for cumsum and figure out how to do this. Note that we must convert the image type to double before doing math with its values, because arithmetic on uint8 values saturates at 255.

>> img = imread('robeson-photo.jpg');
>> img = double(img);
>> iim = ??

Haar-like Features

Our primitive face detector will be based upon two simple Haar-like features computed over the entire image. Each feature is a difference between the summed pixel values over adjacent rectangular areas. The first feature uses two 4x12 pixel rectangles (4 rows by 12 columns) stacked vertically. The second uses three 4x4 pixel rectangles arranged horizontally. We will compute each of these below.

Begin with the 4x12 pixel rectangle sums. Recall that the sum over any rectangular region may be computed with just four references to the integral image. The code below computes the rectangle sums over all rectangles in the image, starting with the upper left. Then, from each rectangle sum, the sum of the rectangle four pixels below it is subtracted to get the final filter value.

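% b1(i,j): sum over the 4x12 rectangle covering rows i+1..i+4, columns j+1..j+12
b1 = iim(5:end,13:end)+iim(1:end-4,1:end-12)-iim(5:end,1:end-12)-iim(1:end-4,13:end);
% f1: each rectangle sum minus the sum of the rectangle directly below it
f1 = b1(1:end-4,:)-b1(5:end,:);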
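The four-reference trick is easier to see on a single rectangle than on whole shifted matrices. The sketch below uses a toy 3x3 matrix and its integral image written out by hand; the names A, II, r1, r2, c1 and c2 are made up for illustration.

```matlab
A  = [1 2 3; 4 5 6; 7 8 9];          % toy image (hypothetical values)
II = [1 3 6; 5 12 21; 12 27 45];     % integral image of A, written out by hand

% Rectangle covering rows r1..r2 and columns c1..c2 of A.
r1 = 2; r2 = 3; c1 = 2; c2 = 3;

% Four references: bottom-right corner, plus the corner diagonally above-left
% of the rectangle, minus the two off-diagonal corners.
rect_sum = II(r2,c2) + II(r1-1,c1-1) - II(r2,c1-1) - II(r1-1,c2);

% Direct sum over the same pixels, for comparison.
direct = sum(sum(A(r1:r2,c1:c2)));

disp([rect_sum direct])              % both are 28 (5+6+8+9)
```

The four terms of b1 play the same roles, in the same order, with every rectangle handled at once by shifted indexing. Note that this form needs a row above and a column to the left of the rectangle, so it only covers rectangles that start at row 2 and column 2 or later.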

The resulting feature value map is somewhat smaller than the original image (8 rows and 12 columns smaller), due to the overlap required to produce its values. We can add padding around the borders corresponding to the number of pixels lost, to restore the filtered image to the original size and position the filter values in the right place with respect to the original image. If you threshold the filter response and plot the areas where it is high, you can see that it identifies most of the eyebrow ridges in the photograph.

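f1p = padarray(f1,[4,6]);   % 4 rows top and bottom, 6 columns left and right
imshow(f1p,[])              % [] scales the display range to the data
figure
imshow(uint8(img))          % img is double in 0..255; imshow expects doubles in [0,1]
hold on
[y,x] = find(f1p>5000);     % rows (y) and columns (x) of strong responses
plot(x,y,'r.');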
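If the padding amounts seem arbitrary, it may help to track the matrix sizes through each step. The sketch below does this on a small stand-in matrix and builds the padded result by hand. H, W and iim_demo are made-up names and values; only the sizes matter here, not the contents.

```matlab
H = 20; W = 30;                 % hypothetical image size
iim_demo = rand(H, W);          % stand-in for an integral image (values are irrelevant)

b1 = iim_demo(5:end,13:end)+iim_demo(1:end-4,1:end-12) ...
     -iim_demo(5:end,1:end-12)-iim_demo(1:end-4,13:end);
f1 = b1(1:end-4,:)-b1(5:end,:);

disp(size(b1))                  % 16 18 : lost 4 rows and 12 columns
disp(size(f1))                  % 12 18 : lost 4 more rows

% Zero padding written out by hand: 4 rows on the top and bottom,
% 6 columns on the left and right, as padarray(f1,[4,6]) does by default.
f1p = zeros(H, W);
f1p(5:end-4, 7:end-6) = f1;
disp(size(f1p))                 % 20 30 : back to the original size
```

Splitting the lost rows and columns evenly between the two sides places each filter value roughly at the center of the pixels that produced it.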

Next, work on the three side-by-side 4x4 pixel blocks. First compute the block matrix b2, then the combined block filter f2. The center block has positive weight and counts double; the two blocks flanking it are negative. Again, pad the filter image so that it is the same size as the original image. If you threshold and plot the filter response as before, you will see that it identifies nose bridges, among other features.

Close examination reveals, however, that the points identified are somewhat lower on the face than the brow ridges identified by the first filter. So the last step is to shift the second filter response matrix upwards by four pixels. You can do this by removing four rows from the top and adding them to the bottom, or you can use circshift.

Now identify points where both filters have a strong response (say, greater than 4000), and you have a crude but reasonably effective face detector. To check that you're doing this part right, look at the result I got for this filter without any scaling. By trying different scales, you can get more of the other faces in the image.
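The shifting and combining steps can be tried out on toy matrices before you apply them to the real filter responses. In the sketch below, M, ra, rb and thresh are made-up names and values, and the shift is by one row rather than four to keep the example small.

```matlab
% circshift with a negative row shift moves rows upward;
% rows pushed off the top wrap around to the bottom.
M   = [1 2; 3 4; 5 6; 7 8];
up1 = circshift(M, [-1 0]);
disp(up1)                       % rows are now [3 4], [5 6], [7 8], [1 2]

% Two toy response maps (hypothetical values) and a toy threshold.
ra = [0 9 0; 0 9 9; 0 0 0];
rb = [0 9 0; 9 0 9; 0 0 0];
thresh = 5;

% Keep only the locations where BOTH responses exceed the threshold.
both  = (ra > thresh) & (rb > thresh);
[y,x] = find(both);
disp([y x])                     % (row, column) pairs: 1 2 and 2 3
```

The same pattern should carry over to the padded filter responses, with the threshold and shift amount suggested in the text.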

For extra credit, you can explore effects of scale. The filters we implemented have a fixed scale, but faces in images come in all sizes. There are two ways to change the scale -- you can adjust the filter or you can adjust the image. Creating the filter was a lot of work, but changing the image is easy with imresize. See if you can locate additional faces in the image by scaling it slightly up or down.
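As a starting point for the scaling experiment, here is a minimal sketch of resizing an image and mapping a location found in the resized image back to the original. The stand-in image demo, the scale factor, and the location (x_s, y_s) are all made up for illustration, and dividing by the scale is only an approximate mapping.

```matlab
demo  = rand(100, 120);             % stand-in image (hypothetical; use your own)
scale = 0.5;                        % try values slightly above or below 1 on the real photo
small = imresize(demo, scale);
disp(size(small))                   % 50 60

% A hypothetical detection at column x_s, row y_s of the resized image
% corresponds roughly to this location in the original image.
x_s = 30; y_s = 20;
x_orig = x_s / scale;
y_orig = y_s / scale;
disp([x_orig y_orig])               % 60 40
```

You would run the same two filters on the resized image and then plot the mapped-back points on the original photograph.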