CSC 370

Machine Learning


This lab will allow you to explore some facets of machine learning, especially as they may apply to a computer vision system. There is a set of base activities, and various suggestions for extension.

To keep things from getting too complicated, we'll work with an image that ships with Matlab. The image includes text at two different orientations, and the instances we'll try to identify are the letters in the image.

img = imread('text.png');
imshow(img);

Computing Features

To begin, we'll need to separate the individual characters and compute features for them. You can use bwlabel to separate the image into connected components, and regionprops to compute some features of the resulting regions. In doing so, it is important to avoid features (such as area or position) that depend not on the class of character but on accidental details of the image. (As an extension, you might find it interesting to include some of these features and see whether they degrade the results.) It is up to you whether to include features that depend on rotation -- if you want the sideways text to be recognized as the same class as the horizontal text, you will want to exclude those. The code below computes four features for each component and places them into a matrix, where each row represents one instance.

lbl = bwlabel(img);
rp = regionprops(lbl,'Eccentricity','EulerNumber','Extent','Solidity');
rpm = [cat(1,rp.Eccentricity), cat(1,rp.EulerNumber), cat(1,rp.Extent), cat(1,rp.Solidity)];

Now we're ready to do some learning! There are several Matlab functions that implement various learning algorithms. Let's focus on two: fitensemble for boosting, and fitcsvm for support vector machines. You can train classifiers with both of these. Before doing so, we'll also need a class label for each of the components in the image. Here's a boosting classifier trained to recognize the letter r:

tag = 'trehfeertsertmoawra.idtegreshd..dtsaeraseivih...eatf.dbd.ddnereiyeniart.t..smesysrevir..';
ens_r = fitensemble(rpm,tag'=='r','AdaBoostM1',100,'Tree');
prt = predict(ens_r,rpm);
sum(prt~=(tag'=='r'))

So far, so good. But this just shows that we've learned the training data perfectly. There's no guarantee that the system we trained will be able to generalize to new data. Let's create some new data by resizing the image (thus creating slightly altered shapes due to rasterization), and see how we do.

img2 = imresize(img,5/pi);
lbl2 = bwlabel(img2);
rp2 = regionprops(lbl2,'Eccentricity','EulerNumber','Extent','Solidity');
rpm2 = [cat(1,rp2.Eccentricity), cat(1,rp2.EulerNumber), cat(1,rp2.Extent), cat(1,rp2.Solidity)];
prt2 = predict(ens_r,rpm2);
sum(prt2~=(tag'=='r'))

As you can see, we made a number of errors on this new test set. The fault here lies not with the learning algorithm but with the training data -- our initial examples didn't have enough variability in them. In fact, all the training instances of the letter r look exactly the same!

rpm(tag=='r',:)

Activities

To expand our training pool, let's use the same trick we used to generate test data: manipulate the original image. Create five versions of the original image, each blown up by a different factor between 1 and 2 (you can use the rand function to get a random number). If you want, you can also rotate them slightly using imrotate, but be careful: rotation may alter the order of the connected components, and thus the correct tags will be different.
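One way to generate the five images might look like the sketch below; the variable names and the optional rotation range are just one possible choice, not a prescribed solution.

imgs = cell(1,5);
for k = 1:5
    imgs{k} = imresize(img, 1 + rand());        % random scale factor in (1,2)
    % imgs{k} = imrotate(imgs{k}, 4*rand()-2);  % optional small rotation; check
end                                             % that component order is unchanged

You would then run bwlabel and regionprops on each imgs{k}, exactly as was done for the original image, to get five feature matrices.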

For both support vector machines and boosting, using at least two different letters, try the following: train a model on four of the five images, and use the fifth for testing. If you do this for all possible combinations (e.g., 1 2 3 4 train, 5 test; then 1 2 3 5 train, 4 test; etc.) then you are doing cross validation, which is a standard experimental technique. Report the average classification accuracy over all five folds. Accuracy is only one metric that can be used to evaluate the results; you can look in more detail at the types of errors being made by computing the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

Because the experiments above have a lot of repetitive parts, you should consider using loops and/or defining some custom m-files that will carry out an experiment using the train/test data and target classes you specify.
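A cross-validation loop for one letter might be sketched as follows. This assumes feats{1} through feats{5} hold the feature matrices for the five images (built the same way as rpm above), with components in the same order in each so that the tag string applies to all of them -- verify that assumption on your own data before trusting the numbers.

target = (tag' == 'r');
acc = zeros(1,5);
for t = 1:5
    tr = setdiff(1:5, t);                  % four images for training
    X = cat(1, feats{tr});
    y = repmat(target, numel(tr), 1);
    mdl = fitcsvm(X, y);                   % or fitensemble(...) for boosting
    pred = predict(mdl, feats{t});         % test on the held-out image
    acc(t) = mean(pred == target);
    TP = sum(pred & target);   FP = sum(pred & ~target);
    TN = sum(~pred & ~target); FN = sum(~pred & target);
end
mean(acc)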

Extensions

There are a lot of interesting ways you could extend this activity. The example above uses just four features, but you can experiment with additional ones, or an entirely different set. You could also go looking for other images with text in them (or create your own) and see whether it is harder to generalize to these. What are the effects of different fonts?
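For example, regionprops supports many more shape descriptors than the four used above. The snippet below is one hypothetical starting point: the ratio of axis lengths is insensitive to scale and rotation, so it can be appended to the existing feature matrix without reintroducing accidental details of the image.

rp3 = regionprops(lbl, 'MajorAxisLength', 'MinorAxisLength');
aspect = cat(1,rp3.MinorAxisLength) ./ cat(1,rp3.MajorAxisLength);
rpm5 = [rpm aspect];   % original four features plus aspect ratio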