CS262.Lect07.FaceDetection2

Session 7: Face Detection (cont.)
John Magee
8 February 2017
Slides courtesy of Diane H. Theriault
Question of the Day:
• How can we find faces in images?
Face Detection
• Compute features in the image
• Apply a classifier
• Viola & Jones. “Rapid Object Detection using a
Boosted Cascade of Simple Features”
What do Faces “Look Like”?
• Chosen features are responses of the image to
box filters at specific locations in the image
Using Features for Classification
• Gateway between the signal processing world and
the machine learning world
• For any candidate image, we will compute the responses of the
image to several different filters at different locations
• These responses will be the input to our machine learning algorithm
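• A minimal Python sketch of one such feature: a two-rectangle box
filter evaluated with an integral image. The filter layout and the
coordinates used here are illustrative, not the exact ones from the paper.

import numpy as np

def integral_image(img):
    # Cumulative sums over rows and columns; after this, the sum of
    # any rectangle of pixels costs only four array lookups.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r, c, h, w):
    # Sum of the h-by-w rectangle with top-left corner (r, c).
    total = ii[r + h - 1, c + w - 1]
    if r > 0:
        total -= ii[r - 1, c + w - 1]
    if c > 0:
        total -= ii[r + h - 1, c - 1]
    if r > 0 and c > 0:
        total += ii[r - 1, c - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    # Haar-like feature: left rectangle sum minus right rectangle sum.
    return rect_sum(ii, r, c, h, w) - rect_sum(ii, r, c + w, h, w)

# Response of one 24x24 candidate window to one filter.
window = np.random.rand(24, 24)   # stand-in for a grayscale patch
ii = integral_image(window)
response = two_rect_feature(ii, r=4, c=4, h=8, w=6)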
Using Features for Classification
• In Machine Learning, a classifier is a thing that can make a
decision about whether a piece of data belongs to a particular
class (e.g. “is this image a face or not”)
• There are different types of classifiers, and one way to find
their parameters is to learn them from labeled training data (i.e.
“supervised learning”)
• A “weak learner” is a classifier that, even with the best
parameters you can find, still does only a mediocre job
Using Features for Classification
• “Weak Learner” Example: Apply a threshold to a feature
(the response of the image to a filter)
• How to find a threshold?
Using Features for Classification
• How to find a threshold?
• Start with labeled training data
– 9,832 face images (positive examples)
– 10,000 non-face images (sub-windows cut from other
pictures containing no faces) (negative examples)
• Compute some measure of how good a particular
threshold is (e.g. “accuracy”), then find the
threshold that gives the best result
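• A sketch of that search in Python, assuming responses holds the
filter response for every training image and labels holds +1 (face)
or -1 (non-face); candidate thresholds are just the observed
responses, and both polarities are tried since faces may score
either above or below the threshold.

import numpy as np

def best_threshold(responses, labels):
    # Try each observed response as a threshold, in both directions,
    # and keep whichever choice classifies the most examples correctly.
    best = (None, None, 0.0)   # (threshold, polarity, accuracy)
    for t in np.unique(responses):
        for polarity in (+1, -1):
            pred = np.where(polarity * responses >= polarity * t, 1, -1)
            acc = np.mean(pred == labels)
            if acc > best[2]:
                best = (t, polarity, acc)
    return best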
Using Features for Classification
• For a particular threshold on a particular feature, compute:
– Accuracy: % correctly classified
– Classification Error: 1 - Accuracy
• For each feature, choose a threshold that maximizes the accuracy
• The four possible outcomes form the “Confusion Matrix”:
– True positives (faces that are identified as faces)
– True negatives (non-face patches that are identified as non-faces)
– False positives (non-faces identified as faces)
– False negatives (faces identified as non-faces)

                        Classifier Result
                        positive          negative
Known     positive      True Positive     False Negative
          negative      False Positive    True Negative
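• A small Python sketch of tallying those counts, with the same
hypothetical +1 / -1 label encoding as above:

import numpy as np

def confusion_counts(pred, labels):
    tp = np.sum((pred == 1) & (labels == 1))     # faces called faces
    tn = np.sum((pred == -1) & (labels == -1))   # non-faces called non-faces
    fp = np.sum((pred == 1) & (labels == -1))    # non-faces called faces
    fn = np.sum((pred == -1) & (labels == 1))    # faces called non-faces
    accuracy = (tp + tn) / len(labels)
    return tp, tn, fp, fn, accuracy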
Using Features for Classification
• How do you know which feature to use?
– Try them all and pick the one that gives the best result
– Then, choose the next one that does the next best job,
emphasizing the misclassified images
• Each threshold on a single feature gives mediocre results, but
if you combine them in a clever way, you can get good results
– (That’s the extremely short version of “boosting”)
Classification with Adaboost
An awesome Machine Learning Algorithm!
• Training:
– Given a pool of “weak learners” and some data,
– Create a “boosted classifier” by choosing a good combination
of K weak learners and associated weights
• In our case, “train a weak learner” means: choose which feature
to use and which threshold to apply
Classification with Adaboost
• Training:
• Initialization: Assign data weights uniformly to each data point
• For k = 1, …, K:
– Train all of the “weak learners”
– Compute the weighted classification error using weights assigned to each
data point
– Choose the weak learner with the lowest weighted error
– Compute a classifier weight associated with the weak learner, based on
the classification error
– Adjust the weights for the data points to emphasize misclassified points
• (Specifics of how to compute weights in paper)
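• A compact Python sketch of this loop, assuming features is an
(n_images, n_features) matrix of precomputed filter responses and
labels is +1 / -1; the weight updates follow standard discrete
AdaBoost, a slight simplification of the formulas in the paper.

import numpy as np

def adaboost_train(features, labels, K):
    n, m = features.shape
    w = np.full(n, 1.0 / n)      # uniform initial data weights
    classifier = []              # (feature, threshold, polarity, alpha)
    for _ in range(K):
        best = None
        # "Train all the weak learners": for every feature, find the
        # threshold/polarity with the lowest *weighted* error under w.
        for j in range(m):
            for t in np.unique(features[:, j]):
                for p in (+1, -1):
                    pred = np.where(p * features[:, j] >= p * t, 1, -1)
                    err = w[pred != labels].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, p, pred)
        err, j, t, p, pred = best
        # Classifier weight: low-error learners get a bigger vote.
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        # Re-weight the data so misclassified points gain emphasis.
        w *= np.exp(-alpha * labels * pred)
        w /= w.sum()
        classifier.append((j, t, p, alpha))
    return classifier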
Classification with Adaboost
• Classification:
– Use the “boosted classifier” (the weak learners and
associated weights we found during training) to label faces
• Evaluate each weak learner we chose on the new data point by
– computing the response of the image to the filter, and
– applying the threshold to obtain a binary result
• Make a final decision by computing a weighted sum of the
classification results from the weak learners
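• A matching Python sketch of classification, reusing the (feature,
threshold, polarity, alpha) tuples from the training sketch above;
feature_vector holds the new window’s filter responses.

def adaboost_classify(classifier, feature_vector):
    score = 0.0
    for j, t, p, alpha in classifier:
        h = 1 if p * feature_vector[j] >= p * t else -1  # weak decision
        score += alpha * h                               # weighted vote
    return 1 if score >= 0 else -1                       # face / non-face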
Classification Cascade
• Tradeoff between accuracy and the computational cost of a more
“complex” classifier that uses more features
• What is an acceptable error rate?
• What is an acceptable computational cost?
• Can we have our cake and eat it too?
Classification Cascade
• Solution: Use a “cascade” of increasingly complex classifiers
• Create less complex classifiers with fewer weak learners that
achieve high detection rates (maybe with extra false positives)
• Evaluate more complex, more picky, classifiers only after the
image passes the early classifiers
• Train later classifiers in the cascade using only images that pass
earlier classifiers
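• A sketch of how evaluation might look, assuming each stage is a
boosted classifier as above (tuned, e.g. by shifting its decision
threshold, for a very high detection rate) and reusing
adaboost_classify from the previous sketch:

def cascade_classify(stages, feature_vector):
    # Stages are ordered cheapest first; most non-face windows are
    # rejected by the early stages and never reach the expensive ones.
    for stage in stages:
        if adaboost_classify(stage, feature_vector) == -1:
            return -1   # rejected: later stages never run
    return 1            # passed every stage: report a face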
To Detect Faces
• Divide large images into overlapping sub-windows
• Apply classifier cascade to each sub-window
• Apply to sub-windows of different sizes by scaling the features
(using larger box filters)
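• Putting the pieces together, a sketch of the scanning loop; the
window size, stride, and scale factor are illustrative, and
compute_feature_vector is a hypothetical helper that would evaluate
the cascade’s chosen box filters (scaled by s) on one window’s
integral image, reusing integral_image and cascade_classify from the
earlier sketches.

import numpy as np

def detect_faces(img, stages, base=24, scale_step=1.25, stride=4):
    detections = []
    size = base
    while size <= min(img.shape):
        s = size / base   # how much to enlarge the box filters
        for r in range(0, img.shape[0] - size + 1, stride):
            for c in range(0, img.shape[1] - size + 1, stride):
                ii = integral_image(img[r:r + size, c:c + size])
                fv = compute_feature_vector(ii, scale=s)  # hypothetical
                if cascade_classify(stages, fv) == 1:
                    detections.append((r, c, size))
        size = int(size * scale_step)
    return detections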
Discussion Questions:
• What is the relationship between an image feature and the
response of an image to a box filter applied at a particular
location?
• If you were given a set of labeled images and a filter response for
some particular filter, how would you choose a threshold to use?
• How would you adjust your procedure for finding the best
possible threshold if you wanted to find the best threshold that
recognized at least 99% of faces, even if it let through some
non-faces (false positives)?
• Given an image, what are the steps for labeling it as face or non-face?
• What is a classifier cascade and why would you want to use one?