
Face Detection is a widely popular subject with a huge range of applications. Modern smartphones and laptops come with built-in face detection software that can authenticate the identity of the user. Numerous apps can capture, detect and process a face in real time, estimate the age and gender of the user, and apply some really cool filters. The list is not limited to mobile apps: Face Detection also has a wide range of applications in Surveillance, Security and Biometrics. The origin of these success stories dates back to 2001, when Viola and Jones proposed the first Object Detection Framework capable of Real-Time Face Detection in video footage.


This article takes a gentle look at the Viola-Jones Face Detection Technique, popularly known as Haar Cascades, and explores some of the interesting concepts its authors proposed. The work was done long before the Deep Learning era began, yet it still holds up well against the powerful models that can be built with modern Deep Learning techniques. The algorithm is still widely used, and fully trained models are available on GitHub.


  1. What is Haar Cascade?
  2. How does Haar Cascade detect faces?
  3. Problems and limitations of Haar Cascade.

What is Haar Cascade?

It is an Object Detection Algorithm used to identify faces in an image or a real-time video. The algorithm uses the edge and line detection features proposed by Viola and Jones in their research paper “Rapid Object Detection using a Boosted Cascade of Simple Features”, published in 2001. To train it, the algorithm is given many positive images that contain faces and many negative images that contain no faces. The models created from this training are available in the OpenCV GitHub repository.

The repository stores the models as XML files, which can be read with OpenCV methods. These include models for face detection, eye detection, upper body and lower body detection, license plate detection, and so on. Below we look at some of the concepts proposed by Viola and Jones in their research.

A sample of Haar features used in the Original Research Paper published by Viola and Jones.

The first contribution of the research was the introduction of the Haar features shown above. Applied to an image, these features make it easy to find edges and lines, or to pick out areas where the pixel intensities change suddenly.
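To make the idea concrete, here is a minimal sketch of a two-rectangle (edge) Haar-like feature in plain Python. The function names, the sign convention (right half minus left half) and the tiny 4x4 patches are assumptions made for illustration. The original paper speeds up the rectangle sums with an integral image, whereas this sketch simply adds the pixels directly.

```python
def region_sum(image, top, left, height, width):
    """Sum the pixel intensities inside a rectangle of a 2D list image."""
    return sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def vertical_edge_feature(image, top, left, height, width):
    """Two-rectangle Haar-like feature: right-half sum minus left-half sum."""
    half = width // 2
    left_sum = region_sum(image, top, left, height, half)
    right_sum = region_sum(image, top, left + half, height, half)
    return right_sum - left_sum


# A 4x4 patch that is dark (0) on the left and bright (255) on the right.
edge_patch = [[0, 0, 255, 255] for _ in range(4)]
# A uniformly gray patch with no edge in it.
flat_patch = [[128, 128, 128, 128] for _ in range(4)]

edge_response = vertical_edge_feature(edge_patch, 0, 0, 4, 4)
flat_response = vertical_edge_feature(flat_patch, 0, 0, 4, 4)
print(edge_response, flat_response)
```

The patch with a sharp dark-to-bright transition gives a large response, while the flat patch gives zero, which is the "sudden change in intensity" behaviour described above.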

How does Haar Cascade detect faces?


In the figure above, we can see that we are sliding a fixed-size window (known as a sliding window) across our image at multiple scales. At each position the window stops, computes some features, and then classifies the region as either "yes, this region contains a face" or "no, this region does not contain a face".
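The scanning pattern can be sketched in a few lines of plain Python. This is only an illustration of the idea, not OpenCV's internal implementation: the 24x24 window, the step of 8 pixels and the scale factor of 1.5 are assumed values, and the helper names are hypothetical.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=(24, 24)):
    """Yield successively smaller (width, height) sizes of the image."""
    while width >= min_size[0] and height >= min_size[1]:
        yield width, height
        width, height = int(width / scale), int(height / scale)


def sliding_windows(width, height, window=(24, 24), step=8):
    """Yield the (x, y) top-left corner of every window position that fits."""
    win_w, win_h = window
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield x, y


def count_windows(width, height):
    """Count every region the classifier would have to examine."""
    return sum(
        len(list(sliding_windows(w, h)))
        for w, h in pyramid_sizes(width, height)
    )


print(list(pyramid_sizes(48, 48)))
print(count_windows(48, 48))
```

Even a tiny 48x48 image produces 20 candidate regions with these settings, which hints at why each individual classification has to be very cheap.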

This requires a bit of machine learning. We need a classifier that is trained using positive and negative samples of a face:

  • Positive data points are examples of regions containing a face
  • Negative data points are examples of regions that do not contain a face

Given these positive and negative data points, we can “train” a classifier to recognize whether a given region of an image contains a face.
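As a rough illustration of what "training" means here, the sketch below learns a single threshold on one feature value from positive and negative samples. This is only the simplest possible weak classifier, not the full boosted cascade from the paper, and the function name and the sample feature values are made up for the example.

```python
def train_stump(positive_values, negative_values):
    """Find the threshold on one feature that misclassifies the fewest samples.

    A region is predicted to be a face when its feature value >= threshold.
    Returns (best_threshold, number_of_errors).
    """
    samples = [(v, 1) for v in positive_values] + [(v, 0) for v in negative_values]
    best_threshold, best_errors = None, len(samples) + 1
    # Try every observed feature value as a candidate threshold.
    for threshold in sorted({v for v, _ in samples}):
        errors = sum((v >= threshold) != bool(label) for v, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Hypothetical feature responses measured on face and non-face regions.
face_values = [900, 1200, 1500]
non_face_values = [-50, 100, 950]

threshold, errors = train_stump(face_values, non_face_values)
print(threshold, errors)
```

A single threshold like this still makes mistakes (one non-face region is accepted here), which is why many such weak classifiers are combined in practice.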

Luckily for us, OpenCV can perform face detection out-of-the-box using a pre-trained Haar cascade:

Source: PyImagesearch

This ensures that we do not need to provide our own positive and negative samples, train our own classifier, or worry about getting the parameters tuned exactly right. Instead, we simply load the pre-trained classifier and detect faces in images.
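A minimal sketch of that workflow with the `opencv-python` package might look like the following. The function name `detect_faces` and the default parameter values are assumptions chosen for illustration, and the model used is the frontal face cascade bundled with the package. The `cv2` import sits inside the function only so that the sketch can be loaded on machines where OpenCV is not installed.

```python
def detect_faces(image_path, scale_factor=1.1, min_neighbors=5, min_size=(30, 30)):
    """Return a list of (x, y, w, h) face boxes found in an image file."""
    import cv2  # requires the opencv-python package

    # cv2.data.haarcascades is the folder of XML models shipped with opencv-python.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError(f"Could not load cascade from {cascade_path}")

    image = cv2.imread(image_path)
    if image is None:
        raise IOError(f"Could not read image {image_path}")

    # Haar cascades work on single-channel intensity images.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(
        gray,
        scaleFactor=scale_factor,    # how much the image shrinks between scales
        minNeighbors=min_neighbors,  # overlapping hits needed to keep a detection
        minSize=min_size,            # ignore candidate regions smaller than this
    )
    return [tuple(int(v) for v in box) for box in boxes]
```

Calling `detect_faces("photo.jpg")` (with a hypothetical file name) would return one bounding box per detected face, or an empty list if none is found.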

Problems and limitations of Haar cascades

However, it’s not all good news. The detector tends to be the most effective for frontal images of the face.

Haar cascades are notoriously prone to false positives: the Viola-Jones algorithm can easily report a face in an image when no face is present.

Finally, it can be quite tedious to tune the OpenCV detection parameters. There will be times when we can detect all the faces in an image. There will be other times when (1) regions of an image are falsely classified as faces, and/or (2) faces are missed entirely.
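When tuning, it helps to count both kinds of error on images where the true face locations are known. The sketch below is one simple way to do that, assuming boxes are given as (x, y, w, h) tuples and that a detection counts as correct when its intersection-over-union with a true face is at least 0.5. The helper names, the threshold and the sample boxes are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def count_errors(detections, ground_truth, min_iou=0.5):
    """Return (false_positives, missed_faces) for one image."""
    # A detection that overlaps no true face well enough is a false positive.
    false_positives = sum(
        all(iou(d, g) < min_iou for g in ground_truth) for d in detections
    )
    # A true face that no detection overlaps well enough is a miss.
    missed = sum(
        all(iou(d, g) < min_iou for d in detections) for g in ground_truth
    )
    return false_positives, missed


# Hypothetical boxes: one good detection, one spurious one, one face missed.
true_faces = [(10, 10, 50, 50), (100, 20, 40, 40)]
detected = [(12, 12, 50, 50), (200, 200, 30, 30)]
print(count_errors(detected, true_faces))
```

Running the same count for several parameter settings makes the trade-off between false positives and missed faces visible instead of leaving it to guesswork.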

If the Viola-Jones algorithm interests you, take a look at its Wikipedia page and the original paper. The Wikipedia page does an excellent job of breaking the algorithm down into easy-to-digest pieces.


About Author

This article was written by Han Sheng, Junior Artificial Intelligence Engineer at CertifAI, Penang, Malaysia. He has a passion for Deep Learning, Computer Vision, and Edge Devices. He has built several AI-based Web/Mobile Applications to help clients solve real-world problems. Feel free to read about him via his portfolio or GitHub profile.
