What’s the Best Face Detector?. Comparing Dlib, OpenCV, MTCNN, and… | by Amos Stailey-Young

Evaluating Dlib, OpenCV DNN, YuNet, Pytorch-MTCNN, and RetinaFace

For a facial recognition problem I’m working on, I needed to identify which face detection model to select. Face detection is the first part of the facial recognition pipeline, and it’s critical that the detector accurately identifies faces in the image. Garbage in, garbage out, after all.

However, the myriad options available left me feeling overwhelmed, and the scattered writings on the subject weren’t detailed enough to help me decide on a model. Comparing the various models took a lot of work, so I figured relaying my research might help people in similar situations.

The primary trade-off when selecting a face detection model is between accuracy and performance. But there are other factors to consider.

Most of the articles on face detection models are written either by the creators of the model (sometimes in journals) or by those implementing the model in code. In both cases, the writers naturally have a bias toward the model they’re writing about. In some extreme cases, they’re essentially promotional advertisements for the model in question.

There aren’t many articles that compare how the different models perform against one another. Adding further confusion, whenever someone writes about a model such as RetinaFace, they’re talking about a particular implementation of that model. The “model” itself is really the neural network architecture, and different implementations of the same network architecture can lead to different results. To make matters more complicated, the performance of these models also differs according to post-processing parameters, such as confidence thresholds, non-maximum suppression, and so on.
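To make the post-processing point concrete, here is a minimal Python sketch of a confidence threshold combined with greedy non-maximum suppression. The function names, the (x, y, w, h) box format, the sample boxes, and the default thresholds are all illustrative assumptions; none of them are taken from the implementations tested in this article.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.3):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical detections: two overlapping boxes and one weak, distant box.
boxes = [(10, 10, 100, 100), (15, 12, 100, 100), (300, 50, 40, 40)]
scores = [0.95, 0.80, 0.40]
print(nms(boxes, scores))
print(nms(boxes, scores, score_threshold=0.3))
```

The same raw network output yields a different set of faces depending on the two thresholds, which is one reason two implementations of the same architecture can score differently.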

Every writer casts their model as the “best”, but I quickly realized that “best” depends on context. There is no objective best model. There are two main criteria when deciding which face detection model is most appropriate for the given context: accuracy and speed.

No model combines high accuracy with high speed; it’s a trade-off. We also have to look at metrics beyond raw accuracy (correct guesses / total sample size), on which most benchmarks are based. The ratio of false positives to true positives, and of false negatives to true negatives, is also an important consideration. In technical terms, the trade-off is between precision (minimizing false positives) and recall (minimizing false negatives). This article discusses the issue in more depth.
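As a quick illustration of precision and recall, the sketch below computes both (plus the F1 score, which is my addition) from counts of true positives, false positives, and false negatives. The function name and the counts are hypothetical and are not results from my tests.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical counts: 80 correct detections, 20 spurious boxes, 40 missed faces.
precision, recall, f1 = detection_metrics(tp=80, fp=20, fn=40)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A detector tuned with a high confidence threshold tends to raise precision at the cost of recall, and a low threshold does the opposite.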

There are a few existing face detection datasets used for benchmarking, such as WIDER FACE, but I always want to see how the models will perform on my own data. So I randomly grabbed 1064 frames from my sample of TV shows to test the models (±3% margin of error). When manually annotating each image, I tried to select as many faces as possible, including faces that were partially or almost fully occluded, to give the models a real challenge. Because I’m eventually going to perform facial recognition on the detected faces, I wanted to test the limits of each model.
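The ±3% figure is consistent with the standard normal-approximation formula for a sample proportion. The sketch below assumes a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and a population much larger than the sample; these are my assumptions, since the text does not state how the margin was derived.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(1064)
print(f"{moe:.1%}")
```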

The images are available to download with their annotations. I’ve also shared a Google Colab notebook to interact with the data here.

It helps to group the various models into two camps: those that run on the GPU and those that run on the CPU. Generally, if you have a CUDA-compatible GPU, you should use a GPU-based model. I have an NVIDIA 1080 Ti with 11GB of memory, which allows me to use some of the larger-scale models. However, the scale of my project is huge (I’m talking thousands of video files), so the lightning-fast CPU-based models intrigued me. There aren’t many CPU-based face detection models, so I decided to test only the most popular one: YuNet. Because of its speed, YuNet forms my baseline comparison. A GPU model must be significantly more accurate than its CPU counterpart to justify its slower processing speed.

YuNet

YuNet was developed with performance in mind, with a model size that’s only a fraction of the larger models. For instance, YuNet has only 75,856 parameters compared to the 27,293,600 that RetinaFace boasts, which allows YuNet to run on “edge” computing devices that aren’t powerful enough for the larger models.

Code to implement the YuNet model can be found in this repository. The easiest way to get YuNet up and running is through OpenCV.

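import cv2

# The empty string is the config path, which ONNX models do not need.
detector = cv2.FaceDetectorYN_create('./face_detection_yunet_2023mar.onnx',
                                     "",
                                     (300, 300),  # input size (width, height)
                                     score_threshold=0.5)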

The pre-trained model is available on the OpenCV Zoo repository here. Just make sure to use Git LFS when cloning the repo (I made that mistake at first). There’s a Google Colab file I wrote to demonstrate, available here.
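A minimal sketch of running the detector on one frame might look like the following. It assumes the frame is resized to 300×300 before detection and that the boxes are then scaled back to the original resolution. The helper names `scale_boxes` and `detect_faces` are hypothetical, and this is not necessarily how the linked notebook does it.

```python
def scale_boxes(boxes, orig_size, input_size=(300, 300)):
    """Map (x, y, w, h) boxes from the resized input back to the original frame."""
    sx = orig_size[0] / input_size[0]
    sy = orig_size[1] / input_size[1]
    return [(x * sx, y * sy, w * sx, h * sy) for x, y, w, h in boxes]


def detect_faces(image_path, model_path='./face_detection_yunet_2023mar.onnx'):
    """Resize a frame to 300x300, run YuNet, return boxes in original pixels."""
    import cv2  # imported here so scale_boxes works without OpenCV installed

    image = cv2.imread(image_path)
    height, width = image.shape[:2]
    detector = cv2.FaceDetectorYN_create(model_path, "", (300, 300),
                                         score_threshold=0.5)
    _, faces = detector.detect(cv2.resize(image, (300, 300)))
    if faces is None:  # OpenCV returns None when nothing is detected
        return []
    # Each row holds x, y, w, h, ten landmark coordinates, then the score.
    return scale_boxes([tuple(row[:4]) for row in faces], (width, height))


# Scaling a box found in the 300x300 input back to a 1920x1080 frame.
scaled = scale_boxes([(30, 60, 90, 120)], orig_size=(1920, 1080))
print(scaled)
```

Because width and height are scaled by different factors, squashing a widescreen frame into a square input distorts the faces, which may contribute to slightly misplaced bounding boxes.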

YuNet performed a lot better than I expected for a CPU model. It’s able to detect large faces without a problem but does struggle a bit with smaller ones.

Able to detect large faces even when at oblique angles. The bounding box is a bit off, likely because the image has to be resized to 300×300 to feed into the model.

YuNet manages to find almost all of the faces but includes some false positives as well.

The accuracy improves greatly when limiting to the largest face in the image.
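Limiting the evaluation to the largest face can be sketched as picking the box with the greatest area. The (x, y, w, h) box format and the sample boxes below are illustrative assumptions.

```python
def largest_face(boxes):
    """Return the (x, y, w, h) box with the greatest area, or None if empty."""
    return max(boxes, key=lambda b: b[2] * b[3], default=None)


# Hypothetical detections in one frame.
faces = [(12, 30, 40, 52), (200, 80, 180, 240), (400, 10, 25, 25)]
print(largest_face(faces))
```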

If performance is a primary concern, YuNet is a great option. It’s even fast enough for real-time applications, unlike the GPU options available (at least without some serious hardware).

YuNet uses a fixed input size of 300×300, so the time difference results from resizing the images to these dimensions.

Dlib

Dlib is a C++ implementation with a Python wrapper that maintains a balance between accuracy, performance, and convenience. Dlib can be installed directly through Python or accessed through the Face Recognition Python library. However, there is a very strong trade-off between Dlib’s accuracy and its performance based on the upsampling parameter. When the number of times to upsample is set to 0, the model is faster but less accurate.
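The sketch below shows one way to set the upsampling parameter through the Face Recognition library, along with a rough picture of why upsampling is expensive. It assumes the CNN detector is used (suggested by the GPU memory limits mentioned in this section) and that each upsampling step doubles the width and height of the image. The helper names and the 1280×720 frame size are hypothetical.

```python
def upsampled_pixels(width, height, upsample):
    """Pixel count after `upsample` doublings of each image dimension."""
    scale = 2 ** upsample
    return (width * scale) * (height * scale)


def detect_with_dlib(image_path, upsample=0):
    """Run Dlib's CNN detector through the face_recognition wrapper."""
    import face_recognition  # imported lazily; requires dlib to be installed

    image = face_recognition.load_image_file(image_path)
    # Returns a list of (top, right, bottom, left) tuples.
    return face_recognition.face_locations(
        image, number_of_times_to_upsample=upsample, model="cnn")


# Under the doubling assumption, each upsampling step quadruples the pixels.
for n in range(3):
    print(n, upsampled_pixels(1280, 720, n))
```

Under that assumption, the amount of data the detector has to process grows fourfold with every step, which helps explain both the slowdown and the memory pressure.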

No Upsampling

Upsampling = 1

The accuracy of the Dlib model increases with further upsampling, but anything higher than upsampling=1 would cause my script to crash because it exceeded my GPU memory (which is 11GB, by the way).

Dlib’s accuracy was somewhat disappointing relative to its (lack of) speed. However, it was very good at minimizing false positives, which is a priority of mine. Face detection is the first part of my facial recognition pipeline, so minimizing the number of false positives will help reduce errors downstream. To reduce the number of false positives even further, we can use Dlib’s confidence output to filter out lower-confidence samples.

There is a large discrepancy in confidence between false and true positives, which we can use to filter out the former. Rather than choose an arbitrary threshold, we can look at the distribution of confidence scores to select a more precise one.

95% of the confidence values fall above 0.78, so excluding everything below that value reduces the number of false positives by half.
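Choosing the cutoff from the distribution can be sketched as taking the 5th percentile of the confidence scores and discarding detections below it. The nearest-rank percentile method, the function name, and all the scores below are my assumptions for illustration; they are not the data behind the 0.78 figure.

```python
import math


def percentile_threshold(values, pct=5):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical confidence scores for detections known to be true positives.
true_positive_conf = [0.70 + 0.01 * i for i in range(30)]
threshold = percentile_threshold(true_positive_conf, pct=5)

# Hypothetical new detections, filtered with the derived threshold.
detections = [("face_a", 0.95), ("face_b", 0.60), ("face_c", 0.82)]
kept = [name for name, conf in detections if conf >= threshold]
print(round(threshold, 2), kept)
```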

While filtering by confidence reduces the number of false positives, it doesn’t increase the overall accuracy. I would consider using Dlib when minimizing the number of false positives is a primary concern. But otherwise, Dlib doesn’t offer a large enough increase in accuracy over YuNet to justify the much higher processing times, at least for my purposes.

OpenCV DNN

The primary draw of OpenCV’s face detection model is its speed. However, its accuracy left something to be desired. While it’s incredibly fast compared to the other GPU models, even its Top 1 accuracy was hardly better than YuNet’s overall accuracy. It’s unclear to me in which situation I would ever choose the OpenCV model for face detection, especially since it can be tricky to get working (you have to build OpenCV from source, which I’ve written about here).

Pytorch-MTCNN

The MTCNN model also performed quite poorly. Although it was slightly more accurate than the OpenCV model, it was quite a bit slower. Since its accuracy was lower than YuNet’s, there was no compelling reason to select MTCNN.

RetinaFace

RetinaFace has a reputation for being the most accurate of the open-source face detection models. The test results back up that reputation.

Not only was it the most accurate model, but many of the “inaccuracies” were not, in fact, actual errors. RetinaFace really tested the category of “false positive” since it picked up faces I hadn’t seen, hadn’t bothered to annotate because I thought them too difficult, or hadn’t even considered a “face.”

It picked up a partial face in a mirror in this Seinfeld frame.

It managed to find faces in picture frames in the background of this Modern Family frame.

And it’s so good at identifying “faces” that it finds non-human ones.

It was a nice surprise to learn that RetinaFace wasn’t all that slow either. While it wasn’t as fast as YuNet or OpenCV, it was comparable to MTCNN. It’s slower than MTCNN at lower resolutions, but it scales relatively well and can process higher resolutions just as quickly. And RetinaFace beat Dlib (at least when having to upscale). It’s much slower than YuNet but is significantly more accurate.
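Speed comparisons like these can be approximated with a small timing harness such as the one below. It is a generic sketch, not the benchmark code used for this article. The `detect` argument stands for any callable that takes a frame, and the stand-in detector and frames exist only so the example runs on its own.

```python
import time


def mean_seconds_per_frame(detect, frames, warmup=1):
    """Average wall-clock time of detect(frame) after a few warm-up calls."""
    for frame in frames[:warmup]:
        detect(frame)  # warm-up runs (model loading, GPU init) are not timed
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    return (time.perf_counter() - start) / len(frames)


# Stand-in detector and frames for illustration only.
fake_frames = [[0] * 1000 for _ in range(20)]
elapsed = mean_seconds_per_frame(lambda f: sum(f), fake_frames)
print(f"{elapsed * 1000:.3f} ms per frame")
```

Running the same harness on frames resized to several resolutions gives a rough picture of how each model scales.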

Many of the “false positives” RetinaFace identified can be excluded by filtering out smaller faces. If we drop the lowest quartile of faces, the false positive rate drops drastically.

The boundary for the lowest quartile is 0.0035.
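Dropping the lowest quartile can be sketched as follows. I am assuming here that face size is measured as the box area divided by the frame area, which the text does not state explicitly. The function names, the frame size, and the boxes are hypothetical.

```python
import statistics


def relative_area(box, frame_size):
    """Face box area as a fraction of the frame area."""
    _, _, w, h = box
    return (w * h) / (frame_size[0] * frame_size[1])


def drop_smallest_quartile(boxes, frame_size):
    """Keep only faces whose relative area exceeds the first quartile."""
    areas = [relative_area(b, frame_size) for b in boxes]
    q1 = statistics.quantiles(areas, n=4)[0]  # first quartile boundary
    return [b for b, a in zip(boxes, areas) if a > q1], q1


# Hypothetical detections on a 1280x720 frame.
boxes = [(0, 0, 20, 20), (0, 0, 40, 40), (0, 0, 80, 80), (0, 0, 160, 160),
         (0, 0, 200, 200), (0, 0, 240, 240), (0, 0, 300, 300), (0, 0, 320, 320)]
kept, q1 = drop_smallest_quartile(boxes, (1280, 720))
print(len(kept), round(q1, 4))
```

In practice the quartile boundary would be computed once over all detections in the dataset and then applied as a fixed cutoff.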

While RetinaFace is incredibly accurate, its errors do have a particular bias. Although RetinaFace identifies small faces with ease, it struggles with larger, partially occluded ones, which is apparent if we look at face size relative to accuracy.

This could be problematic for my purposes since the size of a face in an image is strongly correlated with its importance. Therefore, RetinaFace may miss the most important cases, such as the example below.

RetinaFace failed to detect a face in this image, but YuNet did.

Based on my tests (which I’d like to emphasize are not the most rigorous in the world, so take them with a grain of salt), I would only consider using either YuNet or RetinaFace, depending on whether speed or accuracy was my primary concern. It’s possible I’d consider using Dlib if I absolutely wanted to minimize false positives, but for my project, it’s down to YuNet or RetinaFace.

The GitHub repo used for this project is available here.

By Amos Stailey-Young | Jun, 2024
