Training AI Algorithms on Mostly “Happy Faces” Introduces Prejudice and Reduces Accuracy

Facial recognition systems face many challenges, one of which is discrimination against particular demographic groups and genders. A recent study by researchers affiliated with MIT, the Universitat Oberta de Catalunya and the Universidad Autonoma de Madrid investigates another aspect that has received far less attention: bias related to facial expressions. The coauthors claim that the effect of expressions on facial recognition systems is ‘at least’ as strong as that of putting on a scarf, wig or glasses, and that the datasets these systems are trained on are highly skewed in this respect.

The study adds to the evidence that facial recognition systems are vulnerable to harmful bias. In a paper last fall, University of Colorado researchers showed that AI from Amazon, Clarifai, Microsoft and others maintained over 95% accuracy for cisgender men and women but misidentified trans men as women 38% of the time.

Independent benchmarks of vendors’ systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) have shown that facial recognition technology exhibits racial and gender bias, and have suggested that current technology can be highly inaccurate, misidentifying people more than 96% of the time.

In their experiments, the researchers used three different state-of-the-art facial recognition models trained on open source databases, including VGGFace2 (a dataset spanning more than 3 million images of over 9,100 people) and MS1M-ArcFace (which has over 5.8 million images of 85,000 people). They benchmarked the models against four corpora (a code sketch of the verification step behind such benchmarks follows the list):

• The Compound Facial Expression of Emotion, which contains photographs of 230 people captured in a lab-controlled environment.

• The Extended Cohn-Kanade (CK+), one of the most widely used databases for training and evaluating face expression recognition systems, with 593 sequences of photos of 123 people.

• CelebA, a large-scale face attribute dataset comprising 200,000 images of 10,000 celebrities.

• MS-Celeb-1M, a publicly available face recognition benchmark and dataset released in 2016 by Microsoft containing nearly 10 million images of 1 million celebrities.
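The paper does not ship reference code, but the verification step behind such benchmarks follows a common recipe: crop the face, extract an embedding with a pretrained network, and compare embeddings with a similarity score. The minimal sketch below uses the open source facenet-pytorch package with a VGGFace2-pretrained model; the model choice and file names are illustrative assumptions, not the exact systems evaluated in the study.

```python
# Minimal sketch of the embed-and-compare step behind face verification benchmarks.
# The facenet-pytorch model here is an illustrative stand-in, not one of the
# three models evaluated in the study.
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

detector = MTCNN(image_size=160)                            # face detection + cropping
encoder = InceptionResnetV1(pretrained='vggface2').eval()   # 512-d face embeddings

def embed(path: str) -> torch.Tensor:
    """Detect the face in an image and return its embedding vector."""
    face = detector(Image.open(path).convert('RGB'))
    if face is None:
        raise ValueError(f'no face found in {path}')
    with torch.no_grad():
        return encoder(face.unsqueeze(0))[0]

# A verification benchmark compares pairs of images: same identity ("genuine"
# pairs) vs. different identities ("impostor" pairs). File names are hypothetical.
score = torch.nn.functional.cosine_similarity(
    embed('person_a_1.jpg'), embed('person_a_2.jpg'), dim=0
)
print(f'match score: {score.item():.3f}')   # higher means more likely the same person
```

A full benchmark would repeat this comparison over many genuine and impostor pairs drawn from the four corpora above.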

The researchers noted that academics and corporations have collected facial photographs from sources such as the web, movies and social media to address the scarcity of model training data. As with most machine learning models, facial recognition models need large quantities of data to reach a given level of accuracy. However, some of these sources are unbalanced because certain facial expressions are rarer than others; for instance, people share more happy faces than sad faces on Twitter, Facebook and Instagram.

To classify the images in the four benchmark corpora by expression, the researchers used software from Affectiva that identifies six basic emotions plus a neutral face, for seven classes in total. They found that the proportion of ‘neutral’ images was above 60% in every dataset, reaching 83.7% in MS-Celeb-1M, with ‘happy’ the second most common expression; across all datasets, roughly 90% of images showed either a ‘neutral’ or a ‘happy’ person. Of the other five expressions, ‘surprised’ and ‘disgusted’ barely exceeded 6%, while ‘sad’, ‘fear’ and ‘anger’ were scarcely represented (usually below 1%).
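This kind of audit can be reproduced with any off-the-shelf expression classifier. In the sketch below, the open source DeepFace library stands in for Affectiva's commercial software (the directory layout and the choice of DeepFace are assumptions, not the study's pipeline); it predicts the same seven classes and tallies their share of a dataset.

```python
# Sketch of an expression-distribution audit over a face dataset.
# DeepFace is a stand-in for the Affectiva classifier used in the study; it
# predicts the same seven classes (six basic emotions plus neutral).
from collections import Counter
from pathlib import Path
from deepface import DeepFace

def expression_distribution(image_dir: str) -> dict:
    counts = Counter()
    for path in Path(image_dir).glob('**/*.jpg'):        # assumed layout: nested .jpg files
        result = DeepFace.analyze(str(path), actions=['emotion'],
                                  enforce_detection=False)
        if isinstance(result, list):                      # newer DeepFace versions return a list
            result = result[0]
        counts[result['dominant_emotion']] += 1
    total = sum(counts.values()) or 1
    return {label: 100.0 * n / total for label, n in counts.most_common()}

# Prints something like {'neutral': 62.1, 'happy': 28.4, ...}; the study found
# 'neutral' plus 'happy' covering roughly 90% of images in every benchmark corpus.
print(expression_distribution('celeba_subset/'))          # hypothetical directory name
```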

The results also varied by gender: in VGGFace2, the number of images of ‘happy’ men was just under half the number of ‘happy’ women.

‘This remarkable under-representation of some facial expressions in the datasets produces… drawbacks,’ said the coauthors in a description of their work. ‘On the one hand, models are trained using highly biased data that results in heterogeneous performances. On the other hand, technology is evaluated only for mainstream expressions hiding its real performance for images with some specific facial expressions… gender bias is important because it might cause different performances for both genders’.

Next, the researchers ran an evaluation to gauge the degree to which facial expression biases in sample sets like CelebA affect the predictions of facial recognition systems. For all three of the previously mentioned algorithms, performance was better on the expressions most common in the training databases: the ‘neutral’ and ‘happy’ faces.

The study’s findings suggest that differences in facial expression cannot deceive a system into recognizing a person as someone else. They do imply, however, that facial expression bias leads to differences of more than 40% in a system’s ‘genuine’ comparison scores (the scores that measure how confidently an algorithm matches two images of the same face).
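One way such a gap can be measured, sketched below under stated assumptions, is to group the genuine (same-identity) comparison scores by the expression of the images involved and compare the per-expression averages. The function assumes precomputed, L2-normalised embeddings plus per-image identity and expression labels; attributing each pair's score to both images' expressions is a simplification for illustration, not a reproduction of the paper's protocol.

```python
# Sketch: how the expression of an image shifts "genuine" comparison scores
# (same-identity match scores). Inputs are assumptions: embeddings[i] is an
# L2-normalised face embedding, identities[i] and expressions[i] are the
# identity and predicted expression of image i.
from collections import defaultdict
from itertools import combinations
import numpy as np

def genuine_scores_by_expression(embeddings, identities, expressions):
    by_identity = defaultdict(list)
    for i, ident in enumerate(identities):
        by_identity[ident].append(i)

    scores = defaultdict(list)
    for idxs in by_identity.values():
        for a, b in combinations(idxs, 2):               # every same-identity pair
            score = float(np.dot(embeddings[a], embeddings[b]))  # cosine sim of unit vectors
            scores[expressions[a]].append(score)         # credit the score to both
            scores[expressions[b]].append(score)         # images' expressions
    return {expr: float(np.mean(vals)) for expr, vals in scores.items()}

# Toy demo on random unit vectors (purely illustrative, not real data):
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 512))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
ids = ['a', 'a', 'a', 'b', 'b', 'b']
exprs = ['neutral', 'happy', 'sad', 'neutral', 'happy', 'sad']
print(genuine_scores_by_expression(emb, ids, exprs))
```

On real data, comparing the best- and worst-scoring expression groups produced by a biased model is where the study's reported gap of more than 40% shows up.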

The researchers relied solely on Affectiva’s software to classify emotions, which might have introduced unintended bias into their experiments, and they did not test any commercially deployed systems such as Amazon’s Rekognition, Google Cloud’s Vision API, or Microsoft Azure’s Face API. Regardless, they argue that facial expression bias, along with other emerging biases, should be reduced in future facial recognition databases.

‘The lack of diversity in facial expressions in facial databases intended for development and evaluation of facial recognition systems represents, among other disadvantages, a security vulnerability of the resulting systems,’ wrote the coauthors. ‘Small changes in facial expressions can easily mislead the facial recognition systems developed around those biased databases. Facial expressions have an impact on the matching scores computed by a face recognition system. This effect can be exploited as a possible vulnerability, reducing the probabilities to be matched’.

By Marvellous Iwendi.

Source: VentureBeat