How do you train neural networks to understand and simulate human emotions?

From Short and Sweet AI, I’m Dr. Peper and today I’m discussing how to train your AI.

We use 10,000 possible combinations of muscle movements in the face to create one facial expression. Add to this more than 400 possible voice inflections, along with thousands of hand and body gestures. All these combinations change continuously throughout a human conversation. Our brains process these complex, sometimes intense emotions subconsciously, in microseconds, over and over again throughout the day.

Emotion and Datasets

AI can help us by giving us machines that communicate with us effectively and understand what we want. To do that, they need to recognize our emotional state, how we’re feeling, through our voice, facial expressions and nonverbal cues.

To teach computers how to understand emotions, AI researchers use machine learning and neural networks. Machines are very good at analyzing large amounts of data. We’re talking about a dataset with almost 8 million facial expressions. When a machine trains on that many variations, it learns to detect patterns in facial movements and even the nuances between a smirk and a smile. The machines can also listen to voice tone and recognize sounds that indicate stress or anger. How do they do this?

Emotion Metrics

Using computer vision, the algorithms identify key landmarks on the face, such as the tip of the nose, the corners of the mouth or the corners of the eyebrows. Deep learning algorithms then analyze the pixels of the images to classify the expressions. Combinations of these facial expressions are then mapped to emotions. Another program, for analyzing speech, evaluates not what is said but how it is said, calculating changes in tone, loudness, tempo and voice quality to infer what’s happening, along with the emotion and gender of the speaker. These measurements are called emotion metrics. And when tested against human emotions, the key emotion metrics have accuracies above 90%.
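To make the speech side more concrete, here is a minimal Python sketch of two "how it is said" measurements: loudness and pitch (tone). It is only an illustration under stated assumptions, not the method any company mentioned here uses. The function names `rms_loudness` and `estimate_pitch_hz` are made up for this example, and the "voice" is a synthetic 200 Hz tone rather than real speech.

```python
import numpy as np

def rms_loudness(signal):
    """Root-mean-square amplitude, a rough proxy for loudness."""
    return float(np.sqrt(np.mean(np.square(signal))))

def estimate_pitch_hz(signal, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency by finding the autocorrelation peak.

    fmin and fmax bound the search to a range assumed plausible for a voice.
    """
    signal = signal - np.mean(signal)
    # Keep only the non-negative lags of the autocorrelation.
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Synthetic stand-in for a voice: 0.1 seconds of a 200 Hz tone (assumption).
sample_rate = 8000
t = np.arange(int(0.1 * sample_rate)) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 200.0 * t)

print("loudness (RMS):", rms_loudness(tone))
print("pitch (Hz):", estimate_pitch_hz(tone, sample_rate))
```

A real system would compute features like these over many short windows of speech and feed how they change over time into a trained model. This sketch stops at the raw measurements.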

Many companies are working on emotion AI. Amazon has a network for speech-based emotion detection. Another company, Affectiva, has a neural network called SoundNet that can classify anger from audio data in 1.2 seconds, regardless of the speaker’s language. That’s as fast as a human can detect anger from a voice. Another company, Cogito, has a system that analyzes the voices of military veterans with PTSD to determine if they need help.

FATE Flaws

But there are worries about this technology. Many people in the field raise concerns that these types of systems have FATE flaws. FATE stands for fairness, accountability, transparency and ethics. For example, a study of one facial recognition algorithm showed that faces of black people were rated as angrier than faces of white people, even when the black people were smiling.

Lisa Barrett, a professor of psychology, spent 2 years along with 4 other scientists scrutinizing the evidence for the accuracy of emotion AI. They concluded that companies using AI cannot reliably fingerprint emotions through expressions. However, she does think that in the future emotions can be measured more accurately, when more sophisticated metrics are available.

As she explained: “It’s intuitive that emotions are very complex. Sometimes people cry in anger, sometimes they shout, some people laugh when angry and sometimes they just sit silently and plan the demise of their enemy.”

From Short and Sweet AI, I’m Dr. Peper.

As always, you can find further reading, videos and podcasts in the show notes.
