1.1 Introduction
Without doubt, the crown jewel of nonverbal communication is the facial expression channel.
Facial expression is a visible manifestation of the affective state, cognitive activity, intention, personality, and psychopathology of a person; it plays a communicative role in interpersonal relations.
Facial expressions, and other gestures, convey non-verbal communication cues in face-to-face interactions.
These cues may also complement speech by helping the listener to infer the intended meaning of spoken words. Research has reported that facial expressions have a considerable effect on a listening interlocutor: the facial expression of a speaker accounts for about 55 percent of the effect, voice intonation for 38 percent, and the spoken words for 7 percent.
As a consequence of the information that they carry, facial expressions can play an important role wherever humans interact with machines. Automatic recognition of facial expressions may act as a component of natural human-machine interfaces (some variants of which are called perceptual interfaces or conversational interfaces).
Such interfaces would enable the automated provision of services that require a good appreciation of the emotional state of the service user, as would be the case in transactions that involve negotiation, for example. Some robots can also benefit from the ability to recognize expressions.
Automated analysis of facial expressions for behavioural science or medicine is another possible application domain.
1.2 What are expressions?
So far, we have assumed the terms ‘expression’ and ‘facial expression’ to be self-explanatory.
However, it takes some consideration to realize that a facial expression is more than just a rearrangement of features in a face. Webster’s dictionary defines expressions as:
“Lively or vivid representation of meaning, sentiment, or feeling.”
In practically all situations, expressions convey a meaning, whether intentionally or unintentionally. Our face often reveals the feelings we have for someone or something even when we are not aware of it ourselves.
During a conversation, our face constantly provides cues that help the conversation progress. Even very subtle facial cues can carry a message such as “I understand what you’re talking about”, “I disagree” or “I’m only joking, don’t take me seriously”.
1.3 History Glance
The automated system, which has been improved, could be a boon for behavioural studies. Scientists have already found ways, for example, to distinguish false facial expressions of emotion from genuine ones. In depressed individuals, they’ve also discovered differences between the facial signals of suicidal and non-suicidal patients.
Such research relies on a coding system developed in the 1970s by Paul Ekman of the University of California, San Francisco, a co-author of the Psychophysiology paper. Ekman’s Facial Action Coding System (FACS) breaks down facial expressions into 46 individual motions, or action units.
Sejnowski’s team designed the computer program to use the same coding system. Their challenge was to enable the program to recognize the minute facial movements upon which the coding system is based.
Other researchers had come up with different computerized approaches for analyzing facial motion, but all had limitations, says Sejnowski, who is director of the Computational Neurobiology Laboratory at The Salk Institute for Biological Studies in La Jolla, California, and a professor of biology at the University of California, San Diego (UCSD). A technique called feature-based analysis, for example, measures variables such as the degree of skin wrinkling at various points on the face.
“The trouble,” Sejnowski explains, “is that some people don’t wrinkle at all and some wrinkle a lot. It depends on age and a lot of other factors, so it’s not always reliable.” His team, which included Ekman, Marian Stewart Bartlett of UCSD and Joseph Hager of Network Information Research Corp. in Salt Lake City, took the best parts of three existing facial motion-analysis systems and combined them.
“We discovered that although each of the methods was imperfect, when we combined them the hybrid method performed about as well as the human expert, which is at an accuracy of around 91 percent,” Sejnowski says.
The computer program did much better than human non-experts, who performed with only 73.7 percent accuracy after receiving less than an hour of practice in recognizing and coding action units. The coding process involves identifying and marking sequences of frames in which an individual facial expression begins, peaks and ends. A minute of video can contain several hundred action units to recognize and code. In the work reported in Psychophysiology, the researchers taught the computer program to recognize 6 of the 46 action units.
Since then, the program has mastered six more and, by incorporating new image-analysis methods developed in Sejnowski’s lab, the system’s performance has risen to 95 percent accuracy. The additional work was published in the October 1999 issue of IEEE Transactions on Pattern Analysis and Machine Intelligence.
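As an illustration of the coding process described above, in which a coder marks the frames where each action unit begins, peaks and ends, the following minimal Python sketch shows one way such annotations might be stored and queried. The record layout and the names `ActionUnitEvent` and `active_units` are assumptions made for this example; they are not taken from FACS or from the program described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionUnitEvent:
    """One coded occurrence of an action unit in a video (hypothetical layout)."""
    action_unit: int   # FACS action unit number, e.g. 1 = inner brow raiser
    onset_frame: int   # frame in which the movement begins
    apex_frame: int    # frame in which the movement peaks
    offset_frame: int  # frame in which the movement ends

    def __post_init__(self):
        # A valid annotation begins before it peaks and peaks before it ends.
        if not (self.onset_frame <= self.apex_frame <= self.offset_frame):
            raise ValueError("expected onset <= apex <= offset")

    def duration_frames(self) -> int:
        """Number of frames covered by the event, both ends included."""
        return self.offset_frame - self.onset_frame + 1


def active_units(events, frame):
    """Return the sorted action unit numbers whose onset..offset span covers `frame`."""
    return sorted({e.action_unit for e in events
                   if e.onset_frame <= frame <= e.offset_frame})


# Example: a brow raise (AU 1) overlapping the start of a smile (AU 12).
events = [
    ActionUnitEvent(action_unit=1, onset_frame=10, apex_frame=14, offset_frame=20),
    ActionUnitEvent(action_unit=12, onset_frame=18, apex_frame=25, offset_frame=40),
]
print(active_units(events, 19))
```

Storing onset, apex and offset per event rather than one label per frame keeps the annotation close to what a human coder marks, and lets several action units overlap in time.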
Now the team is engaged in a friendly “cooperative competition” with researchers from Carnegie Mellon University and the University of Pittsburgh who have developed a similar system. The two systems will be tested on the same images to allow direct comparisons of performance on individual images as well as overall accuracy. The teams will then collaborate on a new system that incorporates the best features of each.
A computer that accurately reads facial expressions could result in a better lie detector, which is why the CIA is funding the joint project.
But Sejnowski sees other possible commercial applications as well. “This software could very well end up being part of everybody’s computer,” he says. “One of the goals of computer science is to have computers interact with us in the same way we interact with other human beings. We’re beginning to see programs that can recognize speech.”
But humans use more than speech recognition when they communicate with each other, he explains. In face-to-face conversation, “you watch how a person reacts to know whether they’ve understood what you’ve said and how they feel about it.” Your desktop computer can’t do that, so it doesn’t know when it has correctly interpreted your words or when it has bungled the meaning. With this software and a video camera mounted on your monitor, Sejnowski thinks your computer might someday read you as well as your best friend does.
1.4 Why classify expressions?
Human-computer interaction remains an active field of research in which new advances are constantly being made, but in which old problems also persist. Currently, interaction with a PC or Mac usually runs through rather simple channels. Input is usually given as formalized instructions using a hand-operated input device (keyboard/mouse/joystick), and the results are presented as images on a monitor or as sound containing little information (such as warning beeps). This seems to be a very unnatural and inefficient way for a human being to communicate. In everyday human-to-human communication, information is exchanged in a highly multi-modal way. Speech conveys words, but intonation, hesitations and context, for example, can completely alter the meaning of those words. Furthermore, in human-to-human conversations, a large part of the information is transferred through gestures and facial expressions. Although natural communication with a computer seems like a far-off goal, progress in this direction is gradually being made. Speech recognition and synthesis are gradually finding their way into a growing number of practical applications. An effective automatic expression recognition system could take this progress towards more natural human-computer communication to the next level.
Automatic expression analysis can be of particular relevance for a number of expression monitoring applications where it would be undesirable or even infeasible to manually annotate the available data (for example a video stream). To name a few applications: the reaction of potential customers to products or advertisements would be of great value for marketing purposes. Forensic investigation could benefit from a method that automatically detects signs of extreme emotions, fear or aggression as an early warning system. An expression monitoring system could also be used, for example, to track the condition of jet-plane pilots under extreme conditions.
Other applications that have been pointed out are aids for physically impaired people (using expressions as a control method), rehabilitation of people with an impairment of the facial muscles, and tools with possible psychiatric uses.
In addition to the direct applications of an expression recognizer, by studying the classification of expressions we might obtain relevant information which can be used for other purposes. In order to come to a classification, the information containing the expression will first of all have to be identified and extracted. Once this information concerning expressions has been obtained, it could be used to gain a better understanding of how expressions are formed. It could also be used to morph faces, either to show a certain emotion or to neutralize the expression. Adding expressions could be useful to face-synthesis applications (e.g. to create animations or personal avatars), while neutralizing faces might help other classifiers/identifiers to perform better by removing irrelevant expression information.
Finally, the proposed system for expression classification does not have to be limited to the classification of expressions alone. Similar methods could be used to classify faces on properties such as identity, age, sex, ethnicity, nose type, etc.
Automatic detection of the latter set of properties could be of great use, for example for security systems, automatic creation of police records or customer composition tracking. In fact, if the framework that’s constructed is general enough, there is no reason for it to be limited to handling face images; it could be trained to work on different objects as well.

