Better Reaction through Innovative Sensing

We are advancing rigorous methods for discerning genuine emotions through multimodal techniques. A convolutional approach has been used for feature extraction and classification, with early (feature-level) and late (decision-level) fusion across vision, audition, and tactition.
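
To make the distinction between the feature level and the decision level concrete, here is a minimal sketch in Python. The function names, the concatenation order, and the simple score averaging are illustrative assumptions, not a description of the production system.

```python
from typing import Dict, List


def early_fusion(features: Dict[str, List[float]]) -> List[float]:
    """Feature-level fusion: concatenate the per-modality feature vectors
    (in alphabetical modality order) into one vector for a single model."""
    fused: List[float] = []
    for modality in sorted(features):
        fused.extend(features[modality])
    return fused


def late_fusion(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Decision-level fusion: average the per-class scores produced by
    independent per-modality classifiers (assumes identical label sets)."""
    labels = next(iter(scores.values())).keys()
    count = len(scores)
    return {
        label: sum(per_modality[label] for per_modality in scores.values()) / count
        for label in labels
    }


# Hypothetical inputs, for illustration only.
fused_vector = early_fusion(
    {"vision": [0.2, 0.7], "audition": [0.5], "tactition": [0.1, 0.9]}
)
fused_scores = late_fusion(
    {
        "vision": {"happy": 0.8, "sad": 0.2},
        "audition": {"happy": 0.6, "sad": 0.4},
    }
)
```

Early fusion lets one model learn interactions between modalities, while late fusion keeps each modality's classifier independent and combines only their outputs.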

This research on affective modalities merges face, speech, physiological signals, and gestures to reveal the emotional state of an individual or a large group with refined classification.

Multimodal Assessment of Affective States

We have done substantial initial work on combining signals from multiple modalities such as audio (voice tone) and speech / text with facial expressions.

We have made good progress on integrating the analysis of gestures and body pose into the visual analysis. The work so far has shown the importance of multimodal emotion signals in more naturalistic scenarios and for better assessment of context.

Contextual factors also play an important role – certain modalities work better for certain emotions or mental states. It is therefore important to analyze multiple modalities and to determine which data modalities are best suited for certain situations.
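
One way to express the idea that certain modalities suit certain emotions is to weight each modality's score by a per-emotion reliability. The sketch below assumes this weighting scheme for illustration; the `RELIABILITY` table holds placeholder values, not measured results, and `weighted_fusion` is a hypothetical helper.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]  # modality -> emotion -> value

# Placeholder reliabilities (illustrative only, not measured values):
# here the face is assumed more informative for joy, the voice for anger.
RELIABILITY: Scores = {
    "face": {"joy": 0.9, "anger": 0.6},
    "voice": {"joy": 0.5, "anger": 0.9},
}


def weighted_fusion(scores: Scores, reliability: Scores) -> Dict[str, float]:
    """Combine per-modality emotion scores, weighting each modality by how
    reliable it is assumed to be for that particular emotion."""
    emotions = next(iter(scores.values())).keys()
    fused: Dict[str, float] = {}
    for emotion in emotions:
        total_weight = sum(reliability[m][emotion] for m in scores)
        weighted_sum = sum(
            reliability[m][emotion] * scores[m][emotion] for m in scores
        )
        fused[emotion] = weighted_sum / total_weight
    return fused


# Hypothetical classifier outputs for one observation.
observation: Scores = {
    "face": {"joy": 0.8, "anger": 0.2},
    "voice": {"joy": 0.4, "anger": 0.6},
}
fused = weighted_fusion(observation, RELIABILITY)
dominant = max(fused, key=fused.get)
```

In practice such weights would have to be estimated from validation data for each situation rather than set by hand.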

Our Approach: Emotion Continuum


Using advanced image processing, auditory analysis, and computer vision algorithms, we analyze and read emotions with high precision in real time from face images, video, and voice tone. The strength of our algorithms lies in distinguishing subtle expressions and providing fine-grained emotional insights at a low computational cost.

  • Mapping function derived from a proprietary dataset of close to half a million unique real-person samples, built from a combination of several psychophysical validation studies
  • RMS error of 7-8% (on test set)
  • Map feature representation to a point on the valence / arousal plane for face and voice
  • Regression (statistical machine learning) instead of classification
  • Computationally efficient (runs in real time on any device)
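
The points above can be sketched as follows: a regression maps a feature vector to a point on the valence / arousal plane, and the RMS error is reported as a percentage. This is a minimal sketch under stated assumptions: the linear form of the mapping, the [-1, 1] range of both axes, and the normalization of the error by the axis range are all assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (valence, arousal), each assumed to lie in [-1, 1]


def map_to_valence_arousal(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],  # two rows: valence weights, arousal weights
    bias: Sequence[float],               # two offsets: valence, arousal
) -> Point:
    """Regress a feature vector to a continuous point on the plane,
    instead of assigning it to a discrete emotion class."""
    def clamp(value: float) -> float:
        return max(-1.0, min(1.0, value))

    valence = sum(w * x for w, x in zip(weights[0], features)) + bias[0]
    arousal = sum(w * x for w, x in zip(weights[1], features)) + bias[1]
    return clamp(valence), clamp(arousal)


def rms_error_percent(
    predicted: List[Point], target: List[Point], axis_range: float = 2.0
) -> float:
    """RMS error over both coordinates, as a percentage of the axis range
    (2.0 for the assumed [-1, 1] scale)."""
    squared = [
        (p - t) ** 2
        for pred, true in zip(predicted, target)
        for p, t in zip(pred, true)
    ]
    return 100.0 * math.sqrt(sum(squared) / len(squared)) / axis_range


# Hypothetical weights and test points, for illustration only.
point = map_to_valence_arousal([1.0, 2.0], [[0.1, 0.2], [0.3, -0.1]], [0.0, 0.1])
error = rms_error_percent(
    [(0.5, 0.1), (-0.2, 0.4)],
    [(0.6, 0.0), (-0.1, 0.3)],
)
```

A continuous output lets nearby emotional states receive nearby coordinates, which a fixed set of class labels cannot express.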