M. Sc. Stettler, Michael
Department of Cognitive Neurology
Hertie Institute for Clinical Brain Research
Centre for Integrative Neuroscience
University Clinic Tübingen
Otfried-Müller-Str. 25
72076 Tübingen, Germany

People can innately recognize human facial expressions in unnatural forms, such as when depicted on the unusual faces drawn in cartoons or when applied to an animal’s features. However, current machine learning algorithms struggle with out-of-domain transfer in facial expression recognition (FER). We propose a biologically-inspired mechanism for such transfer learning, which is based on norm-referenced encoding, where patterns are encoded in terms of difference vectors relative to a domain-specific reference vector. By incorporating domain-specific reference frames, we demonstrate high data efficiency in transfer learning across multiple domains. Our proposed architecture provides an explanation for how the human brain might innately recognize facial expressions on varying head shapes (humans, monkeys, and cartoon avatars) without extensive training. Norm-referenced encoding also allows the intensity of the expression to be read out directly from neural unit activity, similar to face-selective neurons in the brain. Our model achieves a classification accuracy of 92.15% on the FERG dataset with extreme data efficiency. We train our proposed mechanism with only 12 images, including a single image of each class (facial expression) and one image per domain (avatar). In comparison, the authors of the FERG dataset achieved a classification accuracy of 89.02% with their FaceExpr model, which was trained on 43,000 images.
AIMS: Humans recognize social interactions and intentions from videos of moving abstract stimuli, including simple geometric figures (Heider {&} Simmel, 1944). The neural machinery supporting such social interaction perception is completely unclear. Here, we present a physiologically plausible neural model of social interaction recognition that identifies social interactions in videos of simple geometric figures and fully articulating animal avatars, moving in naturalistic environments. METHODS: We generated the trajectories for both geometric and animal avatars using an algorithm based on a dynamical model of human navigation (Hovaidi-Ardestani, et al., 2018, Warren, 2006). Our neural recognition model combines a Deep Neural Network, realizing a shape-recognition pathway (VGG16), with a top-level neural network that integrates RBFs, motion energy detectors, and dynamic neural fields. The model implements robust tracking of interacting agents based on interaction-specific visual features (relative position, speed, acceleration, and orientation). RESULTS: A simple neural classifier, trained to predict social interaction categories from the features extracted by our neural recognition model, makes predictions that resemble those observed in previous psychophysical experiments on social interaction recognition from abstract (Salatiello, et al. 2021) and naturalistic videos. CONCLUSION: The model demonstrates that recognition of social interactions can be achieved by simple physiologically plausible neural mechanisms and makes testable predictions about single-cell and population activity patterns in relevant brain areas. Acknowledgments: ERC 2019-SyG-RELEVANCE-856495, HFSP RGP0036/2016, BMBF FKZ 01GQ1704, SSTeP-KiZ BMG: ZMWI1-2520DAT700, and NVIDIA Corporation.
Dynamic facial expressions are crucial for communication in primates. Due to the difficulty to control shape and dynamics of facial expressions across species, it is unknown how species-specific facial expressions are perceptually encoded and interact with the representation of facial shape. While popular neural network models predict a joint encoding of facial shape and dynamics, the neuromuscular control of faces evolved more slowly than facial shape, suggesting a separate encoding. To investigate these alternative hypotheses, we developed photo-realistic human and monkey heads that were animated with motion capture data from monkeys and humans. Exact control of expression dynamics was accomplished by a Bayesian machine-learning technique. Consistent with our hypothesis, we found that human observers learned cross-species expressions very quickly, where face dynamics was represented largely independently of facial shape. This result supports the co-evolution of the visual processing and motor control of facial expressions, while it challenges appearance-based neural network theories of dynamic expression recognition.
All images and videos displayed on this webpage are protected by copyright law. These copyrights are owned by Computational Sensomotorics.
If you wish to use any of the content featured on this webpage for purposes other than personal viewing, please contact us for permission.