Publications
Parallel Backpropagation for Shared-Feature Visualization
MacAction: Realistic 3D macaque body animation based on multi-camera markerless motion capture
Social interaction is crucial for survival in primates. For the study of social vision in monkeys, highly controllable macaque face avatars have recently been developed, while body avatars with realistic motion do not yet exist. Addressing this gap, we developed a pipeline for three-dimensional motion tracking based on synchronized multi-view video recordings, achieving sufficient accuracy for life-like full-body animation. By exploiting data-driven pose estimation models, we track the complete time course of individual actions using a minimal set of hand-labeled keyframes. Our approach tracks single actions more accurately than existing pose estimation pipelines for behavioral tracking of non-human primates, requiring less data and fewer cameras. This efficiency is also confirmed for a state-of-the-art human benchmark dataset. A behavioral experiment with real macaque monkeys demonstrates that animals perceive the generated animations as similar to genuine videos, and establishes an uncanny valley effect for bodies in monkeys.
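The tracking pipeline itself is described in the paper, but its core geometric step, lifting synchronized 2D keypoint detections from multiple calibrated cameras to 3D, can be sketched with standard direct linear transform (DLT) triangulation. A minimal Python/NumPy illustration, assuming calibrated projection matrices and per-view keypoints from a 2D pose estimator (hypothetical inputs, not the authors' actual data structures):

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one keypoint from N synchronized views.

    proj_mats: list of (3, 4) camera projection matrices (calibration).
    points_2d: list of (x, y) detections of the same keypoint, one per view.
    Returns the keypoint's 3D position in world coordinates.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view adds two linear constraints on the homogeneous 3D
        # point X: x * (P[2] @ X) = P[0] @ X and y * (P[2] @ X) = P[1] @ X.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: the right singular vector of A with the
    # smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

In a full pipeline, keypoints triangulated this way frame by frame would drive the skeleton of the animated avatar.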
Predictive Features in Deep Neural Network Models of Macaque Body Patch Selectivity
Previous work has shown that neurons from body patches in macaque superior temporal sulcus (STS) respond selectively to images of bodies. However, the visual features leading to this body selectivity remain unclear. METHODS: We conducted experiments using 720 stimuli presenting a monkey avatar in various poses and viewpoints. Spiking activity was recorded from mid-STS (MSB) and anterior-STS (ASB) body patches, previously identified using fMRI. To identify the visual features driving the neural responses, we used a model with a deep network as frontend and a linear readout fitted to predict the neural activity. Computing the gradients of the outputs backwards along the neural network, we identified the image regions that were most influential for the model neuron output. Since previous work suggests that neurons from this area also respond to some extent to images of objects, we used a similar approach to visualize object parts eliciting responses from the model neurons. Based on an object dataset, we identified the shapes that activate each model unit maximally. Computing and combining the pixel-wise gradients of model activations from object and body processing, we were able to identify common visual features driving neural activity in the model. RESULTS: Linear models fit the data well, with mean noise-corrected correlations with neural data of 0.8 in ASB and 0.94 in MSB. Gradient analysis of the body stimuli did not reveal clear preferences for particular body parts, and the resulting maps were difficult to interpret visually. However, the joint gradients between objects and bodies traced visually similar features in both images. CONCLUSION: Deep neural networks model STS data well, even though for all tested models, explained variance was substantially lower in the more anterior region. Further work will test if the features that the deep network relies on are also used by body patch neurons.
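The gradient-based visualization described above can be made concrete with a short sketch, assuming a VGG16 frontend from torchvision and a fitted linear readout represented by a hypothetical weight vector `readout_w`; this illustrates the general technique, not the paper's exact implementation:

```python
import torch
from torchvision import models

# Minimal sketch: VGG16 features as the frontend; `readout_w` is a
# hypothetical (25088,) weight vector standing in for the fitted linear
# readout of one model neuron.
backbone = models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in backbone.parameters():
    p.requires_grad_(False)  # we only need gradients w.r.t. the image

def saliency(image, readout_w):
    """Pixel-wise gradient of a model neuron's response w.r.t. the image.

    image: (1, 3, 224, 224) input tensor.
    Returns a (224, 224) map; high magnitudes mark influential regions.
    """
    image = image.clone().requires_grad_(True)
    feats = backbone(image).flatten(1)   # (1, 25088) deep features
    response = feats @ readout_w         # linear model-neuron output
    response.backward()
    return image.grad.abs().sum(dim=1).squeeze(0)
```

Comparing the maps obtained from a body image and from an object image is one way to look for the shared driving features the abstract refers to.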
Macaques show an uncanny valley in body perception
Similarity in monkey fMRI activation patterns for human and monkey faces but not bodies
Simultaneous recordings from posterior and anterior body responsive regions in the macaque Superior Temporal Sulcus
Feature selectivity of body-patch neurons assessed with a large set of monkey avatars
The contribution of dynamics to macaque body and face patch responses
Previous functional imaging studies demonstrated body-selective patches in the primate visual temporal cortex by comparing activations to static bodies and static images of other categories. However, using static instead of dynamic displays of moving bodies may have underestimated the extent of the body patch network. Indeed, body dynamics provide information about action and emotion and may be processed in patches not activated by static images. Thus, to map the full extent of the macaque body patch system in the visual temporal cortex with fMRI, we employed dynamic displays of naturally acting monkey bodies, dynamic monkey faces, objects, and scrambled versions of these videos, all presented during fixation. We found nine body patches in the visual temporal cortex, starting posteriorly in the superior temporal sulcus (STS) and ending anteriorly in the temporal pole. Unlike for static images, body patches were consistently present in both the lower and upper banks of the STS. Overall, body patches showed higher activation to dynamic displays than to matched static images; for identical stimulus displays, this difference was weaker in the neighboring face patches. These data provide the groundwork for future single-unit recording studies to reveal the spatiotemporal features that neurons in these body patches encode. These fMRI findings suggest that dynamics contribute more strongly to population responses in body patches than in face patches.
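For readers unfamiliar with how such patches are delineated, the dynamic-versus-static comparison reduces to a voxelwise contrast. A generic sketch, assuming per-run GLM beta estimates for the two conditions (a standard analysis pattern, not the authors' specific pipeline):

```python
import numpy as np
from scipy import stats

def contrast_map(betas_dynamic, betas_static):
    """Voxelwise paired t-contrast between dynamic and static body
    conditions, the kind of comparison used to define body patches.

    betas_*: (runs, voxels) GLM beta estimates per condition.
    Returns a t-map; thresholded clusters would mark candidate patches.
    (A generic sketch of contrast mapping, not the authors' pipeline.)
    """
    t, _ = stats.ttest_rel(betas_dynamic, betas_static, axis=0)
    return t
```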
Physiologically-inspired neural model for social interaction recognition from abstract and naturalistic videos
Physiologically-inspired neural model for anorthoscopic perception
Neural model for the representation of static and dynamic bodies in cortical body patches
Neurophysiologically-inspired computational model of the visual recognition of social behavior and intent
AIMS: Humans recognize social interactions and intentions from videos of moving abstract stimuli, including simple geometric figures (Heider & Simmel, 1944). The neural machinery supporting such social interaction perception remains largely unknown. Here, we present a physiologically plausible neural model of social interaction recognition that identifies social interactions in videos of simple geometric figures and fully articulating animal avatars moving in naturalistic environments. METHODS: We generated the trajectories for both geometric and animal avatars using an algorithm based on a dynamical model of human navigation (Hovaidi-Ardestani et al., 2018; Warren, 2006). Our neural recognition model combines a deep neural network realizing a shape-recognition pathway (VGG16) with a top-level neural network that integrates radial basis function (RBF) units, motion energy detectors, and dynamic neural fields. The model implements robust tracking of interacting agents based on interaction-specific visual features (relative position, speed, acceleration, and orientation). RESULTS: A simple neural classifier, trained to predict social interaction categories from the features extracted by our neural recognition model, makes predictions that resemble those observed in previous psychophysical experiments on social interaction recognition from abstract (Salatiello et al., 2021) and naturalistic videos. CONCLUSION: The model demonstrates that recognition of social interactions can be achieved by simple, physiologically plausible neural mechanisms, and it makes testable predictions about single-cell and population activity patterns in relevant brain areas. Acknowledgments: ERC 2019-SyG-RELEVANCE-856495, HFSP RGP0036/2016, BMBF FKZ 01GQ1704, SSTeP-KiZ BMG: ZMWI1-2520DAT700, and NVIDIA Corporation.
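The interaction-specific features named in the METHODS (relative position, speed, acceleration, and orientation) are easy to make concrete. A minimal NumPy sketch operating on two tracked agent trajectories; the exact parameterization here is an assumption, not the model's published definition:

```python
import numpy as np

def interaction_features(traj_a, traj_b, dt=1.0):
    """Interaction-specific features from two agent trajectories.

    traj_a, traj_b: (T, 2) arrays of tracked 2D positions.
    Returns per-frame relative position, relative speed, relative
    acceleration, and heading difference (one plausible parameterization
    of the feature types named in the abstract).
    """
    rel_pos = traj_b - traj_a                          # (T, 2)
    vel_a = np.gradient(traj_a, dt, axis=0)
    vel_b = np.gradient(traj_b, dt, axis=0)
    speed = np.linalg.norm(vel_b - vel_a, axis=1)      # relative speed
    acc = np.linalg.norm(
        np.gradient(vel_b - vel_a, dt, axis=0), axis=1)
    heading = (np.arctan2(vel_b[:, 1], vel_b[:, 0])
               - np.arctan2(vel_a[:, 1], vel_a[:, 0]))
    return np.column_stack([rel_pos, speed, acc, heading])
```

A simple classifier trained on such per-frame features would then play the role of the model's top-level readout.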