Personal Page

M.Sc. Alexander Lappe

5.523a
Section for Computational Sensomotorics
Department of Cognitive Neurology
Hertie Institute for Clinical Brain Research
Centre for Integrative Neuroscience
University Clinic Tübingen
Otfried-Müller-Str. 25
72076 Tübingen, Germany
+49 7071 29-89131

Projects

Publications

Martini, L. M., Lappe, A. & Giese, M. A. (2025). Pose and shape reconstruction of nonhuman primates from images for studying social perception. Journal of Vision, September 2025. Vision Science Society.
Pose and shape reconstruction of nonhuman primates from images for studying social perception
Abstract:

The neural and computational mechanisms of the visual encoding of body pose and motion remain poorly understood. One important obstacle in their investigation is the generation of highly controlled stimuli with exactly specified form and motion parameters. Avatars are ideal for this purpose, but for nonhuman species the generation of appropriate motion and shape data is extremely costly, and video-based methods are often not accurate enough to generate convincing 3D animations with precisely specified parameters. METHODS: Based on a photorealistic 3D model for macaque monkeys, which we have developed recently, we propose a method that automatically adjusts this model to other nonhuman primate shapes, requiring only a small number of photographs and hand-labeled keypoints for that species. The resulting 3D model makes it possible to generate highly realistic animations of different primate species, combining the same motion with different body shapes. Our method is based on an algorithm that automatically deforms a polygon mesh of a macaque model with 10,632 vertices and an underlying rig of 115 joints, matching the silhouettes of the animals and a small number of specified keypoints in the example pictures. Optimization is based on a composite error function that integrates terms for the matching quality of the silhouettes, keypoints, and bone lengths, and for minimizing local surface deformation. RESULTS: We demonstrate the efficiency of the method for several monkey and ape species. In addition, we are presently investigating in a psychophysical experiment how the body shape of different primate species interacts with the categorization of body movements of humans and nonhuman primates in human perception. CONCLUSION: Using modern computer graphics methods, highly realistic and well-controlled body motion stimuli can be generated from small numbers of photographs, allowing us to study how species-specific motion and body shape interact in visual body motion perception. Acknowledgements: ERC 2019-SyG-RELEVANCE-856495; SSTeP-KiZ BMG: ZMWI1-2520DAT700.
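As a rough illustration of the composite error function described in the abstract, the sketch below combines keypoint, silhouette, bone-length, and surface-deformation terms into a single weighted loss. This is not the authors' implementation; all tensor names and weights are placeholders, and the silhouette term is assumed to be given as precomputed per-pixel distances between the projected and target silhouettes.

```python
# Minimal sketch (assumptions, not the authors' code) of a composite fitting loss.
import torch

def composite_loss(pred_kpts, gt_kpts,            # projected vs. hand-labeled 2D keypoints
                   pred_sil_dist,                 # per-pixel silhouette mismatch distances
                   bone_len, template_bone_len,   # current vs. template bone lengths
                   verts, template_verts,         # deformed vs. template mesh vertices
                   laplacian,                     # mesh graph Laplacian (V x V)
                   w_kpt=1.0, w_sil=1.0, w_bone=0.1, w_smooth=0.01):
    e_kpt = ((pred_kpts - gt_kpts) ** 2).sum(dim=-1).mean()
    e_sil = (pred_sil_dist ** 2).mean()
    e_bone = ((bone_len - template_bone_len) ** 2).mean()
    # Penalize local surface deformation relative to the template mesh
    e_smooth = (laplacian @ (verts - template_verts)).pow(2).sum(dim=-1).mean()
    return w_kpt * e_kpt + w_sil * e_sil + w_bone * e_bone + w_smooth * e_smooth
```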

Type of Publication: In Collection
Book Title: Journal of Vision, September 2025
Publisher: Vision Science Society
Month: September
Lappe, A. & Giese, M. A. (2025). Register and CLS tokens yield a decoupling of local and global features in large ViTs. arXiv.
Register and CLS tokens yield a decoupling of local and global features in large ViTs
Abstract:

Recent work has shown that the attention maps of the widely popular DINOv2 model exhibit artifacts, which hurt both model interpretability and performance on dense image tasks. These artifacts emerge due to the model repurposing patch tokens with redundant local information for the storage of global image information. To address this problem, additional register tokens have been incorporated in which the model can store such information instead. We carefully examine the influence of these register tokens on the relationship between global and local image features, showing that while register tokens yield cleaner attention maps, these maps do not accurately reflect the integration of local image information in large models. Instead, global information is dominated by information extracted from register tokens, leading to a disconnect between local and global features. Inspired by these findings, we show that the CLS token itself, which can be interpreted as a register, leads to a very similar phenomenon in models without explicit register tokens. Our work shows that care must be taken when interpreting attention maps of large ViTs. Further, by clearly attributing the faulty behaviour to register and CLS tokens, we show a path towards more interpretable vision models.
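The sketch below illustrates the kind of analysis the abstract describes: measuring how much of the CLS token's attention mass goes to register tokens versus patch tokens in one ViT layer. It is not the paper's code; the token layout [CLS, patch tokens, register tokens] and all names are assumptions that would need to be adapted to the actual model.

```python
# Illustrative sketch only; adapt token ordering to the model at hand.
import torch

def cls_attention_split(attn, num_patches, num_registers):
    """attn: (heads, tokens, tokens) attention probabilities of a single layer."""
    cls_row = attn[:, 0, :]                                  # attention from CLS to all tokens
    to_patches = cls_row[:, 1:1 + num_patches].sum(dim=-1)   # mass on patch tokens
    to_registers = cls_row[:, 1 + num_patches:1 + num_patches + num_registers].sum(dim=-1)
    return to_patches.mean().item(), to_registers.mean().item()
```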

Type of Publication: Article
Journal: arXiv
Year: 2025
Lappe, A., Bognár, A., Nejad, G. G., Raman, R., Mukovskiy, A., Martini, L. M. et al. (2024). Predictive Features in Deep Neural Network Models of Macaque Body Patch Selectivity. Journal of Vision, September 2024. Vision Science Society.
Predictive Features in Deep Neural Network Models of Macaque Body Patch Selectivity
Abstract:

Previous work has shown that neurons from body patches in macaque superior temporal sulcus (STS) respond selectively to images of bodies. However, the visual features leading to this body selectivity remain unclear. METHODS: We conducted experiments using 720 stimuli presenting a monkey avatar in various poses and viewpoints. Spiking activity was recorded from mid-STS (MSB) and anterior-STS (ASB) body patches, previously identified using fMRI. To identify visual features driving the neural responses, we used a model with a deep network as a frontend and a linear readout fitted to predict the neuronal activities. Backpropagating the gradients of the outputs through the network, we identified the image regions that were most influential for the model neuron output. Since previous work suggests that neurons from this area also respond to some extent to images of objects, we used a similar approach to visualize object parts eliciting responses from the model neurons. Based on an object dataset, we identified the shapes that activate each model unit maximally. Computing and combining the pixel-wise gradients of model activations from object and body processing, we were able to identify common visual features driving neural activity in the model. RESULTS: Linear models fit the data well, with mean noise-corrected correlations with neural data of 0.8 in ASB and 0.94 in MSB. Gradient analysis on the body stimuli did not reveal clear preferences for certain body parts, and the resulting maps were difficult to interpret visually. However, the joint gradients between objects and bodies traced visually similar features in both images. CONCLUSION: Deep neural networks model STS data well, even though for all tested models, explained variance was substantially lower in the more anterior region. Further work will test if the features that the deep network relies on are also used by body patch neurons.
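A minimal sketch of the gradient-based attribution step described above, assuming a frozen deep-network frontend and a fitted linear readout; `backbone` and `readout` are placeholders, not the authors' models.

```python
# Hedged sketch: pixel-wise gradients of one readout unit with respect to the input image.
import torch

def pixel_gradients(backbone, readout, image, unit_idx):
    """Return |d(model unit)/d(pixel)| for one readout unit and one input image."""
    image = image.clone().requires_grad_(True)       # (1, 3, H, W)
    features = backbone(image).flatten(1)            # frozen deep-network frontend
    response = readout(features)[0, unit_idx]        # predicted activity of one model neuron
    response.backward()
    return image.grad.abs().sum(dim=1).squeeze(0)    # (H, W) saliency map
```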

Authors: Lappe, Alexander; Bognár, Anna; Nejad, Ghazaleh Ghamkhari; Raman, Rajani; Mukovskiy, Albert; Martini, Lucas M.; Vogels, Rufin; Giese, Martin A.
Type of Publication: In Collection
Smekal, V., Solanas, T. S., Lappe, A., Giese, M. A. & de Gelder, B. (2024). Data-driven Features of Human Body Movements and their Neural Correlate. ESCAN2024.
Data-driven Features of Human Body Movements and their Neural Correlate
Authors: Smekal, Vojtěch; Szűcs, Tamás; Poyo Solanas, Marta; Lappe, Alexander; Giese, Martin A.; de Gelder, Beatrice
Type of Publication: In Collection
Lappe, A., Bognár, A., Nejad, G. G., Mukovskiy, A., Martini, L. M., Giese, M. A. et al. (2024). Parallel Backpropagation for Shared-Feature Visualization. Advances in Neural Information Processing Systems, 37, 22993-23012.
Parallel Backpropagation for Shared-Feature Visualization
Authors: Lappe, Alexander; Bognár, Anna; Nejad, Ghazaleh Ghamkhari; Mukovskiy, Albert; Martini, Lucas M.; Giese, Martin A.; Vogels, Rufin
Type of Publication: Article
Abassi, E., Bognár, A., de Gelder, B., Giese, M. A., Isik, L., Lappe, A. et al. (2024). Neural Encoding of Bodies for Primate Social Perception. Journal of Neuroscience, 44(40).
Neural Encoding of Bodies for Primate Social Perception
Abstract:

Primates, as social beings, have evolved complex brain mechanisms to navigate intricate social environments. This review explores the neural bases of body perception in both human and nonhuman primates, emphasizing the processing of social signals conveyed by body postures, movements, and interactions. Early studies identified selective neural responses to body stimuli in macaques, particularly within and ventral to the superior temporal sulcus (STS). These regions, known as body patches, represent visual features that are present in bodies but do not appear to be semantic body detectors. They provide information about posture and viewpoint of the body. Recent research using dynamic stimuli has expanded the understanding of the body-selective network, highlighting its complexity and the interplay between static and dynamic processing. In humans, body-selective areas such as the extrastriate body area (EBA) and fusiform body area (FBA) have been implicated in the perception of bodies and their interactions. Moreover, studies on social interactions reveal that regions in the human STS are also tuned to the perception of dyadic interactions, suggesting a specialized social lateral pathway. Computational work developed models of body recognition and social interaction, providing insights into the underlying neural mechanisms. Despite advances, significant gaps remain in understanding the neural mechanisms of body perception and social interaction. Overall, this review underscores the importance of integrating findings across species to comprehensively understand the neural foundations of body perception and the interaction between computational modeling and neural recording.

Authors: Abassi, Etienne; Bognár, Anna; de Gelder, Bea; Giese, Martin A.; Isik, Leyla; Lappe, Alexander; Mukovskiy, Albert; Solanas, Marta Poyo; Taubert, Jessica; Vogels, Rufin
Type of Publication: Article
Lappe, A., Bognár, A., Nejad, G. G., Mukovskiy, A., Giese, M. A. & Vogels, R. (2024). Encoding of bodies and objects in body-selective neurons. Society for Neuroscience.
Encoding of bodies and objects in body-selective neurons
Abstract:

The primate visual system has evolved subareas in which neurons appear to respond more strongly to images of a specific semantic category, like faces or bodies. The computational processes underlying these regions remain unclear, and there is debate on whether this effect is in fact driven by semantics or rather by visual features that occur more often among images from the specific category. Recent works tackling the question of whether the same visual features drive responses of face-selective cells to face images and non-face images have yielded mixed results. Here, we report findings on shared encoding of body and object images in body-selective neurons in macaque superior temporal sulcus. We targeted two fMRI-defined regions, anterior and posterior body patches, in two awake macaques using V-probes, recording multi-unit activity in and around these patches. In a first phase, we recorded responses to a set of 475 images of a macaque avatar in various poses. We then trained a deep-neural-network-based model to predict responses to these images, and subsequently evaluated the model on two sets of object and body stimuli consisting of 6857 and 2068 images, respectively. These images comprised a variety of object types and animal species. After the inference process, we selected the highest- and lowest-predicted activator for each recording channel from both object and body images. In a second phase, we recorded responses of the same multi-units to these stimuli. For analysis, we only kept those multi-unit sites with high test/retest reliability. Also, we only considered multi-unit sites for which the selected bodies elicited a significantly higher response than the selected objects. We then tested whether the high-predicted objects/bodies indeed led to higher responses at the corresponding electrode than the low-predicted ones. Across neurons, we found a significant preference for the high-predicted stimulus for both objects and bodies. The highly activating objects consisted of a variety of everyday objects and did not necessarily globally resemble a body. Furthermore, the correlations between predicted and recorded responses to the objects were consistently positive for both monkeys and recording areas, meaning that the model was able to predict responses to objects after having only been trained on images of a macaque avatar. Our results show that the feature preferences of body-selective neurons are at least partially shared between bodies and objects. More broadly, we provide further evidence that category selectivity arises from highly shared visual features among category instances, rather than from semantics.
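The stimulus-selection and evaluation steps described above can be sketched roughly as follows; the array names and shapes are illustrative assumptions, not the authors' pipeline.

```python
# Sketch: pick highest/lowest predicted activator per channel and check prediction quality.
import numpy as np

def select_extreme_stimuli(predicted):
    """predicted: (n_images, n_channels) model predictions for a candidate image set."""
    best = predicted.argmax(axis=0)    # index of the highest-predicted image per channel
    worst = predicted.argmin(axis=0)   # index of the lowest-predicted image per channel
    return best, worst

def per_channel_correlation(predicted, recorded):
    """Pearson r between predicted and recorded responses, one value per channel."""
    return np.array([np.corrcoef(predicted[:, c], recorded[:, c])[0, 1]
                     for c in range(predicted.shape[1])])
```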

Authors: Lappe, Alexander; Bognár, A.; Nejad, G. G.; Mukovskiy, Albert; Giese, Martin A.; Vogels, Rufin
Type of Publication: In Collection
Vogels, R., Raman, R., Nejad, G. G., Mukovskiy, A., Lappe, A., Giese, M. A. et al. (2024). Keypoint-based modeling of body posture selectivity of macaque inferotemporal neurons. Society for Neuroscience.
Keypoint-based modeling of body posture selectivity of macaque inferotemporal neurons
Abstract:

Non-verbal social communication relies on the interpretation of visual cues from the body. fMRI studies in macaques have identified regions within the inferotemporal (IT) cortex that exhibit heightened activation to bodies compared to faces and objects. Among these regions, the body patches in the ventral bank of the Superior Temporal Sulcus (STS), i.e., the mid-STS (MSB) and anterior-STS (ASB) body patches, show selectivity for static (and dynamic) bodies. However, the body features that drive the responses of these neurons, and in particular their representation of body posture at these two levels of processing, remain unclear. To investigate this, we recorded multi-unit responses, using 16-channel V-probes, within and around MSB and ASB in two monkeys, employing a stimulus set comprising 720 stimuli featuring a monkey avatar in 45 body postures, rendered from 16 viewing angles. The static stimuli were presented during passive fixation. We employed principal component regression to model the response of the neurons based on the 10 principal components of 22 2D body keypoints extracted from the stimuli, which explained about 90% of the stimulus variance. Of the body-category-selective neurons (at least twofold higher response to dynamic bodies compared to dynamic faces and objects), the 2D keypoint-based model explained the selectivity for body posture and view with a median reliability-corrected coefficient of determination of 0.42 and 0.20 in the MSB and ASB regions, respectively. Inclusion of the depth dimension increased the model fit significantly for ASB but not MSB. When comparing with a convolutional neural network (CNN; ResNet50-robust; regression on 50 PCs) feature-based approach, the keypoint-based model exhibited slightly inferior performance, particularly in ASB, when focusing on higher-layer features, but remained superior to the lower-layer feature-based CNN model. Inverting the keypoint models allowed visualization of the body features that drove the posture selectivity of the neurons. We found that these body features ranged from local body features like the upper limbs or tail to combinations of them, but rarely the entire body. Some neurons, even in the mid-STS region, tolerated changes in the view of the preferred body parts. The view tolerance was significantly greater in ASB compared to MSB. Our study shows that a body keypoint representation explains a sizable proportion of the selectivity to body posture and view of macaque visual cortical neurons, especially in the mid-STS. Furthermore, the modeling suggests that 3D cues contribute to the body selectivity of anterior but not posterior IT neurons.
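A minimal sketch of keypoint-based principal component regression in the spirit of the analysis above, using scikit-learn; the data arrays are random placeholders and the code is not the authors' implementation.

```python
# Sketch: regress neural responses on the first 10 PCs of 2D body keypoints.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Placeholder data: 720 stimuli, 22 keypoints x 2 coordinates, responses of 8 example units.
keypoints = np.random.rand(720, 44)
responses = np.random.rand(720, 8)

model = make_pipeline(PCA(n_components=10), LinearRegression())
model.fit(keypoints, responses)

pred = model.predict(keypoints)
ss_res = ((responses - pred) ** 2).sum(axis=0)
ss_tot = ((responses - responses.mean(axis=0)) ** 2).sum(axis=0)
r2_per_unit = 1 - ss_res / ss_tot   # uncorrected R^2 per unit
```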

Authors: Vogels, Rufin; Raman, R.; Nejad, G. G.; Mukovskiy, Albert; Lappe, Alexander; Giese, Martin A.; Martini, Lucas M.; Bognár, A.
Type of Publication: In Collection
Lappe, A., Bognár, A., Nejad, G. G., Mukovskiy, A., Giese, M. A. & Vogels, R. (2024). Encoding of bodies and objects in body-selective neurons. 2024 Neuroscience Meeting Planner.
Encoding of bodies and objects in body-selective neurons
Authors: Lappe, Alexander; Bognár, Anna; Nejad, Ghazaleh Ghamkhari; Mukovskiy, Albert; Giese, Martin A.; Vogels, Rufin
Type of Publication: In Collection
Martini, L. M., Lappe, A. & Giese, M. A. (2024). Pose and shape reconstruction of nonhuman primates from images for studying social perception. Society for Neuroscience.
Pose and shape reconstruction of nonhuman primates from images for studying social perception
Abstract:

The neural and computational mechanisms of the visual encoding of body pose and motion remain poorly understood. One important obstacle in their investigation is the generation of highly controlled stimuli with exactly specified form and motion parameters. Avatars are ideal for this purpose, but for nonhuman species the generation of appropriate motion and shape data is extremely costly, and video-based methods are often not accurate enough to generate convincing 3D animations with precisely specified parameters. METHODS: Based on a photorealistic 3D model for macaque monkeys, which we have developed recently, we propose a method that automatically adjusts this model to other nonhuman primate shapes, requiring only a small number of photographs and hand-labeled keypoints for that species. The resulting 3D model makes it possible to generate highly realistic animations of different primate species, combining the same motion with different body shapes. Our method is based on an algorithm that automatically deforms a polygon mesh of a macaque model with 10,632 vertices and an underlying rig of 115 joints, matching the silhouettes of the animals and a small number of specified keypoints in the example pictures. Optimization is based on a composite error function that integrates terms for the matching quality of the silhouettes, keypoints, and bone lengths, and for minimizing local surface deformation. RESULTS: We demonstrate the efficiency of the method for several monkey and ape species. In addition, we are presently investigating in a psychophysical experiment how the body shape of different primate species interacts with the categorization of body movements of humans and non-human primates in human perception. CONCLUSION: Using modern computer graphics methods, highly realistic and well-controlled body motion stimuli can be generated from small numbers of photographs, allowing us to study how species-specific motion and body shape interact in visual body motion perception.

Type of Publication: In Collection
Stettler, M., Lappe, A., Taubert, N. & Giese, M. A. (2023). Multi-Domain Norm-referenced Encoding Enables Data Efficient Transfer Learning of Facial Expression Recognition.
Multi-Domain Norm-referenced Encoding Enables Data Efficient Transfer Learning of Facial Expression Recognition
Abstract:

People can innately recognize human facial expressions in unnatural forms, such as when depicted on the unusual faces drawn in cartoons or when applied to an animal’s features. However, current machine learning algorithms struggle with out-of-domain transfer in facial expression recognition (FER). We propose a biologically-inspired mechanism for such transfer learning, which is based on norm-referenced encoding, where patterns are encoded in terms of difference vectors relative to a domain-specific reference vector. By incorporating domain-specific reference frames, we demonstrate high data efficiency in transfer learning across multiple domains. Our proposed architecture provides an explanation for how the human brain might innately recognize facial expressions on varying head shapes (humans, monkeys, and cartoon avatars) without extensive training. Norm-referenced encoding also allows the intensity of the expression to be read out directly from neural unit activity, similar to face-selective neurons in the brain. Our model achieves a classification accuracy of 92.15% on the FERG dataset with extreme data efficiency. We train our proposed mechanism with only 12 images, including a single image of each class (facial expression) and one image per domain (avatar). In comparison, the authors of the FERG dataset achieved a classification accuracy of 89.02% with their FaceExpr model, which was trained on 43,000 images.
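A toy sketch of the norm-referenced encoding idea described above: a feature vector is represented by its direction and distance relative to a domain-specific reference (e.g., the neutral face of one avatar). The names are illustrative assumptions, not the paper's implementation.

```python
# Sketch: norm-referenced code of a feature pattern relative to a domain reference.
import numpy as np

def norm_referenced_code(features, reference):
    diff = features - reference                 # difference vector to the domain's reference (neutral) pattern
    norm = np.linalg.norm(diff)
    direction = diff / norm if norm > 0 else np.zeros_like(diff)
    return direction, norm                      # direction ~ expression type, norm ~ expression intensity
```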

Type of Publication: Misc
Stettler, M., Lappe, A., Siebert, R., Taubert, N., Thier, P. & Giese, M. A. (2023). Norm-referenced encoding of facial expressions facilitates transfer learning to novel head shapes. 2023 Neuroscience Meeting Planner. Washington, D.C.
Norm-referenced encoding of facial expressions facilitates transfer learning to novel head shapes
Authors: Stettler, Michael; Lappe, Alexander; Siebert, Ramona; Taubert, Nick; Thier, Peter; Giese, Martin A.
Type of Publication: In Collection
Peng, L., Lappe, A., Wen, S., Giese, M. A. & Thier, P. (2023). Task-dependent switching of the tuning properties of F5 mirror neuron. Proceedings of the European Conference on Visual Perception (ECVP).
Task-dependent switching of the tuning properties of F5 mirror neuron
Authors: Peng, Lilei; Lappe, Alexander; Wen, Shengjun; Giese, Martin A.; Thier, Peter
Type of Publication: In Collection
Heinrich, T., Lappe, A. & Hanke, F. D. (2022). Beyond the classic sensory systems: Characteristics of the sense of time of harbor seals (Phoca vitulina) assessed in a visual temporal discrimination and a bisection task. The Anatomical Record, 305(3), 704-714.
Beyond the classic sensory systems: Characteristics of the sense of time of harbor seals (Phoca vitulina) assessed in a visual temporal discrimination and a bisection task
Abstract:

Beyond the classic sensory systems, the sense of time is most likely involved in behaviors ranging from foraging to navigation. As a prerequisite for assessing the role time plays in different behavioral contexts, we further characterized the sense of time of a harbor seal in this study. Supra-second time intervals were presented to the seal in a temporal discrimination and a temporal bisection task. During temporal discrimination, the seal needed to discriminate between a standard time interval (STI) and a longer comparison interval. In the bisection task, the seal learnt to discriminate two STIs. Subsequently, it indicated its subjective perception of test time intervals as resembling either the short or the long STI more closely. The seal, although inexperienced with timing experiments, learnt both tasks quickly. Depending on the task, time interval, or duration ratio, it achieved a high temporal sensitivity with Weber fractions ranging from 0.11 to 0.26. In the bisection task, the prerequisites for Scalar Expectancy Theory, including a constant Weber fraction, a bisection point lying close to the geometric mean of the STIs, and no significant influence of the STI pair condition on the probability of a long response, were met for STIs with a ratio of 1:2, but not with a ratio of 1:4. In conclusion, the harbor seal's sense of time allows precise and complex judgments of time intervals. Cross-species comparisons suggest that principles commonly found to govern timing performance can also be discerned in harbor seals.
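For illustration only (not data from the paper), the snippet below computes the two quantities referenced above: the geometric mean of two example standard time intervals, which Scalar Expectancy Theory predicts as the bisection point, and a Weber fraction as the just-noticeable increment relative to a base interval.

```python
# Worked example with made-up interval values.
import math

short_sti, long_sti = 3.0, 6.0                        # example 1:2 standard intervals, in seconds
bisection_point = math.sqrt(short_sti * long_sti)     # geometric mean, here about 4.24 s

def weber_fraction(threshold_increment, base_interval):
    # Just-noticeable duration increment relative to the base interval
    return threshold_increment / base_interval
```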

Authors: Heinrich, Tamara; Lappe, Alexander; Hanke, Frederike D.
Type of Publication: Article


