@incollection{vogelsKeypoint_2024, author = "Rufin Vogels and R. Raman and G.G. Nejad and Albert Mukovskiy and Alexander Lappe and Martin A. Giese and Lucas M. Martini and A. Bogn{\'a}r", abstract = "Non-verbal social communication relies on the interpretation of visual cues from the body. fMRI studies in macaques have identified regions within the inferotemporal (IT) cortex that exhibit heightened activation to bodies compared to faces and objects. Among these regions, the ventral bank Superior Temporal Sulcus (STS) patches, i.e. the mid STS (MSB) and anterior STS body patch (ASB), show selectivity for static (and dynamic) bodies. However, the body features that drive the response of these neurons, in particular their representation of body posture, within these two levels of processing are unclear. To investigate this, we recorded multi-unit responses, using 16-channel V-probes, within and around MSB and ASB in two monkeys, employing a stimulus set comprising 720 stimuli featuring a monkey avatar in 45 body postures, rendered from 16 viewing angles. The static stimuli were presented during passive fixation. We employed principal component regression to model the response of the neurons based on the 10 principal components of 22 2D body keypoints extracted from the stimuli, which explained about 90% of the stimulus variance. Of the body-category selective neurons (at least twofold higher response to dynamic bodies compared to dynamic faces and objects), the 2D keypoint-based model explained the selectivity for body posture and view with a median reliability-corrected coefficient of determination of 0.42 and 0.20 in the MSB and ASB regions, respectively. Inclusion of the depth dimension increased the model fit significantly for ASB but not MSB.
When compared with a convolutional neural network (CNN; ResNet50-robust; regression on 50 PCs) feature-based approach, the keypoint-based model exhibited slightly inferior performance, particularly in ASB, when focusing on higher-layer features but remained superior to the lower-layer feature-based CNN model. Inverting the keypoint models allowed visualization of the body features that drove the posture selectivity of the neurons. We found that these body features ranged from local body features like the upper limbs or tail to combinations of them, but rarely the entire body. Some neurons, even in the mid STS region, tolerated changes in the view of the preferred body parts. The view tolerance was significantly greater in ASB compared to MSB. Our study shows that a body keypoint representation explains a sizable proportion of the selectivity to body posture and view of macaque visual cortical neurons, especially in the mid STS. Furthermore, the modeling suggests that 3D cues contribute to the body selectivity of anterior but not posterior IT neurons.", booktitle = "Society for Neuroscience", title = "{K}eypoint-based modeling of body posture selectivity of macaque inferotemporal neurons", year = "2024", }