Neurodynamical model of dynamic bodily action recognition
Nejad, Ghazal Ghamkhari
Giese, Martin A.
For social species such as primates, recognizing dynamic body movements is important for survival, yet the detailed neural circuitry underlying the visual processing of dynamic bodies is not well understood. In monkeys, different body patches in the cortex are known to contribute to this processing. We present a physiologically inspired neural model of the visual recognition of body movements and compare it with electrophysiological data from macaque monkeys. The model combines an image-computable model (‘ShapeComp’, Morgenstern et al., 2021), which takes the shape boundaries of objects as input and produces high-dimensional vectors describing their shape, with a neurodynamical model (Giese & Poggio, 2003) that has successfully reproduced neural dynamics at the single-cell level in higher areas of the visual and premotor cortex. The model recognizes actions from videos of body silhouettes. The initial layers of the visual pathway, which detect mid-level features, are modeled by the ShapeComp network. This convolutional neural network architecture is trained using a GAN approach and represents the invariance properties of human shape perception better than other standard networks. The shape feature vectors produced by this network are used to train radial basis function networks, which provide input to recurrent neural networks (neural fields) that encode sequences of keyframes extracted from the videos. The highest level of the model consists of motion pattern neurons that temporally summate the activity within individual neural fields, each of which represents a different body action. The model’s responses were compared with macaque single-unit responses recorded with the same stimuli from the rostral dorsal bank of the Superior Temporal Sulcus (AMUB body patch). The model successfully reproduces characteristics of real neurons at the population level. It also makes predictions about the dynamics of the responses, for example in the presence of time gaps in the stimuli.
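The processing chain described above (shape feature vectors, radial basis function units, a recurrent neural field encoding a keyframe sequence, and a temporally summating motion pattern neuron) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names, the one-hot toy vectors standing in for ShapeComp features, the threshold-linear field dynamics with a one-step asymmetric kernel plus global inhibition, and all parameter values.

```python
import numpy as np

def rbf_responses(features, centers, sigma=0.3):
    """Gaussian RBF activation of every keyframe unit for every frame.

    features: (T, D) shape vectors; centers: (K, D) stored keyframes -> (T, K).
    """
    sq_dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def lateral_kernel(n_units, excit=0.8, inhib=0.3):
    """Asymmetric weights: unit i is excited by unit i-1, plus global inhibition."""
    return excit * np.eye(n_units, k=-1) - inhib

def run_field(inputs, kernel, tau=2.0, h=0.2, dt=1.0):
    """Euler steps of tau * du/dt = -u + W @ relu(u) + s(t) - h; returns relu(u)."""
    u = np.full(inputs.shape[1], -h)  # resting level in the absence of input
    activity = np.empty_like(inputs)
    for t, s in enumerate(inputs):
        u = u + (dt / tau) * (-u + kernel @ np.maximum(u, 0.0) + s - h)
        activity[t] = np.maximum(u, 0.0)
    return activity

def motion_pattern_neuron(activity, tau=4.0, dt=1.0):
    """Leaky temporal summation of the total activity in one neural field."""
    y = 0.0
    trace = np.empty(activity.shape[0])
    for t, total in enumerate(activity.sum(axis=1)):
        y = y + (dt / tau) * (-y + total)
        trace[t] = y
    return trace

# Toy stand-ins for shape feature vectors: one well-separated vector per keyframe.
keyframes = np.eye(8)
kernel = lateral_kernel(len(keyframes))

# The same action shown in the trained order and in reversed order.
inputs_fwd = rbf_responses(keyframes, keyframes)
inputs_rev = rbf_responses(keyframes[::-1], keyframes)

field_fwd = run_field(inputs_fwd, kernel)
field_rev = run_field(inputs_rev, kernel)

mpn_fwd = motion_pattern_neuron(field_fwd)
mpn_rev = motion_pattern_neuron(field_rev)

print("forward :", mpn_fwd.round(3))
print("reversed:", mpn_rev.round(3))
```

With these toy parameters, the forward sequence should drive the motion pattern neuron more strongly than the reversed one, because only in the trained order does each keyframe unit receive lateral excitation from its already active predecessor. Stimuli with time gaps could be probed in the same sketch by zeroing some rows of the input matrix.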