Neural model of action-selective neurons in STS and area F5

Year:
2009
Type of Publication:
In Collection
Authors:
Giese, Martin A.
Casile, Antonino
Fleischer, Falk
Month:
02
Note:
not reviewed
Abstract:

The visual recognition of goal-directed movements is crucial for understanding the intentions and goals of others as well as for imitation learning. So far, it is largely unknown how visual information about effectors and goal objects of actions is integrated in the brain. Specifically, it is unclear whether robust recognition of goal-directed actions can be accomplished by purely visual processing, or whether it requires a reconstruction of the three-dimensional structure of object and effector geometry. We present a neurophysiologically inspired model for the recognition of goal-directed grasping movements from real video sequences. The model combines several physiologically plausible mechanisms in order to integrate information about the goal object with information about the effector and its movement:

(1) A hierarchical neural architecture for the recognition of hand and object shapes, which achieves position- and scale-invariant recognition through a successive increase of feature complexity and invariance along the hierarchy, based on learned example views [1,2,3]. In contrast to standard models of visual object recognition, however, this invariance is incomplete, so that the retinal positions of the goal object and the effector can be extracted from a population code.

(2) Simple recurrent neural circuits that realize temporal sequence selectivity [4,5,6].

(3) A novel mechanism that combines information about object shape and affordance with information about effector (hand) posture and position in an object-centered frame of reference. This mechanism exploits gain fields to implement the required coordinate transformation [7,8].

The model shows that a robust integration of effector and object information can be accomplished with well-established, physiologically plausible principles. In particular, the proposed model contains no explicit 3D representations of objects or of the effector movement. Instead, it realizes predictions over time based on learned view-dependent representations of the visual input. Our results complement those of existing models of action recognition [9] and motivate a more detailed analysis of the complementary contributions of visual pattern analysis and motor representations to the visual recognition of imitable actions.

References:
[1] Riesenhuber, M. and Poggio, T. (1999): Nat. Neurosci. 2, 1019-1025.
[2] Giese, M.A. and Poggio, T. (2003): Nat. Rev. Neurosci. 4, 179-192.
[3] Serre, T. et al. (2007): IEEE Trans. Pattern Anal. Mach. Intell. 29, 411-426.
[4] Zhang, K. (1996): J. Neurosci. 16, 2112-2126.
[5] Hopfield, J.J. and Brody, C.D. (2000): Proc. Natl. Acad. Sci. USA 97, 13919-13924.
[6] Xie, X. and Giese, M.A. (2002): Phys. Rev. E 65, 051904.
[7] Salinas, E. and Abbott, L. (1995): J. Neurosci. 15, 6461-6474.
[8] Pouget, A. and Sejnowski, T. (1997): J. Cogn. Neurosci. 9, 222-237.
[9] Oztop, E. et al. (2006): Neural Netw. 19, 254-271.
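To make the mechanisms listed in the abstract more concrete, the following Python snippets give minimal numerical illustrations. They are sketches under stated assumptions, not the published implementation; all function names, parameter values, and network sizes are illustrative. The first sketch corresponds to mechanism (1): a single S1/C1 stage of an HMAX-like hierarchy in the spirit of [1,3], where oriented linear filtering (feature detection) is followed by a local maximum over space (partial position invariance). Stacking such stages, with features learned from example views at higher levels, yields the increase of complexity and invariance described in the abstract.

```python
import numpy as np
from scipy.signal import convolve2d
from scipy.ndimage import maximum_filter

def gabor(size=11, wavelength=5.0, sigma=3.0, theta=0.0):
    """Oriented Gabor patch used as a model simple-cell (S1) receptive field."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def s1_c1(image, orientations=4, pool=8):
    """One S1/C1 stage: oriented filtering followed by local max pooling,
    which gives feature detectors with partial tolerance to retinal shifts."""
    c1_maps = []
    for k in range(orientations):
        s1 = np.abs(convolve2d(image, gabor(theta=k * np.pi / orientations), mode="same"))
        c1 = maximum_filter(s1, size=pool)[::pool, ::pool]   # max pooling and subsampling
        c1_maps.append(c1)
    return np.stack(c1_maps)                                  # orientation x coarse retinal position

# Toy input: a bright horizontal bar on a dark background
img = np.zeros((64, 64)); img[30:34, 20:44] = 1.0
print(s1_c1(img).shape)                                       # (4, 8, 8)
```

Because pooling is local rather than global, the C1 maps still carry coarse retinal position, consistent with the incomplete invariance and position population code mentioned in the abstract.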
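Mechanism (2), temporal sequence selectivity, can be illustrated with a chain of leaky threshold units coupled by asymmetric lateral excitation, in the spirit of [4,6]. The asymmetry lets activity build up along the chain only when the learned keyframes arrive in the trained order; all parameters below are illustrative choices.

```python
import numpy as np

N = 20          # number of "snapshot" neurons, one per learned keyframe
tau = 2.0       # leak time constant (in time steps)
w_asym = 1.2    # asymmetric lateral weight (neuron i-1 -> neuron i)
w_inh = 0.5     # strength of global inhibition
theta = 0.3     # firing threshold

def total_response(frame_order):
    """Summed population response of a chain of leaky threshold units with
    asymmetric lateral excitation; activity propagates and accumulates only
    for the learned temporal order of the input frames."""
    u = np.zeros(N)
    total = 0.0
    for frame in frame_order:
        rate = np.maximum(u - theta, 0.0)        # thresholded firing rates
        ff = np.zeros(N)
        ff[frame] = 1.0                          # feedforward drive from the current keyframe
        lateral = np.zeros(N)
        lateral[1:] = w_asym * rate[:-1]         # excitation from the predecessor neuron only
        u += (-u + ff + lateral - w_inh * rate.mean()) / tau
        total += rate.sum()
    return total

print("learned order :", round(total_response(range(N)), 2))
print("reversed order:", round(total_response(range(N - 1, -1, -1)), 2))   # much weaker
```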
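Mechanism (3), the gain-field coordinate transformation [7,8], can be illustrated with a one-dimensional basis-function network: units tuned to the retinal hand position are multiplicatively modulated by the retinal object position, and a readout over the diagonals of this gain field recovers the hand position relative to the object, i.e. in an object-centered frame. Again, the names and parameters are illustrative assumptions.

```python
import numpy as np

positions = np.arange(0, 64)   # retinal positions in arbitrary "pixel" units
sigma = 2.0                    # tuning width of the Gaussian receptive fields

def gaussian_tuning(x, centers, sigma):
    return np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))

def object_centered_hand_position(x_hand, x_obj):
    """Gain-field network: unit (i, j) responds with the product of a hand-position
    tuning curve and an object-position gain factor. Summing the units along the
    diagonals i - j gives a population code for the relative (object-centered)
    hand position, which is decoded here by its maximum."""
    hand_tuning = gaussian_tuning(x_hand, positions, sigma)   # responses vs. hand position
    obj_gain = gaussian_tuning(x_obj, positions, sigma)       # gain factors vs. object position
    gain_field = np.outer(hand_tuning, obj_gain)              # multiplicative gain modulation
    offsets = np.arange(-len(positions) + 1, len(positions))  # candidate relative positions i - j
    readout = np.array([np.trace(gain_field, offset=-k) for k in offsets])
    return offsets[np.argmax(readout)]

print(object_centered_hand_position(x_hand=40, x_obj=25))     # -> 15 (hand right of object)
print(object_centered_hand_position(x_hand=10, x_obj=25))     # -> -15 (hand left of object)
```

The same multiplicative scheme extends to two dimensions and to additional effector variables such as hand posture; the diagonal readout is one simple stand-in for learned output weights over the gain-field population.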