Neural model for the visual recognition of hand actions
Fleischer, Falk
Casile, Antonino
Giese, Martin A.
The visual recognition of goal-directed movements is crucial for the learning of actions, and possibly for the understanding of the intentions and goals of others. The discovery of mirror neurons has stimulated a vast amount of research investigating possible links between action perception and action execution [1,2]. However, it remains largely unknown how far this putative visuo-motor interaction extends during the visual perception of actions, and which relevant computational functions might instead be accomplished by purely visual processing. Here, we present a neurophysiologically inspired model for the recognition of hand movements. It demonstrates that a substantial degree of performance can be achieved by analyzing spatio-temporal visual features within a hierarchical neural system that reproduces fundamental properties of the visual pathway and premotor cortex. The model integrates several physiologically plausible computational mechanisms within a common architecture that is suitable for the recognition of grasping actions from real videos: (1) A hierarchical neural architecture that extracts form and motion features with position and scale invariance by successively increasing feature complexity and invariance along the hierarchy [3,4,5]. (2) Learning of optimized features on different hierarchy levels using a trace learning rule that eliminates features that do not contribute to correct classification results [5]. (3) Simple recurrent neural circuits for the realization of temporal sequence selectivity [6,7,8]. (4) As a novel computational function, the model implements a plausible mechanism that combines spatial information about the goal object and its affordance with the specific posture, position and orientation of the effector (hand). The model is evaluated on video sequences of both monkey and human grasping actions.
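As an illustration of mechanism (3), the following minimal Python sketch shows how asymmetric lateral connections in a chain of units can produce temporal sequence selectivity. It is not the model described in this abstract: the function name `sequence_response`, the multiplicative gating of the lateral input, and the parameters `w_forward` and `leak` are all hypothetical choices made only for this example.

```python
def sequence_response(order, n_units=5, w_forward=0.8, leak=0.5):
    """Summed activity of a chain of units for a sequence of stimulus frames.

    Illustrative assumptions (not taken from the abstract):
    - unit k is driven by stimulus frame k only;
    - unit k receives facilitating lateral input from unit k-1 (asymmetric);
    - the lateral input multiplicatively boosts the feedforward drive.
    """
    act = [0.0] * n_units
    total = 0.0
    for stim in order:
        new_act = []
        for k in range(n_units):
            feedforward = 1.0 if k == stim else 0.0
            # Asymmetric connection: only the preceding unit facilitates unit k.
            lateral = w_forward * act[k - 1] if k > 0 else 0.0
            drive = feedforward * (1.0 + lateral)
            # Leaky persistence of the previous activity plus the new drive.
            new_act.append(leak * act[k] + drive)
        act = new_act
        total += sum(act)
    return total


forward = sequence_response([0, 1, 2, 3, 4])    # frames in the learned order
reverse = sequence_response([4, 3, 2, 1, 0])    # same frames, reversed
scrambled = sequence_response([2, 0, 4, 1, 3])  # same frames, shuffled
print(forward > reverse, forward > scrambled)
```

In this sketch the same set of frames yields the strongest summed response when presented in the order that matches the direction of the lateral connections, because each unit is then facilitated by the still-active unit preceding it in the chain.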
The model demonstrates that simple, well-established, physiologically plausible mechanisms account for important aspects of visual action recognition. Specifically, the proposed model does not contain explicit 3D representations of objects or of the action. Instead, it realizes predictions over time based on learned pattern sequences arising in the visual input. Our results complement those of existing models [9] and motivate a more detailed analysis of the complementary contributions of visual pattern analysis and motor representations to the visual recognition of imitable actions.
References
[1] Di Pellegrino, G. et al. (1992): Exp. Brain Res. 91, 176-180.
[2] Rizzolatti, G. and Craighero, L. (2004): Annu. Rev. Neurosci. 27, 169-192.
[3] Riesenhuber, M. and Poggio, T. (1999): Nat. Neurosci. 2, 1019-1025.
[4] Giese, M.A. and Poggio, T. (2003): Nat. Rev. Neurosci. 4, 179-192.
[5] Serre, T. et al. (2007): IEEE Trans. Pattern Anal. Mach. Intell. 29, 411-426.
[6] Zhang, K. (1996): J. Neurosci. 16, 2112-2126.
[7] Hopfield, J. and Brody, D. (2000): Proc. Natl. Acad. Sci. USA 97, 13919-13924.
[8] Xie, X. and Giese, M. (2002): Phys. Rev. E 65, 051904.
[9] Oztop, E. et al. (2006): Neural Netw. 19, 254-271.