Visual encoding of goal-directed movements: a physiologically plausible neural model

Year:
2009
Type of Publication:
In Collection
Authors:
Giese, Martin A.
Caggiano, Vittorio
Casile, Antonino
Fleischer, Falk
Month:
11
Note:
not reviewed
Abstract:

Visual responses of action-selective neurons, e.g. in premotor cortex and the superior temporal sulcus of the macaque monkey, are characterized by a remarkable combination of selectivity and invariance. On the one hand, the responses of such neurons show high selectivity for details of the grip and the spatial relationship between effector and object. At the same time, these responses show substantial invariance against the retinal stimulus position. While numerous models for the mirror neuron system have been proposed in robotics and neuroscience, almost none of them account for the visual tuning properties of action-selective neurons using physiologically plausible neural mechanisms. In addition, many existing models assume that action encoding is based on a full reconstruction of the 3D geometry of effector and object. This contradicts recent electrophysiological results showing view dependence of the majority of action-selective neurons, e.g. in premotor cortex. We present a neurophysiologically plausible model for the visual recognition of grasping movements from real videos. The model is based on simple, well-established neural circuits. Recognition of effector and goal object is accomplished by a hierarchical neural architecture in which scale and position invariance are achieved by nonlinear pooling along the hierarchy, consistent with many established models of object recognition. Effector recognition includes a simple predictive neural circuit that results in temporal sequence selectivity. Effector and goal positions are encoded within the neural hierarchy as population codes, which can be processed by a simple gain-field-like mechanism to compute the relative position of effector and object in a retinal frame of reference. Based on this signal, together with object and effector shape, the highest level of the hierarchy distinguishes between functional grips (hand matches object shape and position) and dysfunctional grips (no match between hand and object shape or position), while remaining invariant against strong changes of the stimulus position. The model was tested with several stimuli from the neurophysiological literature and reproduces, in part even quantitatively, results on action-selective neurons in the STS and premotor cortex. Specifically, the model reproduces the visual tuning properties and view dependence of mirror neurons in premotor cortex and makes additional predictions that can easily be tested in electrophysiological experiments.
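The gain-field-like computation of relative position mentioned in the abstract can be illustrated in a few lines. The sketch below is not the authors' implementation; it is a minimal one-dimensional toy example, with all function names and parameters (population_code, relative_position_code, n_units, sigma) chosen here for illustration. It assumes Gaussian population codes for effector and object position, combines them multiplicatively (the gain field), and pools along the diagonals of the resulting 2-D map so that each output unit collects all effector/object position pairs with the same difference. The decoded relative position is then invariant to a common retinal shift of the pair, which is the property the abstract attributes to this mechanism.

import numpy as np

def population_code(position, n_units=41, span=20.0, sigma=1.5):
    """Gaussian population code over retinal positions in [-span/2, span/2]."""
    centers = np.linspace(-span / 2, span / 2, n_units)
    activity = np.exp(-(position - centers) ** 2 / (2 * sigma ** 2))
    return activity, centers

def relative_position_code(effector_act, object_act, centers):
    """Gain-field-like mechanism: multiplicative combination of two
    population codes, then pooling along the diagonals of the 2-D map.
    Each diagonal gathers units with a fixed difference (effector minus
    object position), giving a population code over relative position."""
    gain_field = np.outer(effector_act, object_act)  # multiplicative gain field
    n = len(centers)
    step = centers[1] - centers[0]
    offsets = np.arange(-(n - 1), n) * step          # candidate relative positions
    # trace(offset=-k) sums gain_field[i, i-k], i.e. all pairs with i - j = k
    pooled = np.array([np.trace(gain_field, offset=-k) for k in range(-(n - 1), n)])
    return offsets, pooled

# Example: the decoded relative position stays the same when effector
# and object are shifted together on the retina.
for shift in (0.0, 5.0):
    eff, centers = population_code(2.0 + shift)   # effector at retinal position 2 + shift
    obj, _ = population_code(-1.0 + shift)        # object at retinal position -1 + shift
    offsets, pooled = relative_position_code(eff, obj, centers)
    print(f"shift={shift:+.1f} -> decoded relative position "
          f"{offsets[np.argmax(pooled)]:+.1f}")   # +3.0 in both cases

The multiplicative combination followed by diagonal pooling is the standard textbook reading of gain fields for coordinate transformations; the paper's actual circuit may differ in its nonlinearities and in how the pooling is wired.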