Neurophysiologically-inspired models of visual action perception and the perception of causality


Research Area:

Neural and Computational Principles of Action and Social Processing


Martin A. Giese; Mohammad Hovaidi Ardestani


Visual action processing is an important brain function for motor control and learning, imitation, and social perception. Based on electrophysiological results, we develop detailed neural models of this visual function that yield predictions motivating new experimental questions. Unlike many other biologically-inspired models for action recognition and understanding, the model works on real grey-level videos, demonstrating the computational efficiency of the underlying neurocomputational mechanisms.

Our models are based on learned hierarchies of neural feature detectors, modeling the properties of different neuron classes in cortex. The model for the recognition of goal-directed actions consists of three modules. The first models the processing of effector and object shapes (Figure 1, A). The integration of information over time is accomplished by a dynamic neural network (neural field) that encodes action stimuli by a travelling pulse of neural activity; this neural representation is predictive. The second module integrates the spatial information of effector (hand) and object (B). The third module integrates the information from the previous modules and consists of action-selective model neurons that reproduce properties of neurons in premotor and parietal cortex, and in the superior temporal sulcus (STS). The feature detectors in the model are largely formed by learning from example movies. View-invariant recognition is accomplished by pooling the output signals from a small number of view-specific modules (not shown in the figure).
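The travelling-pulse mechanism can be illustrated with a toy simulation: in a one-dimensional Amari-type neural field, an interaction kernel whose excitation is shifted "forward" makes a localized activity bump drift along the field, which is one way such networks can encode the temporal order of an action. All parameters below are illustrative assumptions, not those of the published model.

```python
import numpy as np

# Toy 1-D neural field (Amari-type) with an asymmetric interaction kernel.
# The forward-shifted excitation biases the recurrent dynamics so that a
# localized activity pulse travels along the field.  All parameters are
# illustrative assumptions, not those of the published model.

N = 200                              # field neurons (e.g. action keyframes)
idx = np.arange(N)
dist = idx[:, None] - idx[None, :]   # signed distance i - j

# Excitation centered 2 units "ahead" of each neuron, plus uniform inhibition.
w = 2.0 * np.exp(-(dist - 2.0) ** 2 / (2.0 * 3.0 ** 2)) - 0.9

def f(v):
    """Saturating threshold nonlinearity."""
    return np.clip(v, 0.0, 1.0)

u = np.zeros(N)
u[:5] = 1.0                          # stimulate the start of the field

tau, dt, h = 10.0, 1.0, 0.5          # time constant, step size, resting level
peaks = []
for _ in range(120):                 # Euler integration of du/dt
    u = u + dt / tau * (-u + w @ f(u) - h)
    peaks.append(int(np.argmax(u)))

print(peaks[0], peaks[-1])           # the activity peak moves forward
```

With a symmetric kernel the bump would be stationary; the asymmetry is what turns it into a travelling pulse whose position tracks progress through the encoded sequence.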

Model for goal-directed action recognition

Figure 1: Neural model for the recognition of goal-directed hand actions (Fleischer et al. 2013). The model consists of three modules: (A) visual recognition hierarchy; (B) affordance module that integrates the extracted information about effector and object; and (C) module with goal-directed action-selective neurons that integrate the information from the previous modules.


The model reproduces numerous experimental results from electrophysiology, and it has also predicted and motivated a number of new experiments. Figure 2 shows the simulated activity at the highest level of the model in comparison with data from an electrophysiological study by Gallese et al. (1996), demonstrating that the model is able to recognize different grip types from grey-level videos. At the same time, the model reproduces the observation that actions without a visible goal object result only in weak activity of mirror neurons. Figure 3 shows a comparison between the (normalized) simulated activity at the second-highest hierarchy level of the model and the (normalized) activity of mirror neurons in area F5 of monkey cortex (Caggiano et al. 2011). Interestingly, the majority of the tested mirror neurons showed view-dependence, even though these neurons were recorded in premotor cortex, very high up in the cortical processing hierarchy.

 Recognition of grip types

Figure 2: Responses of model neurons to different grip types presented as grey-level videos. In spite of the fact that the movies differ only in subtle details (finger positions), the neurons discriminate reliably between the grip types. The inset at the top shows the behavior of a mirror neuron in area F5 that was tested during the observation of two different grips. Like the activity of the real neurons, the activity of the model neurons breaks down if actions are shown without goal objects.


 View-dependence of mirror neurons

Figure 3: Left: View-dependent responses of mirror neurons in area F5 in premotor cortex, when an action stimulus is presented from three different view points. 74% of the tested mirror neurons showed view-dependence (Caggiano et al. 2011). Right: Simulated activity at the second-highest level of our neural model, which reproduces the view-dependence of the neural responses.


Hand actions specify causal interactions between the effector and the manipulated objects. The recognition of actions is thus tightly connected to the perception of causality. Interestingly, the model trained with hand actions generalizes to very abstract action stimuli that consist only of moving discs (after reduction of the precision of the form-tuned neurons). Such stimuli have been used by Gestalt psychologists to study the perception of causality (Michotte, 1946). We created such abstract stimuli from filmed hand actions by tracking the positions of the hand and the object. We found very similar psychophysical causality ratings for the observation of real hand actions and of the abstract stimuli, including after adding manipulations that destroy the perception of causality. In addition, the activity at the highest level of our model reproduces the psychophysical data qualitatively very well. This suggests that the same neural structures that are involved in the visual processing of hand actions are also involved in the perception of causality from simple stimuli.
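The timing manipulations used in such experiments can be made concrete with a small sketch that generates Michotte-style launching trajectories with a variable temporal gap between contact and the second disc's motion onset; a hypothetical illustration whose function name and parameters are our own, not those of the published stimuli.

```python
import numpy as np

# Hypothetical sketch of a Michotte-style "launching" stimulus: disc A
# moves toward a stationary disc B, stops at contact, and B starts moving
# only after a variable temporal gap.  Increasing the gap is one of the
# timing manipulations known to weaken the impression of causality.

def launching_stimulus(n_frames=100, contact_frame=40, gap=0, speed=2.0):
    """Return per-frame x-positions of discs A and B (1-D, arbitrary units)."""
    a = np.zeros(n_frames)
    b = np.zeros(n_frames)
    b_start = contact_frame * speed          # B initially sits at the contact point
    for t in range(n_frames):
        # A moves until contact, then stays put.
        a[t] = speed * min(t, contact_frame)
        # B stays put until contact + gap, then moves at the same speed.
        b[t] = b_start + speed * max(0, t - (contact_frame + gap))
    return a, b

# gap = 0: immediate launching (strong causality impression);
# larger gaps weaken the perceived causal link.
a0, b0 = launching_stimulus(gap=0)
a10, b10 = launching_stimulus(gap=10)
print(b0[41] - b0[40], b10[41] - b10[40])   # → 2.0 0.0
```

Sweeping `gap` (and analogous spatial offsets) yields the kind of stimulus family whose causality ratings decay with the size of the manipulation, as summarized in Figure 5.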


Launching effect

Figure 4: Classical stimulus by Michotte (1946) for the study of the visual perception of causality ('launching effect').

Causality perception

Figure 5: Left: Comparison between the causality ratings for real hand action stimuli and for causality stimuli consisting of moving discs that were matched to the positions of hand and object in the naturalistic stimuli. The horizontal axis shows the size of timing manipulations that are known to reduce the impression of causality. The decays with these parameters are very similar for the naturalistic stimuli and the abstract causality stimuli. Right: Simulated activity at the highest level of our neural model, tested with exactly the same visual stimuli, which shows high similarity with the psychophysical data (Fleischer et al. 2012).





Caggiano, V., Giese, M. A., Thier, P. & Casile, A. (2015). Encoding of point of view during action observation in the local field potentials of macaque area F5. Eur J Neurosci, 41(4), 466-476.
Giese, M. A., Fleischer, F., Caggiano, V., Pomper, J. & Thier, P. (2014). Neural theory for the visual perception of goal-directed actions and perceptual causality. Journal of Vision, 14(10), 1471.
Caggiano, V., Fogassi, L., Rizzolatti, G., Casile, A., Giese, M. A. & Thier, P. (2012). Mirror neurons encode the subjective value of an observed action. PNAS, 109(29), 11848-11853.
Caggiano, V., Fogassi, L., Rizzolatti, G., Pomper, J. K., Thier, P., Giese, M. A. et al. (2011). View-based encoding of actions in mirror neurons of area F5 in macaque premotor cortex. Current Biology, 21(2), 144-148.
Giese, M. A., Caggiano, V. & Thier, P. (2010). View-based neural encoding of goal-directed actions: a physiologically-inspired neural theory. Journal of Vision, 10(7), 1095.