Ground truth and benchmarking: long-range scenario
A benchmark for behavioural pattern recognition should comprise a collection of videos pertinent to the application domain as defined for the SEARISE system. For the stadium application in the Esprit (former LTU) Arena, one would therefore like to use recordings from real soccer matches as benchmark data. However, this approach is problematic for several reasons:
- Most soccer matches in the Esprit arena are (fortunately) relatively peaceful affairs. Thus, security-relevant scenes are hard to obtain.
- Even if security-relevant scenes were obtainable, legal constraints prohibit their storage beyond a short time interval, making them unsuitable as benchmark data.
- As a consequence of the scarcity of security-relevant scenes, a dataset compiled from stadium recordings would almost certainly not contain all events which are considered security-relevant in the stadium context by security professionals.
To deal with these problems, we created a benchmark dataset, the Tuebingen hooligan simulator, by staging relevant events. We contacted officers of the Duesseldorf police (Polizeiinspektion Nord and Polizeipraesidium) in charge of stadium security at the Esprit arena to obtain the expert knowledge necessary to decide which events to include. Past experience had shown that communication between police officers and scientists can be fraught with difficulties, stemming mostly from differences in experience with security-relevant events. We decided to overcome these difficulties by implementing a prototyping approach. We began with a telephone interview, during which we compiled a list of normal (i.e. not security-relevant) and security-relevant events.
Subsequently, we staged these events in a lecture theater with a group of approximately 10 lay actors. We repeated each event multiple times in different parts of the lecture theater; the resulting videos were overlaid to create the impression of a larger crowd. Two frames from the videos are shown below.
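The overlaying step described above can be sketched as a simple frame-wise blend. This is a minimal illustration, not the actual compositing pipeline used for the benchmark videos, which the text does not specify; the function name and the uniform-weight default are our own assumptions.

```python
import numpy as np

def overlay_frames(frames, weights=None):
    """Blend several co-registered video frames into one composite.

    Assumes `frames` is a list of equally sized uint8 arrays of shape
    (H, W, 3). Uses a plain weighted average; the real benchmark videos
    may have been composited differently (e.g. with masking).
    """
    stack = np.stack([f.astype(np.float64) for f in frames])
    if weights is None:
        # default: equal contribution from each recording
        weights = np.full(len(frames), 1.0 / len(frames))
    composite = np.tensordot(np.asarray(weights), stack, axes=1)
    return np.clip(composite, 0, 255).astype(np.uint8)
```

Applied per frame over the repeated recordings of an event, this produces a single sequence in which several staged groups appear simultaneously.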
We showed these videos to the police officers, asking them for feedback with regard to:
- completeness of both normal and security-relevant events,
- and the correctness of the labels (normal vs. security-relevant).
Virtually all videos were deemed sufficiently realistic by the police officers. However, they suggested some changes and pointed out missing events, such as smoke bombs or the burning of flags, for which we have yet to devise a viable re-enactment strategy. Moreover, some events which we had initially considered security-relevant turned out to belong to the 'normal' set, and vice versa. This feedback, which would not have been obtainable without our staging of the events, highlights the importance of refining the benchmark set through prototyping. The police officers have agreed to assist us in this undertaking. The full list of security-relevant and normal events which comprise the Tuebingen hooligan simulator is:
- normal background
- brawl: crowd, converging and embedded
- moving up/down over seats
- pushing others (one or many)
- walking along filled seat rows
- vandalism against chairs
- lighting and passing bengal torches
For a more quantitative evaluation of our video set with respect to the saliency of the security-relevant events, we also conducted eye-tracking experiments with one of the officers. We acquired a Tobii X60 mobile eye-tracking system, largely with SEARISE funds.
To construct a realistic and sufficiently difficult detection task, we built visual scenes by embedding the security-relevant events in neighbourhoods of normal events. This was accomplished by randomly filling a 7x7 grid of event patches with normal events. Spatial contiguity between patches was promoted by drawing the grid labels from a Markov random field with nearest-neighbour interactions. In half of the scenes thus generated, we placed a security-relevant event somewhere on the grid. Two example scenes are shown below:
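The grid construction above can be sketched as Gibbs sampling from a Potts-style Markov random field, followed by the 50% placement of a target event. The event labels, the coupling strength `BETA`, and the number of sweeps are illustrative assumptions; the text does not specify the exact model parameters used.

```python
import math
import random

GRID = 7
# hypothetical normal-event labels standing in for the actual patch videos
NORMAL_EVENTS = ["background", "walking along rows", "moving over seats"]
BETA = 1.0  # assumed coupling strength favouring same-label neighbours

def neighbours(r, c):
    """Yield the 4-connected nearest neighbours inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < GRID and 0 <= c + dc < GRID:
            yield r + dr, c + dc

def sample_scene(sweeps=50, rng=random):
    """Gibbs-sample a GRID x GRID labelling from a nearest-neighbour
    Potts model, so adjacent patches tend to show the same normal event."""
    grid = [[rng.choice(NORMAL_EVENTS) for _ in range(GRID)] for _ in range(GRID)]
    for _ in range(sweeps):
        for r in range(GRID):
            for c in range(GRID):
                # conditional distribution given the current neighbour labels
                weights = []
                for label in NORMAL_EVENTS:
                    agree = sum(grid[nr][nc] == label for nr, nc in neighbours(r, c))
                    weights.append(math.exp(BETA * agree))
                grid[r][c] = rng.choices(NORMAL_EVENTS, weights=weights)[0]
    return grid

def make_trial(rng=random):
    """Return a scene and whether it contains a security-relevant event;
    half the trials receive one target at a uniformly random patch."""
    grid = sample_scene(rng=rng)
    target_present = rng.random() < 0.5
    if target_present:
        r, c = rng.randrange(GRID), rng.randrange(GRID)
        grid[r][c] = "security-relevant"
    return grid, target_present
```

Larger `BETA` values yield larger contiguous regions of the same normal event, which controls how visually homogeneous the distractor background is around the embedded target.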