Semiautomatic Visual-Attention Modeling

Contact person: Mikhail Erofeev

This research aims to raise the quality of visual-attention modeling enough to enable practical applications. We found that automatic models are significantly worse at predicting attention than even single-observer eye tracking. We propose a semiautomatic approach that requires eye tracking of only one observer and is based on the temporal consistency of that observer’s attention.

Our comparisons showed the high objective quality of our proposed approach relative to automatic methods and to the results of single-observer eye tracking with no postprocessing. We demonstrated the practical applicability of our proposed concept to the task of saliency-based video compression.


During this work we created a database of human eye movements captured while observers viewed various videos (static and dynamic scenes, shots from cinema-like films and scientific databases).

Key features:

  • Includes only FullHD and 4K UHDTV video sequences
  • Includes only stereoscopic video sequences
  • Eye movements were captured with a high-quality eye-tracking device, the SMI iView X™ Hi-Speed 1250, at 500 Hz (20 fixations per frame)
  • Additional post-processing was applied to improve the accuracy of the recordings
  • 43 fragments of motion video from various feature movies, commercial clips and stereo video databases
  • About 13 minutes of video (19760 frames)
  • 50 observers of different ages (mostly 18 to 27 years old)

Get the database

Semiautomatic Visual-Attention Model

Our short-term memory retains a representation of our environment for some time. In fact, an observer’s next eye movement may be determined by short-term memory of the scene as much as by the current perception of it. This behavior can be viewed as temporal consistency of attention, i.e., objects that are salient in a certain frame are assumed to be salient in neighboring frames. This leads us to the idea of bidirectional temporal saliency propagation.

Propagation of initial gaze map forward and backward along motion vectors
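
The propagation idea can be illustrated with a small sketch. It is not the implementation from the paper: the function names, the per-pixel motion-vector layout, the nearest-neighbour warping and the exponential `decay` weight are all assumptions made for illustration. Each frame's gaze map is carried forward and backward along the motion vectors, and the two passes are merged into one saliency map per frame.

```python
import numpy as np

def warp(saliency, flow):
    """Sample `saliency` (H x W) at positions given by per-pixel motion vectors.

    Assumed layout: flow[y, x] = (dy, dx) points from pixel (y, x) of the
    target frame to its source position in the frame `saliency` belongs to.
    Nearest-neighbour sampling, clamped at the frame borders.
    """
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return saliency[src_y, src_x]

def propagate(gaze_maps, flows_to_prev, flows_to_next, decay=0.8):
    """Bidirectional temporal propagation of per-frame gaze maps.

    flows_to_prev[t] maps frame t onto frame t-1 (entry 0 is unused);
    flows_to_next[t] maps frame t onto frame t+1 (last entry is unused).
    `decay` (hypothetical parameter) down-weights older observations.
    """
    n = len(gaze_maps)
    forward, backward = [None] * n, [None] * n

    # Forward pass: carry accumulated saliency from frame t-1 into frame t.
    forward[0] = gaze_maps[0].astype(float)
    for t in range(1, n):
        forward[t] = gaze_maps[t] + decay * warp(forward[t - 1], flows_to_prev[t])

    # Backward pass: carry accumulated saliency from frame t+1 into frame t.
    backward[n - 1] = gaze_maps[n - 1].astype(float)
    for t in range(n - 2, -1, -1):
        backward[t] = gaze_maps[t] + decay * warp(backward[t + 1], flows_to_next[t])

    result = []
    for t in range(n):
        # Both passes include the frame's own gaze map, so subtract it once.
        combined = forward[t] + backward[t] - gaze_maps[t]
        total = combined.sum()
        result.append(combined / total if total > 0 else combined)
    return result
```

With a single gaze point in the middle frame of a clip in which the content moves one pixel to the right per frame, the neighbouring frames receive a saliency peak shifted one pixel left (previous frame) and one pixel right (next frame), even though they contain no gaze data of their own.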

Objective Comparison

We compared the proposed approach with 11 automatic visual-attention models. To ensure a fair comparison, we developed a metric that is invariant to most brightness transforms and to mixing with a center-prior model (see the paper and the open-source implementation for details).
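
The actual metric is defined in the paper and its open-source implementation. The sketch below is not that metric; it only illustrates one way such invariance could be obtained. It assumes rank normalization (which cancels any strictly increasing brightness transform), a Gaussian center prior, Pearson correlation as the similarity measure, and a grid search over the blend weight. All function names and parameters are hypothetical.

```python
import numpy as np

def rank_normalize(m):
    """Replace each value by its rank scaled to [0, 1].

    Strictly increasing brightness transforms do not change the ranks,
    so the result is the same before and after such a transform.
    """
    flat = m.ravel()
    order = flat.argsort(kind="stable")
    ranks = np.empty(flat.size, dtype=float)
    ranks[order] = np.arange(flat.size)
    return (ranks / max(flat.size - 1, 1)).reshape(m.shape)

def center_prior(shape, sigma=0.25):
    """Gaussian blob centered in the frame (sigma in units of frame size)."""
    h, w = shape
    ys = np.linspace(-0.5, 0.5, h)[:, None]
    xs = np.linspace(-0.5, 0.5, w)[None, :]
    return np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma ** 2))

def pearson(a, b):
    """Pearson correlation coefficient of two maps of equal shape."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def fair_score(model_map, ground_truth, n_weights=11):
    """Best correlation over all tested blends with the center prior.

    Every model gets its most favourable blend weight, so a model is not
    rewarded merely for having a center prior built in.
    """
    ranked = rank_normalize(model_map)
    prior = center_prior(model_map.shape)
    weights = np.linspace(0.0, 1.0, n_weights)
    return max(pearson((1 - w) * ranked + w * prior, ground_truth) for w in weights)
```

Under these assumptions, `fair_score(m, gt)` and `fair_score(m ** 3, gt)` agree for a positive-valued map `m`, because cubing does not change the rank order of its pixels.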

Objective evaluation of our temporal propagation technique compared with other state-of-the-art saliency models

Application to Video Compression

We modified the x264 H.264 video encoder to enable saliency-aware compression. More precisely, saliency maps were used to adjust the quantization-parameter values of macroblocks so that more bits are spent on salient regions and fewer on non-salient ones.
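
A minimal sketch of the mapping from a saliency map to per-macroblock quantization-parameter offsets follows. It does not reproduce the modified encoder: the function name, the linear mapping, the `max_offset` bound and the zero-mean centering are assumptions for illustration, and passing the offsets to x264 is not shown. The only encoder facts relied on are that H.264 macroblocks cover 16x16 pixels and that a lower quantization parameter means finer quantization.

```python
import numpy as np

MB = 16  # H.264 macroblock size in pixels

def qp_offsets(saliency, max_offset=6.0):
    """Map a per-pixel saliency map (H x W) to per-macroblock QP offsets.

    Salient macroblocks get negative offsets (finer quantization, more bits),
    non-salient ones get positive offsets. The offsets average to zero and
    their magnitude never exceeds `max_offset` (a hypothetical parameter).
    """
    h, w = saliency.shape
    rows, cols = -(-h // MB), -(-w // MB)  # ceiling division

    # Pad by edge replication so partial macroblocks average real pixel values.
    padded = np.pad(saliency.astype(float),
                    ((0, rows * MB - h), (0, cols * MB - w)), mode="edge")
    block_mean = padded.reshape(rows, MB, cols, MB).mean(axis=(1, 3))

    # Deviation from the frame average decides the sign of the offset.
    deviation = block_mean - block_mean.mean()
    peak = np.abs(deviation).max()
    if peak == 0:
        return np.zeros((rows, cols))  # uniform saliency: leave QP untouched
    return -max_offset * deviation / peak
```

Centering the offsets around zero is a rough way to keep the overall bit budget comparable. In a real encoder the final bit rate is still governed by rate control, so this centering is only a starting point.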

We used saliency maps obtained with different visual-attention models for saliency-aware x264-based video compression. By spending fewer bits on non-salient areas, we achieved a quality increase in the salient region of up to 0.022 EWSSIM at the same bit rate.