Kuno Kurzhals¹, Emine Cetinkaya¹, Yongtao Hu², Wenping Wang², Daniel Weiskopf¹

¹ Universität Stuttgart, Germany      ² The University of Hong Kong, Hong Kong

The 35th ACM Conference on Human Factors in Computing Systems (CHI 2017)


The incorporation of subtitles in multimedia content plays an important role in communicating spoken content. For example, subtitles in the viewer's language are often preferred to expensive audio translation of foreign movies. The traditional representation of subtitles displays text centered at the bottom of the screen. This layout can lead to large distances between text and relevant image content, causing eye strain and even causing viewers to miss visual content. As a recent alternative, the technique of speaker-following subtitles places subtitle text in speech bubbles close to the current speaker. We conducted a controlled eye-tracking laboratory study (n = 40) to compare the regular approach (center-bottom subtitles) with content-sensitive, speaker-following subtitles. We compared the two layouts on different dialog-heavy video clips. Our results show that speaker-following subtitles lead to higher fixation counts on relevant image regions and reduce saccade length, which is an important factor for eye strain.



  title={{Close to the Action: Eye-Tracking Evaluation of Speaker-Following Subtitles}},
  author={Kurzhals, Kuno and Cetinkaya, Emine and Hu, Yongtao and Wang, Wenping and Weiskopf, Daniel},
  booktitle={Proceedings of the 35th ACM Conference on Human Factors in Computing Systems},