Tutorial on Learning Approaches to Perceptual Quality Prediction


The accurate computational estimation of visual quality as it is perceived by humans is crucial for any visual communication or computing system that has humans as the ultimate receivers. However, the seemingly easy problem of estimating perceived quality presents astonishingly difficult challenges.

Traditional approaches to visual quality estimation can be classified into bottom-up and top-down approaches. While the former are based on computational models of the human visual system (HVS), the latter treat the HVS as a black box and track distortion-related deviations in image statistics. With the rise of machine learning, a third category of data-driven approaches has recently emerged; these do not necessarily rely on explicit models and allow for end-to-end optimization.

In this tutorial, we will give an overview of recent advances in data-driven approaches to perceptual quality prediction. We will begin with a short introduction to the methods and techniques of machine learning, with an emphasis on deep neural networks and their training, focusing on the methods needed to understand the workings of state-of-the-art machine learning-based quality models. We will then examine and compare different concepts of quality prediction by tracing the development of the field, and discuss different applications alongside the corresponding requirements and challenges for a quality prediction model. Next, we will extend this conceptual taxonomy with recent machine learning-based approaches, with a strong emphasis on end-to-end trained methods. The most important approaches and concepts will be presented in detail and compared, covering their underlying principles and assumptions, algorithmic details, and quantitative results. Finally, based on a discussion of the limitations of the state of the art, we will present open challenges and promising future research directions.

The tutorial is self-contained and does not require special prior knowledge. All concepts and methods will be introduced. The tutorial is designed to be independent of, but complementary to, the tutorial on Learned Data Compression by Johannes Ballé, Saurabh Singh, and David Minnen.


Sebastian Bosse is a senior researcher and scientific project manager with the Machine Learning group and the Image & Video Coding group in the Video Coding & Analytics Department of the Fraunhofer Institute for Telecommunications – Heinrich Hertz Institute, Berlin, Germany. He received the Dipl.-Ing. degree in Electrical Engineering and Information Technology from RWTH Aachen University, Germany, and his Ph.D. from TU Berlin, Germany. During his studies, he was a scholarship holder at the Polytechnic University of Catalonia, Barcelona, Spain. Sebastian was a visiting researcher at Siemens Corporate Research, Princeton, USA, and a guest researcher at the Stanford Vision and Neuro-Development Lab, Stanford University, USA. Sebastian is a board member of the Video Quality Experts Group (VQEG). His research interests include (perceptual) video compression, computational models of perception, signal processing, computer vision, and machine learning.

Sören Becker is a research engineer in the Machine Learning group in the Video Coding & Analytics Department at the Fraunhofer Heinrich Hertz Institute, Berlin, Germany. He obtained a Bachelor’s degree in Cognitive Science from the University of Osnabrück and a Master’s degree in Computer Science from the Technical University of Berlin. During his studies, he was awarded two scholarships from the German Academic Exchange Service for summer research internships at Nanyang Technological University, Singapore, and the University of British Columbia, Vancouver, Canada. His research interests include human perception, probabilistic modelling, and computer vision.
