Head pose estimation with uncertainty and an application to dyadic interaction detection
Figari Tomenotti, Federico;Noceti, Nicoletta;Odone, Francesca
2024-01-01
Abstract
Determining the visual focus of attention of people in a scene is fundamental to understanding social interactions from videos. Gaze direction is ideal for detecting eye contact, a basic cue of non-verbal communication, but it is not always easy to recognise. Head direction is a well-known proxy for gaze direction that is more robust to scene variability, and thus offers a valuable alternative. In this work, we consider HHP-net, a heteroscedastic neural network that estimates people’s head pose from single frames using a minimal set of head key points. We formulate the problem as a multi-task regression that predicts the pose as a triplet of Euler angles from the output of a 2D pose estimator. HHP-net also provides a measure of the aleatoric heteroscedastic uncertainty associated with each angle, through an ad-hoc loss function we introduce. In a thorough experimental analysis, we show that our model is efficient and effective compared with the state of the art, with a worst-case degradation of only 2 degrees counterbalanced by a model that occupies 12 times less space. We also show the beneficial effects of uncertainty on interpretability. Finally, we discuss the robustness of our method to input variability, showing that it can serve as a plug-in for different pose estimators. As a proof of concept, we address social interaction analysis with an algorithm to detect dyadic interactions in images.

| File | Type | Access | Size | Format |
|---|---|---|---|---|
| CVIU_Tomenotti2024.pdf | Post-print document | Open access | 2.84 MB | Adobe PDF |
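
The abstract describes predicting each Euler angle together with its own aleatoric uncertainty, trained through an ad-hoc loss that is not given on this page. As a rough illustration only, the sketch below uses the standard Gaussian negative log-likelihood with a predicted per-angle log-variance. This is an assumed stand-in, not the loss introduced by the authors. The function name `heteroscedastic_nll`, the array shapes, and the use of degrees are all hypothetical choices made for the example.

```python
import numpy as np


def heteroscedastic_nll(pred_angles, log_var, true_angles):
    """Gaussian negative log-likelihood with a predicted variance per angle.

    Assumed convention (not from the paper): every array has shape
    (batch, 3), holding yaw, pitch and roll in degrees, and `log_var`
    is the predicted log-variance log(sigma^2) for each angle.
    Angle wrap-around is ignored for simplicity.
    """
    sq_err = (true_angles - pred_angles) ** 2
    # A large predicted variance down-weights the squared error,
    # while the 0.5 * log_var term penalises claiming high uncertainty.
    per_angle = 0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var
    return per_angle.mean()


# One sample whose three angles are each off by 10 degrees.
pred = np.zeros((1, 3))
true = np.full((1, 3), 10.0)

confident = heteroscedastic_nll(pred, np.zeros((1, 3)), true)
uncertain = heteroscedastic_nll(pred, np.full((1, 3), np.log(100.0)), true)
print(confident, uncertain)
```

With the same 10-degree error, the loss is lower when the model reports a larger variance, which is what lets a network of this kind flag unreliable predictions instead of being penalised as if it were confident.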