Abstract: In this paper, we extend the SonoNet architecture to capture spatio-temporal information from ultra-sound (US) sequences. More specifically, we propose 3D-SonoNet32, which lifts 2D convolutions to 3D, and an efficient (2+1)D variant that keeps the computational cost under control while preserving the benefits of the spatio-temporal model. We investigate the potential of these architectures on a scan-plane detection problem and discuss how they can benefit AI-driven online "scan assistants", which aim to enhance the quality and reproducibility of the evaluation and ultimately support clinicians during the US examination. Our main contributions are (i) the design of novel Space-Time SonoNet architectures for analysing US video sequences, and (ii) an in-depth experimental analysis that shows the benefit of space-time models over purely spatial ones and discusses the potential improvements gained by exploiting domain-specific properties such as temporal coherence and prior knowledge of the ongoing scan. Overall, the proposed models are designed to be computationally lightweight while remaining competitive in performance, making them suitable for real-time deployment on portable US devices.
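
To make the cost argument behind the (2+1)D variant concrete, the following minimal sketch counts the weights of a single convolutional layer in its 2D, full 3D, and factorised (2+1)D forms. It is an illustration under stated assumptions, not the authors' implementation: the function names, the example channel counts and kernel sizes, and the choice of intermediate channel width `c_mid` are all hypothetical, and bias terms are ignored.

```python
def conv2d_params(c_in, c_out, k):
    """Weights of a k x k 2D convolution (bias ignored)."""
    return k * k * c_in * c_out


def conv3d_params(c_in, c_out, k, t):
    """Weights of a t x k x k 3D convolution (bias ignored)."""
    return t * k * k * c_in * c_out


def conv2plus1d_params(c_in, c_out, k, t, c_mid=None):
    """Weights of a (2+1)D factorisation: a 1 x k x k spatial convolution
    followed by a t x 1 x 1 temporal convolution (bias ignored).

    c_mid is the channel width between the two convolutions. Setting it
    to c_out by default is an assumption made for this sketch only.
    """
    if c_mid is None:
        c_mid = c_out
    spatial = k * k * c_in * c_mid   # 1 x k x k convolution
    temporal = t * c_mid * c_out     # t x 1 x 1 convolution
    return spatial + temporal


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 64 channels, 3 x 3 spatial kernel,
    # temporal extent of 3 frames.
    c_in, c_out, k, t = 64, 64, 3, 3
    print("2D     :", conv2d_params(c_in, c_out, k))
    print("3D     :", conv3d_params(c_in, c_out, k, t))
    print("(2+1)D :", conv2plus1d_params(c_in, c_out, k, t))
```

With these assumed sizes, the factorised layer needs fewer weights than the full 3D layer while still mixing information across frames, which is the trade-off the abstract refers to. The actual savings depend on the layer widths and kernel sizes used in the paper's architectures, which this page does not specify.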

SpaceTime-SonoNet: efficient classification of ultra-sound video sequences

Interlando M.;Zini L.;Noceti N.;Odone F.
2026-01-01


Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1302797