Revisiting Dictionaries of Key Poses for Action Representation

Matteo Moro; Federico Figari Tomenotti; Nicoletta Noceti; Francesca Odone

2026-01-01

Abstract

Action understanding is a critical component in various computer vision applications, with traditional approaches predominantly relying on video data. More recent methods, however, utilize 2D and 3D skeleton data to improve both speed and accuracy. Despite these advancements, many models still fail to produce holistic pose representations that are transparent or semantically meaningful, either relying on end-to-end pipelines or focusing on individual keypoints. In this paper, we present a novel representation methodology by revisiting the concept of key body poses, inspired by the bag-of-words approach. Specifically, we create a dictionary of key poses and convert each action sequence into a sequence of key poses. We then explore two alternative strategies for action classification: one based on the classical bag-of-words, which focuses on the frequency of key poses, and another that considers the temporal ordering of key poses. We evaluate the effectiveness of these dictionary-based representations on the BABEL dataset, which includes 3D human keypoints and a large set of action labels. Our experimental results demonstrate that both strategies provide meaningful cues for action recognition, explicitly capturing the action complexity by balancing detail and generalization.
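The pipeline summarized in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the dictionary is built with plain k-means on flattened poses (the abstract does not name the clustering technique), frames are assigned to the nearest key pose by Euclidean distance, and the function names `build_dictionary`, `assign_key_poses`, `bag_of_key_poses`, and `ordered_key_poses` are hypothetical.

```python
import numpy as np


def assign_key_poses(poses, centers):
    """Return, for each frame, the index of the nearest key pose."""
    poses = np.asarray(poses, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise Euclidean distances, shape (n_frames, n_key_poses).
    dists = np.linalg.norm(poses[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def build_dictionary(poses, k, n_iter=20, seed=0):
    """Build k key poses with plain k-means (an assumed choice)."""
    poses = np.asarray(poses, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct frames.
    centers = poses[rng.choice(len(poses), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign_key_poses(poses, centers)
        for j in range(k):
            members = poses[labels == j]
            if len(members):  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers


def bag_of_key_poses(labels, k):
    """Order-free representation: normalized frequency of each key pose."""
    hist = np.bincount(np.asarray(labels), minlength=k).astype(float)
    return hist / hist.sum()


def ordered_key_poses(labels):
    """Order-aware representation: collapse consecutive repeats."""
    labels = np.asarray(labels)
    keep = np.ones(len(labels), dtype=bool)
    keep[1:] = labels[1:] != labels[:-1]
    return labels[keep]


if __name__ == "__main__":
    # Synthetic stand-in for skeleton data: 60 frames, 5 joints in 3D,
    # each frame flattened to a 15-dimensional vector.
    rng = np.random.default_rng(1)
    frames = rng.normal(size=(60, 5 * 3))
    dictionary = build_dictionary(frames, k=4)
    labels = assign_key_poses(frames, dictionary)
    print(bag_of_key_poses(labels, k=4))
    print(ordered_key_poses(labels))
```

The two representations trade off differently: the histogram discards ordering and has a fixed length equal to the dictionary size, while the collapsed sequence preserves ordering and varies in length with the action. The dictionary size `k` is the knob that balances detail against generalization in this sketch.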

Year: 2026
ISBN: 9783032101914; 9783032101921
Appears in categories: 04.01 - Contribution in conference proceedings

Files in this record:

File	Size	Format
ID-146-Moro-Matteo.pdf (closed access; type: pre-print document)	1.7 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1300337
