Face Presentation Attack Detection by Exploiting Prior Knowledge of Region Relationships
Fabio Roli
2026-01-01
Abstract
Face recognition systems have been widely deployed in mobile devices for user authentication and payment applications. However, these biometric systems remain vulnerable to face presentation attacks, which pose significant security risks. In recent years, numerous countermeasures have been proposed, and analyzing the differences between bona fide and attack presentations is a commonly adopted strategy. Nevertheless, the variations in image attributes and region movements have not been thoroughly explored. In attack images, the textures of the facial region and the background tend to be more similar, and local regions often move in more consistent directions than in bona fide presentations. Motivated by this observation, we propose a novel face presentation attack detection method that leverages prior knowledge of region relationships. Specifically, each input face image sequence is first divided into small patches, which are then processed by a pre-trained TimeSformer network that uses divided time and space attention to extract deep features. Two metrics, cosine similarity and mean squared error (MSE), are then used to measure the texture similarity and movement relationships of the regions of interest. During inference, these measurements are fused to distinguish bona fide from attack presentations. Extensive ablation and comparison experiments on six face presentation attack detection (PAD) databases (Idiap Replay-Attack, CASIA-MFSD, OULU-NPU, MSU-MFSD, 3DMAD, and HKBU-MARs V1+) demonstrate that our method achieves superior detection performance, significantly improving precision over state-of-the-art approaches in most experimental settings.

| File | Access | Type | Size | Format |
|---|---|---|---|---|
| Face_Presentation_Attack_Detection_by_Exploiting_Prior_Knowledge_of_Region_Relationships-RED.pdf | Closed access | Pre-print | 1.95 MB | Adobe PDF |
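The abstract names cosine similarity and MSE as the two region-relationship measurements but does not give the fusion rule. The following is a minimal sketch of how such measurements could be computed and combined, not the authors' implementation. The function names, the assignment of cosine similarity to texture and MSE to movement, the rescaling to [0, 1], and the weighted-sum fusion with `weight` are all illustrative assumptions. In the paper the feature vectors would come from the TimeSformer network; here they are plain lists of floats.

```python
import math
from itertools import combinations


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def mean_squared_error(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def attack_score(face_feat, background_feat, region_motion_feats, weight=0.5):
    """Hypothetical fusion: a higher score suggests an attack presentation.

    Assumption 1: high face/background cosine similarity indicates an attack.
    Assumption 2: low pairwise MSE between region motion features
    (consistent movement) indicates an attack.
    """
    if len(region_motion_feats) < 2:
        raise ValueError("need at least two regions to compare movement")

    # Map cosine similarity from [-1, 1] to [0, 1].
    texture_term = 0.5 * (cosine_similarity(face_feat, background_feat) + 1.0)

    # Average MSE over all region pairs, mapped to (0, 1]; 1 means identical.
    pair_mses = [
        mean_squared_error(u, v)
        for u, v in combinations(region_motion_feats, 2)
    ]
    motion_term = 1.0 / (1.0 + sum(pair_mses) / len(pair_mses))

    # Illustrative weighted-sum fusion of the two measurements.
    return weight * texture_term + (1.0 - weight) * motion_term


if __name__ == "__main__":
    # Toy feature vectors, not real network outputs.
    score = attack_score(
        face_feat=[0.9, 0.1, 0.3],
        background_feat=[0.8, 0.2, 0.3],
        region_motion_feats=[[0.1, 0.0], [0.1, 0.1], [0.0, 0.1]],
    )
    print(f"attack score: {score:.3f}")
```

A decision threshold on this score would have to be chosen on a development set; the abstract does not state one.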