Validation of Human Phenotype Ontology (HPO) terms and development of an Artificial Intelligence-based diagnostic tool for systemic autoinflammatory diseases using the Eurofever Registry: the ODINO project

Matucci Cerinic, Caterina

Background: Systemic autoinflammatory diseases (SAIDs) are a heterogeneous group of disorders caused by dysregulation of the innate immune system, frequently presenting with recurrent fever and systemic inflammation. Despite major advances in molecular genetics, diagnosis remains challenging because of overlapping clinical features, variable expressivity, and incomplete genotype–phenotype correlations. Accurate disease classification is therefore crucial for early diagnosis and therapeutic optimization. The Human Phenotype Ontology (HPO) provides a standardized terminology for describing the phenotypic features of genetic diseases. Although the HPO section on autoinflammatory diseases was revised in 2022, the diagnostic accuracy of these terms has not yet been validated in large real-world cohorts.

Objectives: The primary aim of this PhD project was to evaluate the diagnostic performance of HPO-based phenotypic representations in a large cohort of pediatric patients with SAIDs using machine-learning methods. Secondary objectives were to assess the adequacy of the current HPO framework for SAIDs and to develop a novel HPO-based diagnostic tool.

Methods: Clinical and genetic data from 2716 patients were extracted from the Eurofever Registry (EF). Because sample sizes for rarer conditions were limited, analyses focused on Familial Mediterranean Fever (FMF), Cryopyrin-Associated Periodic Syndromes (CAPS), Mevalonate Kinase Deficiency (MKD), Tumor Necrosis Factor Receptor–Associated Periodic Syndrome (TRAPS), and Periodic Fever, Aphthous Stomatitis, Pharyngitis, and Adenitis (PFAPA). A total of 223 EF variables were codified into HPO terms, and missing terms were annotated. Supervised machine-learning models (Elastic Net, k-Nearest Neighbors, Random Forest, and XGBoost) were trained using multiclass and one-vs-rest (OVR) strategies and evaluated on an independent test set.

Results: Among EF variables, 185 showed full HPO correspondence, 26 partial correspondence, and 13 no correspondence. Clinically relevant variables frequently missing from HPO, including ethnicity and fever duration, were incorporated into the final models. The OVR approach combined with XGBoost showed the best overall performance, with a balanced accuracy of 0.92 for CAPS, 0.90 for FMF, and 0.92 for PFAPA, while performance was moderate for MKD (balanced accuracy 0.80) and TRAPS (balanced accuracy 0.78). Feature importance analysis highlighted “fever duration” and “ethnicity” as key discriminative variables. Error analysis revealed significant enrichment of TRAPS R92Q patients among false-negative classifications. Unsupervised analysis demonstrated substantial phenotypic overlap between FMF patients with one versus two pathogenic MEFV mutations. Based on the final model, a user-friendly web application was developed to provide disease probability estimates from HPO inputs.

Conclusions: HPO-based phenotypic modeling enables accurate classification of the five most frequent recurrent fevers included in this study. Our findings highlight the need to update the HPO framework to incorporate clinically relevant missing variables and symptom frequencies. The web application built on the proposed algorithm offers a valuable tool for early diagnosis. Further updates will refine the model as additional data from underrepresented diseases become available.
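
For readers unfamiliar with the metric, the per-disease balanced accuracy reported in the Results can be illustrated with a minimal sketch. Under a one-vs-rest scheme, one disease is treated as the positive class and all others as negative, and balanced accuracy is the mean of sensitivity and specificity. The function name and the toy labels below are hypothetical and invented for illustration; they are not the project's code or Eurofever data.

```python
from typing import Sequence


def balanced_accuracy_ovr(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> float:
    """Balanced accuracy with `positive` as one class versus all the rest."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly identified
            else:
                fn += 1  # positive case missed (false negative)
        else:
            if pred == positive:
                fp += 1  # other disease wrongly labeled as positive
            else:
                tn += 1  # other disease correctly left out
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy labels, invented for illustration only (assumption, not study data).
y_true = ["FMF", "FMF", "TRAPS", "TRAPS", "PFAPA", "PFAPA", "PFAPA", "CAPS"]
y_pred = ["FMF", "FMF", "TRAPS", "FMF", "PFAPA", "PFAPA", "PFAPA", "CAPS"]

traps_score = balanced_accuracy_ovr(y_true, y_pred, "TRAPS")
fmf_score = balanced_accuracy_ovr(y_true, y_pred, "FMF")
```

In this toy example, one of the two TRAPS cases is missed, so TRAPS sensitivity is 0.5 while its specificity stays at 1.0. Because the metric averages the two rates, it does not reward a classifier for simply favoring the larger classes, which matters in a cohort where some diseases are far rarer than others.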