Perspectives and Praxes for Socially Situated NLP: An Interdisciplinary Queer and Context-Aware Linguistic Approach to the Development of Linguistic Resources

Marra, Andrea

This thesis investigates how an interdisciplinary, queer, and context-aware approach can contribute to the study of potentially harmful language in Natural Language Processing (NLP). Drawing on the intersection of Sociolinguistics, Queer Linguistics, Philosophy of Language, and NLP, it argues that the annotation of linguistically and socially sensitive phenomena cannot be reduced to a purely technical operation and must instead be understood as a socially situated practice. Within this framework, the thesis introduces and tests the FATA (First Ask Then Act) paradigm, a methodological reorientation that brings social reflexivity and community participation into the early stages of the research pipeline. The research unfolds across three interconnected investigations. The first is a preliminary study of Italian nomina agentis and of the relationship between grammatical gender, reference, social meaning, and power dynamics. The second is a case study of the discursive construction around the figures of Francesco Schettino and Carola Rackete in a corpus of annotated Italian tweets; it transparently documents the iterative design of an annotation schema for gender stereotypes and their associated dimensions. The third is a case study of the reclamation of LGBTQ+ slurs in Italian, developed within the FAVOLOSA (Fair Automatisation and Visibility Of the LGBT+ Community On Slur (Re)Appropriation) project; it combines a preliminary focus group, a sociolinguistic survey, and a multi-layered interpretive analysis that is both qualitative and quantitative. Overall, the thesis shows that stereotypes and linguistic reclamation challenge the categorical simplifications often required by computational annotation. It therefore proposes more reflexive annotation schemas, sensitive to disagreement, social positionality, emotional charge, pragmatic complexity, and interpretive effort, contributing to NLP research while also enriching the broader discourse within the social sciences.
