Deep Reinforcement Learning Agents for Robust Decision Making in Automated Driving
PIGHETTI, ALESSANDRO
2026-05-27
Abstract
Robust decision making is a central bottleneck for automated driving (AD). Even rare policy failures can lead to safety-critical outcomes, so learning-based approaches must deliver not only high nominal performance but also stability and reliability under perturbations. Deep reinforcement learning (DRL) provides a principled framework for learning driving policies from interaction, yet state-of-the-art methods still show recurring gaps in practice: they are often brittle under distribution shift, hard to train reproducibly in safety-constrained tasks, and vulnerable to atypical or adversarial behavior by other agents. This thesis advances the state of the art by introducing and validating a coherent training stack for robust DRL-based decision making that treats stability and safety as explicit design objectives. The core claim is that robustness emerges from four ingredients: a structured problem formulation, training processes that control difficulty over time, systematic evaluation under operational design domain (ODD) shifts, and deliberate exposure to failure modes through adversarial interactions. Concretely, the proposed pipeline integrates maneuver-level action abstraction, curriculum learning (CL), scenario diversification and benchmarking for ODD coverage, and adversarial fine-tuning with explicit realism constraints. Across highway driving and automated parking (AP), the results show that posing the decision problem at the maneuver level reduces unsafe exploration, improves learning efficiency, and yields more interpretable policies than flat low-level control. In low-speed maneuvering, staged curricula mitigate common local minima and enable stable convergence, while a transfer study in CARLA confirms that the training logic generalizes to higher-fidelity simulation when timing and perception are tuned appropriately.
To quantify generalization beyond single scenarios, the thesis introduces a unified CARLA parking benchmark that exposes the performance impact of ODD shifts across perpendicular, skewed, and parallel geometries and supports targeted fine-tuning for ODD expansion. For robustness under strategic interactions, the thesis studies attacker–victim training in AP and proposes Regularized Adversarial Fine-Tuning (RAFT) for CARLA parking. RAFT addresses a key failure mode of naive zero-sum formulations, namely degenerate opponents that merely seek collisions or block regions, by regularizing the adversarial objective to favor interactions that are challenging yet feasible. Empirically, RAFT preserves baseline success in the static parking ODD while improving trajectory-quality indicators such as alignment error and the number of reverse actions, without catastrophic forgetting. Taken together, the presented methods and the accompanying environments, interfaces, and benchmarks provide a practical basis to train, evaluate, and harden DRL policies for AD beyond nominal conditions. They also make it easier for other researchers to reproduce the results and to build on this work, with directions including automated curriculum design, multi-agent robustness with data-driven behavior priors, tighter integration with safety guarantees and verification, and sim-to-real validation.
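The following Python sketch makes the contrast between a naive zero-sum adversary objective and a regularized one more concrete. It is a generic illustration under stated assumptions and is not the RAFT formulation used in the thesis. The `StepOutcome` fields, the penalty weights, and the speed and acceleration limits are hypothetical placeholders. Under the zero-sum objective, a collision, which strongly penalizes the victim, is directly rewarded. The regularized variant subtracts penalties for collisions, region blocking, and physically implausible motion, so the adversary is pushed toward pressure that remains feasible.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step summary of an attacker-victim parking interaction."""
    victim_reward: float       # reward the victim (parking agent) received this step
    adversary_collided: bool   # adversary physically hit the victim
    blocks_target_slot: bool   # adversary occupies or blocks the victim's target region
    adversary_speed: float     # adversary speed in m/s
    adversary_accel: float     # adversary acceleration in m/s^2


def naive_zero_sum_reward(o: StepOutcome) -> float:
    """Zero-sum objective: the adversary gains exactly what the victim loses."""
    return -o.victim_reward


def regularized_adversary_reward(
    o: StepOutcome,
    collision_penalty: float = 10.0,  # hypothetical weight
    blocking_penalty: float = 5.0,    # hypothetical weight
    max_speed: float = 3.0,           # hypothetical realism limit (m/s)
    max_accel: float = 2.0,           # hypothetical realism limit (m/s^2)
    realism_weight: float = 1.0,      # hypothetical weight on the realism term
) -> float:
    """Zero-sum term minus penalties that discourage degenerate adversary behavior."""
    r = -o.victim_reward
    if o.adversary_collided:
        r -= collision_penalty  # do not reward simply ramming the victim
    if o.blocks_target_slot:
        r -= blocking_penalty   # do not reward making the task impossible
    # Quadratic penalty on motion that exceeds plausible low-speed limits.
    speed_excess = max(0.0, abs(o.adversary_speed) - max_speed)
    accel_excess = max(0.0, abs(o.adversary_accel) - max_accel)
    r -= realism_weight * (speed_excess ** 2 + accel_excess ** 2)
    return r


# A collision-seeking step versus a challenging but feasible step.
ramming = StepOutcome(-1.0, True, False, 4.0, 1.0)
feasible = StepOutcome(-0.5, False, False, 2.0, 1.0)
print(naive_zero_sum_reward(ramming), regularized_adversary_reward(ramming))
print(naive_zero_sum_reward(feasible), regularized_adversary_reward(feasible))
```

With these placeholder weights, the ramming step earns the adversary a positive reward under the zero-sum objective but a strongly negative reward once regularized. The feasible step is scored identically by both objectives. Penalizing the degenerate behaviors directly, rather than removing the zero-sum term, keeps the adversary motivated to make the task harder while steering it toward realistic interactions.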



