BEWT: A Benchmark for End-to-End Web Testing
Olianas D.; Leotta M.; Ricca F.
2026-01-01
Abstract
Web applications are critical to modern life and require rigorous End-to-End (E2E) testing to ensure reliability across front-end and back-end components. Recent work has improved E2E testing by reducing cost and flakiness and by increasing robustness, but a common benchmark is still missing, which hinders fair comparison and progress. This work introduces the first E2E benchmark dataset to address that gap: 12 Selenium WebDriver test suites for 8 web applications, packaged in Docker for easy deployment. It supports studies of test evolution, automation, and flakiness, offering 389 Gherkin-based test cases, 283 Page Objects, 1,364 locators, and over 19k lines of code. By providing a reproducible, diverse foundation, this benchmark enables consistent evaluation of testing techniques and fosters advancement in E2E testing research.



