Seminars & Colloquia
Marcelo d'Amorim
Federal University of Pernambuco (UFPE)
"Using Noise to Detect Test Flakiness"
Tuesday, March 22, 2022, 1:15 PM
Location: 3211, EB2 NCSU Centennial Campus
Google/Zoom Meeting Info (Visitor parking instructions)
In this talk, I will present my work on flaky test detection. Prior studies have shown that concurrent behavior is the most common cause of test flakiness. Based on that observation, we hypothesize that adding noise to the environment can interfere with the ordering of program events and, consequently, influence test outputs. We propose Shaker, a practical technique to detect flaky tests. Shaker detects flakiness by comparing the outputs of tests executed in carefully selected 'noisy' environments. A test run in Shaker is slower than a regular test run because Shaker executes the tests in loaded environments, i.e., the process that runs a test competes for resources (e.g., memory or CPU) with stressor tasks that Shaker creates. However, we conjecture that Shaker pays off by detecting test flakiness in fewer runs than the alternative of running the test suite multiple times in a regular noiseless environment. We refer to that alternative as ReRun.
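The idea of comparing test outcomes across noisy runs can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Shaker's implementation: the function names (`run_under_noise`, `is_flaky`), the use of busy-looping threads as stand-ins for stressor tasks, and the default run and stressor counts are all hypothetical.

```python
import threading
from typing import Callable


def _burn_cpu(stop: threading.Event) -> None:
    # Hypothetical stressor: spin until told to stop, competing for CPU time.
    x = 0
    while not stop.is_set():
        x = (x + 1) % 1_000_003


def run_under_noise(test: Callable[[], bool], stressors: int = 4) -> bool:
    """Run one test while stressor threads load the environment.

    Returns True if the test passed, False if it failed or raised.
    """
    stop = threading.Event()
    workers = [
        threading.Thread(target=_burn_cpu, args=(stop,), daemon=True)
        for _ in range(stressors)
    ]
    for w in workers:
        w.start()
    try:
        try:
            return bool(test())
        except Exception:
            return False
    finally:
        # Always stop the stressors, even if the test misbehaves.
        stop.set()
        for w in workers:
            w.join()


def is_flaky(test: Callable[[], bool], runs: int = 3, stressors: int = 4) -> bool:
    """Flag a test as flaky if its outcome differs across noisy runs."""
    outcomes = {run_under_noise(test, stressors) for _ in range(runs)}
    return len(outcomes) > 1
```

A test whose outcome never changes across the noisy runs is not flagged; one that passes in some runs and fails in others is. The ReRun baseline would correspond to the same comparison with no stressors running.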
We evaluated Shaker on a public benchmark of flaky tests for Android applications using standard performance metrics (e.g., precision and recall) and ReRun as a comparison baseline. Results are encouraging. For example, we found that (1) Shaker is 98% precise, almost as precise as ReRun, which, by definition, does not report false positives; (2) Shaker’s recall is much higher than ReRun’s (95% versus 65%); and (3) Shaker detects flaky tests much more efficiently than ReRun, despite the execution overhead associated with the introduction of noise.
In the future, I plan to evaluate other mechanisms for introducing noise into the environment (e.g., resource throttling, test-specific noise generators) and to explore the idea of selectively introducing noise to debug flaky tests (i.e., to explain to the developer why a test is flaky). Shaker paved the way for those ideas.
Host: Kathryn Stolee, CSC