The Impact of Flaky Tests on Software Quality and How to Reduce It
Flaky tests are automated tests that do not pass or fail consistently, even though the code they exercise is stable and has not changed. They are unpredictable: they pass or fail seemingly at random, despite running under identical conditions.
Flaky tests are problematic because they can produce false positives (a test that passes even though the code is actually broken) and false negatives (a test that fails even though the code is actually correct). As a result, developers may waste time and effort chasing failures that are not real defects.
To reduce test flakiness, developers can apply various measures, such as isolating the test environment, adding retry mechanisms, increasing wait times, and analyzing logs and metrics to identify the underlying problems.
What Is a Flaky Test?
A flaky test is a test that produces unreliable, contradictory results: it sometimes fails and sometimes passes under the same conditions. In other words, a flaky test may fail or pass unexpectedly even when the code under test has not changed.
A test can be flaky for various reasons, including environmental issues, timing issues, race conditions, or problems in the test implementation itself. For example, a test that depends on an external resource that is not always accessible may pass or fail inconsistently, depending on whether the resource is available.
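To illustrate, here is a minimal pytest-style sketch of a test that depends on an external resource; the endpoint URL is a hypothetical stand-in, and the third-party requests library is assumed to be installed:

```python
import requests  # third-party HTTP client, assumed available

def test_fetch_user_profile():
    # Hypothetical external endpoint: the test passes only when the network
    # is up, the service is reachable, and it answers within the timeout.
    response = requests.get("https://api.example.com/users/42", timeout=2)
    assert response.status_code == 200
    # Any outage, slow response, or rate limit makes this test fail even
    # though the code under test has not changed.
```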
Examples of Flaky Tests
Here are some typical scenarios, each of which can serve as an example of a flaky test:
Network-dependent tests: if an automated test depends on a network connection, for example a test that verifies data fetched from an external API, it becomes flaky when the connection is unstable or slow.
Time-dependent tests, which rely on specific timing parameters such as timeouts or wait periods, may become flaky if small delays occur in the system under test. For example, a test that checks a web page's response time sometimes passes and sometimes fails, depending on server or network load (a short sketch follows these examples).
Concurrency-related tests, which run simultaneously and can interfere with each other, causing flaky behavior. For example, a test that writes to the same database table as another test can sometimes fail, depending on the order in which the tests run.
Environment-dependent tests, which rely on the availability of certain resources, can become flaky when the environment changes. A good flaky test example in this case is a test that checks for the presence of a file in the file system and fails if that file is removed by another process.
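To make the timing case above concrete, here is a minimal pytest-style sketch; the 200 ms threshold and the `render_page` function are hypothetical stand-ins:

```python
import time

def render_page():
    # Stand-in for real work whose duration varies with machine load.
    time.sleep(0.15)

def test_page_renders_quickly():
    start = time.monotonic()
    render_page()
    elapsed = time.monotonic() - start
    # Usually passes on an idle machine but can fail under load: a classic
    # timing-based flaky assertion, because the threshold encodes an
    # environmental detail rather than a property of the code.
    assert elapsed < 0.2
```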
Main Causes of Flaky Tests
Tests become flaky for different reasons, including environmental issues (such as network connectivity or server performance), synchronization problems (such as race conditions or timeouts), and problems in the code itself (such as concurrency bugs or incorrectly handled exceptions).
Here are some common causes of flaky tests:
- Timing issues
- Race conditions
- Environment issues
- Problems in the test implementation (a short sketch follows this list)
- Dependencies on external resources
- Incomplete test coverage
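As a small sketch of how a test-implementation problem creates order dependence, consider two tests sharing mutable module-level state; the names are illustrative:

```python
# Shared mutable state: whichever test runs first changes the outcome
# of the other, so results depend on execution order.
cart = []

def test_add_item():
    cart.append("book")
    assert len(cart) == 1

def test_cart_starts_empty():
    # Passes if it runs first, fails after test_add_item has run,
    # because the two tests were never isolated from each other.
    assert cart == []
```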
How to Detect Flaky Tests?
Detecting flaky tests can be difficult because flakiness has many sources: timing issues, race conditions, and unreliable test data, among others.
Several problems commonly arise during detection: flakiness may not show up immediately, false positives and false negatives can be mistaken for real results, the environment may be unreliable, or the test design itself may be at fault.
Here Are Some Ways to Detect Flaky Tests:
- Analyze test results to find tests whose outcomes contradict one another. This helps reveal patterns that point to instability.
- Monitor test runs to spot inconsistent tests by executing the same test several times and comparing the results (see the sketch after this list).
- Record the execution time of each test and compare it with the average. If a test takes considerably longer than average, that can be a sign the test is flaky.
- Use code analysis tools such as SonarQube or Code Climate to detect potentially low-quality tests. These tools can surface code smells, test coverage gaps, and other indicators of flakiness.
- Run tests in parallel, which can help expose unstable tests. If a test fails inconsistently, running it alongside other tests helps pinpoint the cause of the failure.
- Track test dependencies, since tests that depend on external resources or services can be unstable. Check the availability and consistency of these resources to spot potentially flaky tests.
- Review the test code for potential sources of flakiness, including thread synchronization, sleep statements, and race conditions.
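As a minimal sketch of the rerun-and-compare idea, the harness below runs one test callable several times and flags inconsistent outcomes; the run count and the randomly failing `suspicious_test` are illustrative assumptions:

```python
import random

def suspicious_test():
    # Stand-in for a real test body; fails randomly to simulate flakiness.
    assert random.random() > 0.3

def detect_flakiness(test_fn, runs=20):
    """Run test_fn repeatedly; mixed pass/fail outcomes suggest flakiness."""
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    return outcomes

if __name__ == "__main__":
    results = detect_flakiness(suspicious_test)
    if len(results) > 1:
        print("Inconsistent results across identical runs: likely flaky.")
    else:
        print(f"Consistent outcome: {results.pop()}")
```

In practice, test runners can automate this pattern; for example, the pytest-rerunfailures plugin reruns failing tests a configurable number of times.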
In general, detecting flaky tests requires combining monitoring, analysis, and code review. Revealing and removing flakiness early increases testing accuracy and reduces the amount of flaky testing.
How to Fight Flaky Tests?
Fighting test flakiness requires a mix of technical and process-level approaches, which can be combined in practice. Here are some strategies that work:
Flaky test identification and prioritization. Identify the tests that fail inconsistently and prioritize them by their impact on software quality and development time. Prioritization helps developers focus on the most critical issues first.
Correction of flaky tests. Once unstable tests are found, fix them and their root cause. This may involve refactoring the test code, fixing race conditions, improving synchronization mechanisms, and reducing dependencies on external resources (a polling-wait sketch follows this list).
Test automation. Automation can reduce the likelihood of flaky tests by producing more consistent and reliable results, and it helps reveal unstable tests faster and more precisely.
Running tests in parallel. Launching tests in parallel helps reveal flaky tests by running them in different environments or at different times, exposing the environmental and timing problems that often cause flakiness.
Test isolation. Isolating tests from one another reduces the impact of unstable tests and prevents a flaky test from causing other tests to fail.
Test result monitoring. Track test results to detect flaky tests and measure their frequency and impact on software quality. Monitoring reveals trends and patterns that point to flakiness.
Improving test coverage. More thorough testing of the software reduces the probability of flaky tests by revealing and removing problems before they turn into unstable tests.
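One common correction from the list above is replacing a fixed sleep with a polling wait. The sketch below is illustrative; `wait_until` and the `job_is_done` predicate are hypothetical helpers, not part of any particular framework:

```python
import time

def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

def test_background_job_completes(job_is_done=lambda: True):
    # Instead of time.sleep(2) and hoping the job has finished, poll for
    # the condition: slow environments get more time, fast ones move on
    # as soon as the condition holds.
    assert wait_until(job_is_done)
```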
Here is a table that summarizes the causes of flaky tests, their consequences, and recommended remedies:
| Cause | Consequence | Recommended remedy |
| --- | --- | --- |
| Timing issues | Tests pass or fail inconsistently because of timing factors such as network latency, input/output delays, and wait times. | Use explicit waits and retries, run tests in parallel, and isolate them in virtual environments that provide a consistent test setup. |
| Race conditions | Tests pass or fail inconsistently when the ordering of parallel threads and processes is unpredictable. | Use synchronization mechanisms such as locks, semaphores, and barriers to manage access to shared resources. |
| Environment issues | Tests pass or fail inconsistently because the test environment differs from the production environment, for example in dependency versions or hardware configuration. | Use mock objects and stubs to isolate tests from external dependencies, run tests in virtual environments or containers that provide a consistent setup, and track differences from the production environment that can cause flakiness. |
| Problems in the test implementation | Tests pass or fail inconsistently because of problems in the test code itself, such as incorrect assertions or improperly cleaned-up test data that makes results depend on execution order. | Refactor tests to improve their quality, reliability, and accuracy; ensure test data is properly cleaned up; and make tests independent of execution order. |
| Dependence on external resources | Tests pass or fail inconsistently because they depend on external resources such as databases, third-party services, or APIs. | Use stubs, mocks, and other test doubles to imitate external resources, reduce dependencies on them, and consider alternatives such as an in-memory database (see the sketch after the table). |
| Incomplete test coverage | Tests pass or fail inconsistently because the test suite does not cover all execution paths and edge cases. | Improve test coverage so that all code paths and edge cases are exercised, and use mutation testing or similar techniques to find gaps in coverage. |
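For the external-resources row, Python's standard-library unittest.mock can stand in for a third-party service. Here is a minimal sketch, in which `get_exchange_rate`, `convert`, and the fixed rate are hypothetical:

```python
from unittest.mock import patch

def get_exchange_rate(currency):
    # Imagine this normally calls a third-party API over the network.
    raise NotImplementedError("real network call")

def convert(amount, currency):
    return amount * get_exchange_rate(currency)

@patch(f"{__name__}.get_exchange_rate", return_value=2.0)
def test_convert_uses_stubbed_rate(mock_rate):
    # The mock replaces the external dependency, so the test no longer
    # fails when the real service is slow, down, or rate-limited.
    assert convert(100, "EUR") == 200.0
```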
It is worth noting that none of these measures works perfectly in every case; the best approach depends on the particular test and the nature of its failure. In addition, prioritizing and removing flaky tests is crucial, as they affect both software quality and development time.
Conclusion
In general, flaky tests can badly impact software quality, development time, and the release cycle, so it is important to reveal and remove them as soon as possible. Fighting flaky tests requires a proactive approach: detecting, prioritizing, and solving problems quickly and effectively. By applying the remediation strategies above, developers can increase the reliability and efficiency of their testing efforts and build better software.