Critical errors that become evident only in live operation mean negative publicity for both the product and the companies involved. To prevent this, test automation has become a fundamental, integral part of modern software development. However, implementing it with test automation tools gives rise to problems that we need to be aware of.
Only high test coverage and prompt feedback on the results allow the quality and maturity of the product to be appropriately documented and confirmed. The parties involved use various testing methods and types of tests, such as automated code analysis or automated unit tests by the developers, and automated interface tests by the testers. From an early stage, attempts were made to allocate the various types of tests to general categories, such as the differentiation between black-box and white-box tests.
According to the German Testing Board (archived version 2.1 of the ISTQB® GTB Glossary), black-box testing means “functional or non-functional testing without using information about the inner workings of a system or a component”, and white-box testing means a “test based on the analysis of the internal structure of a component or system”. Up until a few years ago, black- and white-box testing methods were all but synonymous with two other categories: the dynamic test, i.e. “testing of the test object by running it on a computer”, and the static test, i.e. “testing of software development artifacts, e.g. requirements or source code, without running them, e.g. by way of reviews or a static analysis”. This distinction is no longer possible today because unit tests and testing methods such as test-driven development (TDD) have dissolved the original lines between white- and black-box tests. I call this new area the “gray-box test”. Gray-box tests try to optimally combine the desirable benefits of black-box tests (specification-driven) and white-box tests (developer-driven) while eliminating the unwanted disadvantages as far as possible.
The advantage: sub-components and entire systems can be tested with the low administrative effort involved in white-box tests, but without the risk of “circumventing errors” in the tests. In TDD, for example, the component tests are prepared based on the specifications prior to the actual development of the code. The development of a component is finished only once all the test routines have run successfully. Besides the benefits, however, there are some important aspects that need to be considered. TDD and gray-box tests require a high level of discipline if they are to be used effectively and in a practical manner. Even more important, gray-box tests should not indiscriminately be considered an adequate replacement for black-box tests.
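The test-first order described here can be illustrated with a minimal sketch. It assumes an invented specification ("orders of 100.00 or more receive a 10 % discount; negative amounts are rejected") and a hypothetical function `discounted_total`; neither comes from a real project. The test class is derived from the specification alone, and the component counts as finished only once all of its test methods pass.

```python
import unittest


def discounted_total(amount: float) -> float:
    """Hypothetical component, written only after the tests below existed.

    Assumed specification: orders of 100.00 or more receive a 10 % discount;
    negative amounts are rejected.
    """
    if amount < 0:
        raise ValueError("amount must not be negative")
    if amount >= 100:
        return round(amount * 0.9, 2)
    return round(amount, 2)


class DiscountSpecificationTest(unittest.TestCase):
    """Test cases derived from the specification, not from the code."""

    def test_no_discount_below_threshold(self):
        self.assertEqual(discounted_total(99.99), 99.99)

    def test_discount_at_threshold(self):
        self.assertEqual(discounted_total(100.0), 90.0)

    def test_negative_amount_is_rejected(self):
        with self.assertRaises(ValueError):
            discounted_total(-1.0)
```

Saved as a module, the tests could be run with `python -m unittest`. In a real TDD cycle the test class would exist, and fail, before `discounted_total` is written.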
Why should you not rely on automated gray-box tests exclusively?
Gray-box tests affect and change the system they are supposed to test. This aspect results from the very nature of the test. What is a test, really? It is basically empirical evidence. We propose a hypothesis and then test it in an experiment. And the same rule that applies to physical experiments is also true for software tests: the closer I get to the test object, the more I can influence the result of the test. Black-box tests are run in their own test environments, which should have a structure similar to that of the production environment. Nevertheless, such an environment is still “a test setup”. Mocks are inserted to replace missing components, and the log level is increased to gather more data.
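How something as innocuous as a raised log level can change the observed behavior is shown in the following sketch. Everything in it is hypothetical: the `Cart` class, its deliberately planted defect (a cached total that is never invalidated), the `checkout` function and the logger name `cart_demo`. Because Python's `logging` module formats a message only when a record is actually handled, the `%r` argument is evaluated only at DEBUG level, and that evaluation fills the cache too early.

```python
import io
import logging


class Cart:
    """Hypothetical component with a lazily computed, cached total."""

    def __init__(self, prices):
        self._prices = list(prices)
        self._total = None  # computed on first access to `total`

    @property
    def total(self):
        if self._total is None:
            self._total = sum(self._prices)
        return self._total

    def add(self, price):
        self._prices.append(price)  # planted defect: cache is not invalidated

    def __repr__(self):
        return f"Cart(total={self.total})"  # side effect: fills the cache


def checkout(cart, extra, logger):
    # The cart is formatted (and `total` cached) only if DEBUG is enabled.
    logger.debug("cart before add: %r", cart)
    cart.add(extra)
    return cart.total


def run_checkout(level):
    """Run the same scenario with a given log level."""
    logger = logging.getLogger("cart_demo")
    logger.handlers.clear()
    logger.propagate = False
    # Write to an in-memory buffer so the sketch produces no console output.
    logger.addHandler(logging.StreamHandler(io.StringIO()))
    logger.setLevel(level)
    return checkout(Cart([10, 20]), 5, logger)


production_like = run_checkout(logging.WARNING)  # 35: defect stays hidden
test_setup = run_checkout(logging.DEBUG)         # 30: stale cached total
```

The code under test is identical in both runs; only the test setup differs. The reverse case is just as conceivable: a side effect of the setup could equally well hide a defect that would surface in production.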
Gray-box tests, i.e. code-related tests in which the software to be tested is executed partially or in its entirety, are not only very close to the test object. With tools such as JUnit or TestFX, we also expand the code base to include new components. New lines of test code are written, and new test frameworks are integrated into the software solution as libraries.
But even with software solutions like QF-Test, Expecco or Squish, which run automated interface tests, we get very close to the object to be tested. In older versions of the automation tools for graphical interfaces, interactions were captured by saving the position of a GUI element, such as a button, and sending a corresponding event to that position at the time of execution. The software then created a screenshot and compared it with a previously created one in order to verify the test results, which is largely harmless. Modern tools, on the other hand, take a different path. They connect to the application to be tested via their own engine. This enables them to capture all the control elements of the interface, read their properties, and remote-control them. The corresponding data are stored as a model of the application in so-called GUI maps, and they constitute the basis for the subsequent preparation of test scripts.
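The difference between the two capture strategies can be made concrete with a toy model. The sketch below does not use the API of QF-Test, Expecco, Squish or any other real tool; `Widget`, `FakeWindow`, `build_gui_map` and `click` are hypothetical names chosen purely for illustration. It contrasts replaying a stored screen position with looking up a control element in a GUI map that holds direct references into the application.

```python
from dataclasses import dataclass, field


@dataclass
class Widget:
    """Hypothetical stand-in for a control element of the application."""
    name: str
    x: int
    y: int
    enabled: bool = True
    clicks: int = 0


@dataclass
class FakeWindow:
    """Hypothetical application window holding its widgets."""
    widgets: list = field(default_factory=list)

    def click_at(self, x, y):
        """Position-based replay: click whatever sits at (x, y)."""
        for widget in self.widgets:
            if (widget.x, widget.y) == (x, y):
                widget.clicks += 1
                return True
        return False  # nothing at the recorded position any more


def build_gui_map(window):
    """Model-based capture: record every control element under its name."""
    return {widget.name: widget for widget in window.widgets}


def click(gui_map, name):
    """Remote-control a control element through the GUI map."""
    widget = gui_map[name]
    if widget.enabled:  # properties can be read before acting
        widget.clicks += 1
    return widget.enabled


window = FakeWindow([Widget("ok_button", x=10, y=10)])
recorded_position = (10, 10)       # what a position-based tool stores
gui_map = build_gui_map(window)    # what a model-based tool stores

window.widgets[0].y = 40           # the layout changes after recording

hit_by_position = window.click_at(*recorded_position)  # misses the button
hit_by_map = click(gui_map, "ok_button")               # still reaches it
```

The map-based lookup is more robust against layout changes, but it works only because the tool holds references to objects inside the running application, which is exactly the proximity discussed here.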
This proximity to the software to be tested can have the effect that certain errors occur only because of it, or, even worse, are concealed or do not occur at all. We change the basis of the test with the “complicated” test setup, and we cannot be sure that the software to be tested would in fact have responded the same way if we had tested it “just” manually.
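A mock that behaves more kindly than the component it replaces is one simple way in which a test setup can conceal an error. The following sketch uses `unittest.mock.Mock` from the Python standard library; the `RealUserDirectory` class and the `greeting` function are hypothetical and exist only for this illustration.

```python
from unittest.mock import Mock


class RealUserDirectory:
    """Hypothetical production component."""

    _users = {1: {"name": "ada"}}

    def lookup(self, user_id):
        return self._users.get(user_id)  # returns None for unknown ids


def greeting(directory, user_id):
    """Code under test: does not handle a missing profile."""
    profile = directory.lookup(user_id)
    return "Hello, " + profile["name"].title()


# Test setup: the mock always returns a well-formed profile.
mocked_directory = Mock()
mocked_directory.lookup.return_value = {"name": "ada"}
with_mock = greeting(mocked_directory, 42)  # passes: "Hello, Ada"

# Same call against the real component: the unknown id yields None.
try:
    greeting(RealUserDirectory(), 42)
    real_outcome = "ok"
except TypeError:
    real_outcome = "TypeError"
```

The automated test with the mock is green, while the same input fails against the real component. A manual or black-box test against the assembled system would have a chance of catching this.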
This is why it is important to know the tool and its characteristics, and to be aware that errors may be masked by how close the tests are to the code. If we consider this to be a risk, we should complement our automated tests with other, manual types of tests.