Recipes for Test Automation (Part 2) – Data Salad

Testomatoes on data salad with stressing

Test data always pose a particular challenge in manual testing, and even more so in test automation. In manual testing, test cases frequently include only general information about the test data to be used. This approach does not work for test automation.

In my previous post, "Ingredients and appliances for test automation, and who is the chef", I described the prerequisites that have to be met in order to implement test automation successfully. I also mentioned another challenge in this context: the test data. I would like to take a closer look at this issue in this blog post.

What happens if we fail to focus on test data in test automation and rely on test data that were not prepared with automation in mind?

Testing can be compared with cooking. The test case is the recipe, and the test data are the ingredients. With manual testing, we follow the recipe/test case and find the ingredients/test data as required. This does not work with automated testing. Here, the test data, or ingredients, have to be provided spot-on, in the exact quantities required. This means that it is not enough to indicate the type and form of the test data: the exact instance of the test data has to be specified in the test script as well.

Furthermore, test data are used up, or they age, over the course of testing. Just like in a restaurant, where you eventually run out of tomatoes or the green salad wilts. So where do we get the ingredients we need, and in sufficient quantities?

In projects, I often hear: "We'll just create a clone of the production data, anonymized, hopefully." However, such a clone only provides part of the data required for test automation. It is quite useful for unchanging master data. But the cook in the kitchen will not always get the same order for the exact same salad. One guest wants olives on their salad, another does not; one wants yogurt dressing, the other wants oil and vinegar. Depending on the ingredients and changes to the order, the price changes as well. This means that we also need dynamic data, so-called transactional data, to comprehensively represent the business processes.

There are two approaches to providing the necessary dynamic data for test automation, each with its advantages and disadvantages. With the first approach, the required data are picked from the anonymized clone of the production data. However, the effort required to determine the respective filter criteria and then formulate a corresponding query can quickly become very high. A further disadvantage is that the quantity of filtered data is limited, and the data can be used up during testing.
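As a minimal sketch of this first approach: assuming a hypothetical `orders` table (the in-memory SQLite database here merely stands in for the anonymized clone), a test script could pick its candidate records with a parameterized query:

```python
import sqlite3

# Stand-in for the anonymized production clone (hypothetical schema).
clone = sqlite3.connect(":memory:")
clone.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
clone.executemany(
    "INSERT INTO orders (status, total) VALUES (?, ?)",
    [("OPEN", 12.5), ("PAID", 8.0), ("OPEN", 31.9), ("CANCELLED", 5.0)],
)

def pick_test_data(connection, status, limit):
    """Filter records matching the test case's criteria from the clone."""
    return connection.execute(
        "SELECT id, status, total FROM orders WHERE status = ? LIMIT ?",
        (status, limit),
    ).fetchall()

# The test case needs exactly two open orders as its ingredients.
candidates = pick_test_data(clone, "OPEN", 2)
print(candidates)
```

The limiting factor is visible right in the query: once the matching rows are consumed by earlier test runs, the `LIMIT` can no longer be satisfied.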

With the second approach, the required test data are newly created in the database, ensuring that all the test data required for the test case are provided. Although these test data are used up as well, they can be recreated time and again by the automated scripts. Creating such synthetic test data can be arduous, too, e.g. when there are large amounts of dependent data. Therefore, which approach to use has to be evaluated for each individual test case.
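A sketch of the second approach, with hypothetical record shapes: builder functions create the synthetic data, including the dependent records, and can be re-run before every automated test:

```python
import itertools

# Simple sequence for unique ids; a real setup would let the database assign them.
_ids = itertools.count(1)

def create_customer(name):
    """Create a synthetic customer record (master record)."""
    return {"id": next(_ids), "name": name}

def create_order(customer, total):
    """Create a dependent order record referencing the customer."""
    return {"id": next(_ids), "customer_id": customer["id"], "total": total}

def create_test_fixture():
    """Build the full set of dependent records needed by one test case."""
    customer = create_customer("Test Customer")
    orders = [create_order(customer, total) for total in (9.90, 14.50)]
    return customer, orders

customer, orders = create_test_fixture()
```

The arduous part the text mentions shows up here as well: every foreign-key relationship (order to customer, and so on) has to be built up explicitly and in the right order.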

Furthermore, test data often have to be dynamized for test automation. What does that mean? Let us look at another example from the kitchen. Everyone knows that a good Peking duck should be ordered 24 hours before it is to be eaten. As the date of consumption is a variable date, dynamization offers a solution for the automation. The result then looks like this: date of consumption = order date + 24 hours. This and similar kinds of dynamic test data are also used in test automation.
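The Peking-duck rule can be expressed directly as a derived, dynamized value instead of a hard-coded date:

```python
from datetime import datetime, timedelta

def consumption_date(order_date: datetime) -> datetime:
    """Dynamized test datum: date of consumption = order date + 24 hours."""
    return order_date + timedelta(hours=24)

order = datetime(2024, 5, 1, 18, 30)
ready = consumption_date(order)
print(ready)  # 2024-05-02 18:30:00
```

Because the value is computed at run time, the script stays valid no matter when the automated test is executed.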

Conclusion: In most cases, the solution is like a good salad: it's all in the mix. Let us once more summarize the recipe: Take an anonymized clone of the production data for the basic, unchanging master data (you can also create smaller amounts of data yourself), and add some well-selected dynamic data from the clone of the production data by means of a query in the test case. Now add a well-measured dose of synthetic test data. All of this should be well balanced to match the test case template. Lastly, top the whole thing off with a splash of dynamized test data in the template, and you have the perfect data salad for your test automation.

After serving the test data to the database, it is important to thoroughly clean up the data kitchen. That means returning all the static, synthetic and dynamic test data to their original state if they were modified by the test case. Simply put, every test case necessarily entails a clean-up of the test data, because tidiness is the top priority in a good data kitchen.
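The clean-up step can be sketched as a snapshot-and-restore wrapper around each test case (a hypothetical in-memory table stands in for the database here; real frameworks offer the same idea as fixtures or transactions that are rolled back):

```python
import copy

def run_with_cleanup(table, test_case):
    """Snapshot the test data, run the test case, then restore the original state."""
    snapshot = copy.deepcopy(table)
    try:
        return test_case(table)
    finally:
        # Tidiness first: undo whatever the test case changed.
        table.clear()
        table.update(snapshot)

# Hypothetical example: a test case that modifies a record.
orders = {1: {"status": "OPEN"}}

def pay_order_test(data):
    data[1]["status"] = "PAID"   # the test case dirties the data...
    return data[1]["status"]

result = run_with_cleanup(orders, pay_order_test)
print(orders[1]["status"])  # ...but the kitchen is clean again afterwards: "OPEN"
```

The `finally` block ensures the restore happens even when the test case fails, so the next test always starts from a tidy kitchen.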

As we all know, various appliances are also used in the kitchen to automate the cooking process. My next blog post in this series on recipes for test automation addresses the question of how much test automation is good for a project. Until then: Happy testing, and keep your data kitchen neat and tidy.
