Central data platform: The key to future-proof production processes

Smart, networked approaches to modern industrial production require classical production flow control to be expanded and digitalized. To do so, all levels of the automation pyramid must be connected and integrated with the aid of digital solutions and data processing systems.

Currently, there are numerous new technologies, such as artificial intelligence, digital twins and augmented reality, whose significance for the smart production of the future is growing steadily. In order to use these innovative methods, they need to be linked to existing systems, although this has only been possible to a limited extent so far. For example, there is as yet no standardized approach to providing data for the use of artificial intelligence or for creating digital twins. Novel use cases, such as predictive maintenance, also require individual access to the required data.
New technologies and their applications can only be implemented successfully through close cooperation between departments and with a clear integration strategy.

Feasibility in brownfield

Figure 1: Access and provision of data in a new, third dimension

Most digital transformation projects take place in brownfield production environments. This means that the production facilities are already in operation and, from an economic point of view, there is a need to find solutions that can be integrated with the existing machinery and software systems.

The development towards smart production requires new exchange channels that open up a third dimension of data flow and allow existing data to be made available centrally. It is economically inefficient to implement these new channels from scratch in every project. Consequently, a generic approach should be taken, in which data is obtained from the respective production systems and made homogeneously accessible regardless of the individual use case. A central data platform, on which all existing production information is made accessible, is the basis of a flexible and scalable path for the further development and optimization of production processes.
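A minimal sketch of this generic approach, assuming a connector abstraction that wraps each source system and emits records in one homogeneous schema; the class and field names are illustrative, not a product API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, Optional

@dataclass
class DataPoint:
    """One homogeneous record, regardless of the source system."""
    source: str                 # e.g. "plc/line3/station7"
    signal: str                 # e.g. "spindle_temperature"
    value: float
    timestamp: datetime
    unit: Optional[str] = None

class SourceConnector(ABC):
    """Common interface every production system is adapted to."""
    @abstractmethod
    def read(self) -> Iterator[DataPoint]: ...

class PlcConnector(SourceConnector):
    def read(self) -> Iterator[DataPoint]:
        # In reality this would poll the PLC, e.g. via OPC UA or a fieldbus.
        yield DataPoint("plc/line3/station7", "spindle_temperature",
                        58.3, datetime.now(timezone.utc), "°C")

# The platform ingests every source in the same format:
for point in PlcConnector().read():
    print(point)
```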

Advantages of a central data platform

  • Democratic provision of existing machine data from the brownfield
  • Fast integration of new technologies
  • Implementation of innovative use cases
  • Simple implementation of data transparency and data governance
  • Access to historical and real-time data
  • Scalable application development
  • Increased efficiency and quality in data transfer

Challenges of data processing

For some individual systems, there are well-defined interfaces that allow data to be read easily, e.g. from ERP or MES systems. The situation is different with SCADA systems, however, since they are built in a very heterogeneous and domain-specific way. Their interfaces, and those of the subordinate machine control systems (PLCs), are not uniformly defined. There are also no uniform industry standards for direct access at the sensor level, since this kind of use case has not yet been addressed by machine or sensor manufacturers. Nevertheless, direct access to the sensor system is worthwhile, as sensors deliver much valuable data beyond their actual function, data that usually goes unexploited.

Use case

Our example shows a classic inductive sensor, where usually only the “sensor on/off” signal is used. The following functions are already implemented at the factory and could also be evaluated (see the sketch after the list):
  • Switch mode
  • Switching cycle counter, reset counter
  • Operating hours counter
  • Absorption (analog measurement of the electric field)
  • Internal temperature
  • Device information
  • Application-specific identifier, system identifier, location code
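As an illustration, the extended telemetry of such a sensor could be modelled as a record like the following; the field names are assumptions (real devices expose such parameters, e.g. via IO-Link), not a vendor data sheet:

```python
from dataclasses import dataclass

@dataclass
class InductiveSensorTelemetry:
    switching_state: bool          # the classic "sensor on/off" signal
    switch_mode: str               # configured switch mode
    switching_cycles: int          # switching cycle counter (resettable)
    operating_hours: float         # operating hours counter
    absorption: float              # analog measurement of the electric field
    internal_temperature_c: float  # internal temperature
    device_info: str               # vendor, model, firmware
    application_id: str            # application-specific identifier
    system_id: str                 # system identifier
    location_code: str             # location code

# Illustrative reading; values are made up for the example.
reading = InductiveSensorTelemetry(
    switching_state=True, switch_mode="NO", switching_cycles=1_204_377,
    operating_hours=8_766.5, absorption=0.42, internal_temperature_c=31.2,
    device_info="vendor X, type Y, fw 1.3", application_id="press-04",
    system_id="line-2", location_code="hall-A/row-3",
)
print(reading.switching_cycles)
```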

Regardless of the existing exchange channel, there are challenges related to data at all levels. Examples include data protection, the creation of data silos, the processing of mass data, the interaction between humans and machines, and absent or non-standardized communication channels.

Given the highly heterogeneous infrastructure in production, solutions that are individually adapted to the existing conditions can provide a remedy and address the specific challenges at the individual levels. Data governance, data security and cybersecurity must be taken into account as well.

Figure 2: Challenges in technology and communication

Holistic approach

Figure 3: Integration of all production-relevant data

It is not only production flow control data that is relevant for the optimal linkage of information and the associated benefits, such as efficient use of resources and increased productivity and quality. Companies have to take a large number of parameters into account, including deployment and maintenance planning, warehousing, the availability of personnel and much more. Today, this data can usually only be linked manually; having the information available in digital form could save a great deal of time.


Due to the complexity of this topic and the strongly differing requirements of individual production environments, it is clear that standard solutions are insufficient for paving the way to unrestricted data availability, and thus to new technologies. It is therefore important to start by looking at the use cases that create the most added value.

The vision for more efficiency, flexibility and quality

Comprehensive plant-to-plant communication aimed at improving production processes and identifying the causes underlying quality issues can be realized using a central data platform. It allows the data provided to be exchanged across facilities via standardized interfaces. This fully automated exchange of information has many benefits for production. Production planning and control can react flexibly to information from suppliers and customers. Real-time data allow bottlenecks and problems to be identified and rectified more quickly. Quality deviations can also be traced back to their cause across facilities and recurring problems can be avoided through early anomaly detection. The exchange of data also reduces transportation and logistics costs. Moreover, direct communication between the facilities improves cooperation: The exchange of knowledge and experience can give rise to new ideas and innovations that further improve production.
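What such a standardized exchange could look like is sketched below: a quality-deviation event serialized in one agreed schema, so that any facility or supplier system can consume it. The schema and field names are assumptions for illustration, not an established standard.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class QualityDeviationEvent:
    """One event type in a shared cross-facility schema (illustrative)."""
    plant: str
    production_line: str
    batch_id: str
    deviation_type: str
    measured_value: float
    detected_at: str

event = QualityDeviationEvent(
    plant="plant-a", production_line="line-2", batch_id="B-20931",
    deviation_type="thickness_out_of_tolerance", measured_value=1.37,
    detected_at=datetime.now(timezone.utc).isoformat(),
)
payload = json.dumps(asdict(event))  # published e.g. via a message broker
print(payload)
```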

Cross-factory communication in semiconductor and automotive production
Figure 4: Data transparency in wafer production from front-end to back-end (Semiconductor)
Figure 5: Data integration for more efficient and future-proof communications between suppliers and manufacturers (Automotive)

Maturity of the data platform

As described above, the availability of data is the foundation for future technologies, and this access is provided by a central data platform. The real added value is created when the collected data can be put to good use in production. For this purpose, applications must be linked to the platform.

One future scenario describes a standardized data storage system that is accessed by all applications across different production sites. By using the data platform, the applications can exchange data, rendering separate storage locations obsolete.


With regard to the decision to transform to a central data platform, we recommend taking an iterative approach and continuing to develop communication channels and systems at an appropriate pace. The advantage of customized software development is that it evolves in line with the requirements and needs of the company in question, always maintaining the necessary balance between evolution and revolution. In the first step, we therefore usually start with data engineering. However, we also provide for future use cases in our architecture and take them into account during continued development.

Conclusion

Merging data from different layers of the automation pyramid and other data silos onto a homogeneous platform allows companies to fully democratize and transform their data. Data management rules help to ensure the quality and security of the data. A cloud-based approach offers many benefits, such as scalability and flexibility. The utilization of a central data platform lets companies use their data more effectively and exploit the data’s full potential.

More information in our white paper: Industrial Data Platform

A New AI Approach to Modelling Uncertain Knowledge in Medicine

In his research, Daniel Kahneman, Nobel Prize laureate for economics in 2002, documented systematic errors in human thought and attributed them not to distortion by emotions, but to the construction of our cognition. In this context, the question of how AI-based systems can compensate for the deficiencies of human reasoning when dealing with statistical or incomplete data is of interest. This is particularly true in areas where wrong decisions have far-reaching consequences, such as medicine.

Modelling application examples

As part of my thesis in computer science, we modelled application examples for the processing of uncertain knowledge from the biomedical field using qualitative conditionals. With the help of qualitative conditionals, rules of the form “If A, then mostly B” can be encoded as (B|A). Such rules express a plausible, but not necessarily certain, relationship between A and B.

The knowledge we use to draw conclusions is represented by a knowledge base consisting of conditionals. For example, the knowledge base Rbirds = { (f|b), (b|p), (¬f|p), (w|b) } represents our knowledge of penguins (p), birds (b), the property of having wings (w) and the ability to fly (f). We know that: (f|b) – birds fly, (b|p) – penguins are birds, (¬f|p) – penguins do not fly, and (w|b) – birds have wings. The problem with this knowledge base from the point of view of classical logic is that it allows contradictory conclusions to be drawn: on the one hand, penguins, being birds, can fly; at the same time, they cannot.

Ranking functions provide a remedy here. They make it possible to assess different scenarios according to their plausibility on the basis of the available knowledge. In addition, contrary to classical logic, it is possible to revise conclusions that have already been drawn, either because there are exceptions to the rules or because the knowledge changes. Penalty points are assigned to each of the conditionals, expressing how much it costs when a scenario violates that conditional. For our knowledge base Rbirds, for example, the scenario pbfw (a world in which penguins are birds, have wings and fly) violates the rule (¬f|p) and is therefore less plausible than the scenario pb¬fw (a world in which penguins are birds, have wings and do not fly).
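To make this concrete, here is a minimal executable sketch of rank computation for Rbirds. The penalty points are illustrative (the specific penguin rules weigh more than the generic bird rules); actual systems derive them from the knowledge base itself:

```python
# Worlds over the variables p (penguin), b (bird), f (flies), w (wings).
# A conditional (B|A) is stored as (antecedent, consequent, penalty);
# a world violates (B|A) if it satisfies A but not B.
R_BIRDS = [
    (lambda x: x["b"], lambda x: x["f"],     1),  # (f|b)  birds fly
    (lambda x: x["p"], lambda x: x["b"],     2),  # (b|p)  penguins are birds
    (lambda x: x["p"], lambda x: not x["f"], 2),  # (¬f|p) penguins do not fly
    (lambda x: x["b"], lambda x: x["w"],     1),  # (w|b)  birds have wings
]

def rank(world):
    """k(world): sum of the penalty points of all violated conditionals."""
    return sum(eta for a, c, eta in R_BIRDS if a(world) and not c(world))

pbfw  = {"p": True, "b": True, "f": True,  "w": True}   # flying penguin
pbnfw = {"p": True, "b": True, "f": False, "w": True}   # non-flying penguin
print(rank(pbfw), rank(pbnfw))  # 2 1 -> the non-flying penguin is more plausible
```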

Figure 1: An example of reasoning with qualitative conditional logic. k(w) is a function that assigns a rank to a scenario. When calculating the rank of a scenario, the penalty points of the rules violated by that scenario are summed up. k(pf), for example, represents the rank of all scenarios in which the variables p and f are true. R0 and R1, on the other hand, divide a logically contradictory knowledge base into two non-contradictory parts. In this way, revisable reasoning becomes mathematically possible.

Conclusion of the intelligent agent

Our intelligent agent was able to follow three strategies in drawing conclusions, based on different mathematical systems whose names have evolved historically:

  • In the case of System P, conclusions are drawn over all ranking functions admissible for the knowledge base R. It therefore draws its conclusions very cautiously.
  • In System Z, the rank of a scenario is affected only by the most exceptional rule violated by that scenario. It does not allow subclasses with exceptional properties to inherit the properties of their superclass.
  • In general C-representations, the rank of a scenario is influenced by all the rules it violates. Minimal C-representations, in turn, provide the minimal implausibility levels of the scenarios, whereby three different minimality measures (cw, sum, ind) exist. Furthermore, conclusions can be drawn using skeptical, weakly skeptical or credulous reasoning.

Here is an application example: Malaria tropica is a widespread and life-threatening disease in sub-Saharan Africa. It is caused by an infection with a single-celled parasite, Plasmodium falciparum, which is transmitted by Anopheles mosquitoes. But not everyone infected with P. falciparum gets malaria: some humans carry a hereditary form of the haemoglobin gene called the sickle cell allele. Two copies of this defective gene cause a malformation of the red blood cells and sickle cell anaemia, whereas one copy does not affect the function of the red blood cells. However, humans carrying one copy of the sickle cell gene usually do not get seriously sick with malaria despite being infected with the pathogen, and thus have a survival advantage over humans with normal haemoglobin. Strikingly, the sickle cell gene is relatively common in malaria regions: in Africa, for example, there are areas where almost one third of the population carries the mutated haemoglobin gene. The two most important strategies to prevent malaria are the avoidance of mosquito bites and therefore infections (exposure prophylaxis) and the use of medication to control the spread of the pathogen inside the body after an infection (chemoprophylaxis). Since the malaria pathogen is constantly changing genetically, it must always be expected to develop resistance to common drugs.

To model the outcome of a malaria infection in a patient, we used the signature Σ = {h, m, s, p, r} with the following semantics: h represents a patient infected with the malaria pathogen; m is true if the infected patient gets seriously sick with malaria; s expresses that the patient carries a copy of the sickle cell gene; an applied chemoprophylaxis is modelled by p; and an infection with a resistant pathogen is modelled by r. The KBmalaria knowledge base contained the following conditionals (an executable sketch follows the list):

  • (¬s|h) Infected patients usually do not carry the sickle cell gene.
  • (m|¬s) Patients without the sickle cell gene usually get seriously sick with malaria.
  • (¬m|s) Patients with the sickle cell gene usually do not get seriously sick with malaria.
  • (¬m|p) With chemoprophylaxis, patients usually do not develop severe malaria.
  • (m|pr) With chemoprophylaxis and a pathogen resistant to it, patients usually do develop severe malaria.
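A minimal sketch of how such a knowledge base can be queried: each world is ranked by summing the penalties of the conditionals it violates, and a query (B|A) is accepted if the best A-and-B world outranks the best A-and-not-B world. The penalty values below are hand-picked so that the example behaves as described, not the output of an actual C-representation computation:

```python
from itertools import product

VARS = ("h", "m", "s", "p", "r")

# Conditionals (B|A) as (antecedent, consequent, penalty); penalties
# are illustrative only.
KB_MALARIA = [
    (lambda w: w["h"],            lambda w: not w["s"], 1),  # (¬s|h)
    (lambda w: not w["s"],        lambda w: w["m"],     1),  # (m|¬s)
    (lambda w: w["s"],            lambda w: not w["m"], 2),  # (¬m|s)
    (lambda w: w["p"],            lambda w: not w["m"], 1),  # (¬m|p)
    (lambda w: w["p"] and w["r"], lambda w: w["m"],     2),  # (m|pr)
]

WORLDS = [dict(zip(VARS, bits)) for bits in product([True, False], repeat=len(VARS))]

def rank(world):
    """Sum the penalties of all conditionals the world violates."""
    return sum(eta for a, b, eta in KB_MALARIA if a(world) and not b(world))

def min_rank(formula):
    return min((rank(w) for w in WORLDS if formula(w)), default=float("inf"))

def accepts(antecedent, consequent):
    """(B|A) is accepted iff the most plausible A-and-B world has a
    strictly lower rank than the most plausible A-and-not-B world."""
    return (min_rank(lambda w: antecedent(w) and consequent(w))
            < min_rank(lambda w: antecedent(w) and not consequent(w)))

m = lambda w: w["m"]
print(accepts(lambda w: w["h"], m))                                  # (m|h)    -> True
print(accepts(lambda w: w["h"] and w["p"] and w["r"], m))            # (m|hpr)  -> True
print(accepts(lambda w: w["h"] and w["p"] and w["r"] and w["s"], m)) # (m|hprs) -> False
```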

We confronted our agent’s knowledge base with, among others, the following questions:

  • (m|h) Do patients infected with the malaria pathogen usually get seriously sick with malaria?
  • (m|hpr) Does a patient who received malaria prophylaxis and is infected with a resistant malaria pathogen usually get seriously sick with malaria?
  • (m|hprs) Does a patient who received malaria prophylaxis, is infected with a resistant malaria pathogen and carries a sickle cell gene usually get seriously sick with malaria?
Figure 2: Answers of the AI agent for the KBmalaria knowledge base compared to a human expert. At the top, the different ranking functions with the types of reasoning: sk = skeptical, wsk = weakly skeptical, cr = credulous.

The AI agent should answer the first question with a “yes”, since infected people usually do not have a sickle cell gene and therefore become seriously sick with malaria. The second question should also be answered with a “yes”. The last question is the most interesting one, since it involves weighing two contradictory arguments from the knowledge base: on the one hand, infected people become seriously sick with malaria when they use chemoprophylaxis and the malaria pathogen is resistant to it; on the other hand, a sickle cell gene protects against a severe course of the disease. Empirically, the second argument weighs more heavily, since the sickle cell gene also protects against prophylaxis-resistant malaria pathogens. The comparison between the second and third questions shows that individual systems of the agent allow revisable reasoning: if we add s to our query, we can no longer conclude m.

Conclusion

We were able to show how biomedical knowledge can be expressed using qualitative conditional logic. Here we found that, especially in the case of skeptical and weakly skeptical reasoning using C-representations, the answers of the intelligent agent matched the answers of a human expert, demonstrating that revisable reasoning is possible for machines. Since the underlying algorithms still have to be optimized, the full potential of this AI approach will only become apparent in the future. If sufficiently large knowledge bases can be processed algorithmically, it can be assumed that such an AI agent will also provide useful support to human experts.

Systems based on qualitative conditionals permit the revision of conclusions when the state of knowledge changes, which is particularly important in the medical field. In contrast to the probability-based results of quantitative modelling, the user receives a clear yes or no answer to their question. And in comparison to neural networks, the conclusions of such a system can be made more transparent, since rule-based modelling makes it easy to understand how the system works.


Literature

1) J. Haldimann, A. Osiak, C. Beierle. Modelling and Reasoning in Biomedical Applications with Qualitative Conditional Logic. DOI: 10.1007/978-3-030-58285-2_24 (2020)

2) D. Kahneman. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux (2011)

3) J. Pearl, D. Mackenzie. The Book of Why: The New Science of Cause and Effect. Basic Books (2018)

Teaching a Machine to Test Using AI

These days, we distinguish between two ways of testing: manual testing and automated testing. Automated testing is becoming ever more important, and why would anyone mind? With automated tests, test scenarios can be run more quickly, and the cost of manual testing can be reduced.

The topic of Artificial Intelligence (AI) is being considered more and more frequently in the field of quality assurance today. Is this the end of manual testing?

After all, we are currently developing software that is able to autonomously analyze programs and write corresponding test cases. Furthermore, AI-based test software can cover a much wider range using brute force than a manual tester ever could.

But before we continue to compare manual testing and testing with AI, we should take a look at the operating principles and limitations of AI.

In human beings, logical thinking arises from the neural connections in our brains. The same concept is used in attempts to develop AI: a neural network is constructed that consists of several layers, each with several nodes, and that can evolve over time.

Figure 1: Structure of a neural network

As illustrated in the figure above, there are input and output nodes. The input nodes can, for example, be compared to the human eye: they react to a stimulus, which is then processed in a hidden layer using various algorithms. The output nodes reflect the reaction with which the human responds. AI-based software processes information the same way.

The number of input and output nodes has to grow with the scope of the AI-based software’s task. The number of nodes and hidden layers reflects the complexity of the resulting algorithm; obviously, as the number of nodes increases, so does the required processing power of the hardware. Not all nodes are necessarily interconnected: by creating only certain connections between the nodes, you can group certain nodes for certain tasks. For example, the algorithms for seeing and hearing can be implemented separately at first and connected at a later point, so that an incoming stimulus causes a reaction.

Figure 2: Grouping nodes in a neural network

We have built a neural network. But how does this network work? To put it simply, the input nodes signal “1” when they receive a stimulus and “0” when they do not. Each node multiplies this “1” or “0” by a certain factor. In the end, each output node receives a result, and the output node with the highest value triggers the desired reaction.
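A toy sketch of this forward pass, assuming the simplified picture above: binary inputs, one hidden layer of weighted sums, and a winner-takes-all output. The dimensions and random weights are placeholders, and the activation step is an addition beyond the text’s description:

```python
import numpy as np

rng = np.random.default_rng(42)

inputs = np.array([1, 0, 1])        # stimuli: 1 = active, 0 = inactive
w_hidden = rng.normal(size=(3, 4))  # factors between input and hidden nodes
w_output = rng.normal(size=(4, 2))  # factors between hidden and output nodes

hidden = np.maximum(inputs @ w_hidden, 0)  # weighted sums + simple activation
outputs = hidden @ w_output
reaction = int(np.argmax(outputs))         # the output node with the highest
print(outputs, reaction)                   # value triggers the reaction
```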

But where do the factors for the nodes in the hidden layer come from? This is the point where human involvement remains indispensable. AI can calculate numerous approaches and possibilities, but it does not know what is right or wrong. When a person sees a ball flying towards them, they immediately react to what their eyes see by raising their arms and catching the ball. AI could of course do this too, but it could also do nothing, dodge, knock the ball away, or react in any number of different ways to the stimulus. AI first needs a person to tell it what reaction is correct in the respective situation. For this purpose, we predefine several situations for the AI-based software and tell it how to react to them. The AI-based software adjusts the factors of its nodes accordingly to develop an algorithm. In the next step, we present situations to the AI-based software that it has to react to on its own, while the human evaluates the AI’s response to fine-tune the algorithm. Only then can the AI-based software work autonomously. This approach is called deep learning because it shapes the hidden layers of the neural network.
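A minimal sketch of this training phase, assuming a single-layer network and a handful of made-up situation/reaction pairs; real deep learning uses many layers and far more data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)  # predefined situations
y = np.array([0, 1, 1])                              # correct reactions

W = rng.normal(size=(2, 2))  # the factors the software adjusts
for _ in range(500):
    logits = X @ W
    # softmax turns the output node values into probabilities
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1     # gradient of the cross-entropy loss
    W -= 0.1 * X.T @ p / len(y)      # nudge the factors towards the targets

print((X @ W).argmax(axis=1))  # -> [0 1 1]: the learned reactions
```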

And this is the key point as to why automated testing cannot replace manual testers completely. The AI-based software must first be adapted to the respective software. The AI-based software itself does not know what the software is for or which reactions are correct.

Figure 3: Manual adjustment of the test program settings

First, the operable fields to be used by the AI-based software have to be defined. There is, of course, software that can search for all operable objects in the GUI and use them for testing, but this way all possible combinations would be run as a brute-force search. Consequently, a tester has to compile a blacklist and a whitelist. Alternatively, you can let the AI-based software run freely for several hours, during which it tries out every variant of the settings check marks or countless text combinations in the name field; focusing on the primary test objectives by means of restrictions is more efficient, though. The AI-based software delivers all the results obtained in hours of testing in the form of newly written test scenarios with a failed or passed state. A tester is then needed again to analyze which failed tests really constitute a software or user error. Furthermore, the test scenarios created by the AI-based software during its autonomous testing can be stored and later reused for automated tests.
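A hedged sketch of such restrictions: the helper and element names below are hypothetical, but real tools expose comparable whitelist/blacklist configuration:

```python
def select_targets(discovered, whitelist=None, blacklist=()):
    """Keep only the GUI elements the AI-based software may exercise
    during its exploration run."""
    return [element for element in discovered
            if element not in blacklist
            and (not whitelist or element in whitelist)]

# Elements a discovery step might return (made up for illustration):
discovered = ["save_button", "name_field", "delete_all_button", "license_field"]
print(select_targets(discovered,
                     whitelist={"save_button", "name_field", "settings_checkbox"},
                     blacklist={"delete_all_button"}))
# -> ['save_button', 'name_field']
```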

There are several companies today that offer this kind of AI software, e.g. Eggplant.io, Test.ai or Retest. In addition, almost half of all German software companies are continually developing their QA departments in the field of AI.

The World Quality Report summarizes the results of a global survey on application quality and testing methods. The current issue reports that AI in intelligent automation is seen as the most important tool for improving quality assurance over the next two to three years.

I hope this blog post gives you some insight into testing with AI and shows that automation has been taking a great step forward thanks to AI. Still, that does not mean that manual testers are doomed to become extinct—no one can tell exactly what the future will bring.