Process Mining In 5 Steps
Automating business processes is crucial for success, and knowing the real processes is essential. Process Mining addresses both factors with surprisingly good quality. But how do you start with Process Mining, which tools are necessary, and how are results generated quickly? We address these questions and thus enable an uncomplicated introduction to Process Mining – because it is possible to achieve surprising results with just a few steps.
What is Process Mining?
Performance trends in insurance, goods movements in logistics, customer developments in general, or the distribution of tasks in project teams: alongside the established business processes, other and often invisible value-adding processes are hidden in these cases. These processes follow the intended target processes at best and sometimes deviate from them. There are many reasons for deviations – for example, incorrect operation, shortcuts, and non-transparent or hidden process steps. Process Mining (PM) makes these deviations visible and helps identify their causes. In addition, PM provides key figures on the resources used and the activities carried out in the examined processes. The more precisely the actual processes are known, the more likely errors will be identified and optimization potential found. With the transparency gained via key figures, PM makes it possible to forecast the effects of process changes. The data sources for PM are versatile: databases, ERP systems, application logs, and the interfaces of the IT systems involved can all be used.
How high is the entry hurdle?
Initial insights can be gained with PM in just five steps, and the basis for further iterations and automatic process analysis can be created.
In the first step, it is important to engage with the processes – asking questions is the way in. Which product, which activities, or which existing business process appears changeable? For which process are different IT systems required? Where could a lack of process knowledge or transparency lead to errors? Perhaps there are already assumptions about potential improvements that have never been concretely quantified. The questions depend on the application and can vary widely – for example: how long do goods stay in the warehouse, or how often are goods stored without quality assurance?
Data science analyses require specialist knowledge of the business area being examined that is as detailed as possible. The CRISP-DM process model calls this business understanding. This step is also indispensable for PM because processes have to be understood and requirements assessed. The result of this step is an overview of the questions, goals, and motivation. These questions cannot be fully detailed in the first iteration, but they offer clues for further data processing. At the latest after an initial visualization of the process data and an exchange with the department (step 4), it is worth returning to this step and refining the questions.
Understand The Systems Involved
Data, data, data! PM benefits from a large amount of data and versatile data sources. Both the IT systems involved and their data must be identified. It is often surprising how much data with high potential for PM can be found not only in databases but also in logs or file storage. The data does not have to be read live from the production system: historical data from an archive system or a targeted export is sufficient for initial insights and the first aha effect.
Only one thing is important: the records must be relatable to one another. This can be done via customer numbers, addresses, insurance numbers, JIRA tickets, articles, or orders.
The relationships need not rely on the same identifying characteristic throughout. Changing characteristics or combining several is possible and quite common. For example, a customer relationship begins with an offer; the offer creates a customer, and many different offers follow over time. Once everyone responsible is on board and data access has been granted, things can get started.
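Correlating records whose identifying characteristic changes over the process can be sketched as follows. This is a minimal illustration with hypothetical field names (`offer_id`, `created_from_offer`, and so on), not a prescribed schema: offer events are mapped to the customer the offer created, and offers without a customer yet keep their own ID as a provisional case identifier.

```python
# Hypothetical records from two systems: offer events and customer master data.
offer_events = [
    {"offer_id": "O-1", "activity": "offer created"},
    {"offer_id": "O-2", "activity": "offer created"},
]
customer_records = [
    {"customer_id": "C-7", "created_from_offer": "O-1"},
]

# Build a lookup from offer to customer where the link already exists.
offer_to_customer = {
    c["created_from_offer"]: c["customer_id"] for c in customer_records
}

# Attach a case ID to each offer event; offers that have not produced a
# customer yet fall back to the offer ID as a provisional identifier.
for event in offer_events:
    event["case_id"] = offer_to_customer.get(event["offer_id"], event["offer_id"])
```

In a later iteration, such provisional IDs can be merged once the missing link (here: the customer) appears in the data.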
Let’s look at our use case: In a logistics center, all variants of the movement of goods from goods receipt to dispatch are sought to minimize the usual challenges such as effort, shrinkage, incorrect operation and delays.
This is where the real work of PM begins. Above all, the potential for technical or domain-related misconceptions is high. Fortunately, this phase is highly iterative, with many quick insights that are easy to discuss – so the department and the analysts quickly learn to work well together.
What Is A Process Instance?
Process instances represent a coherent process flow. Each process comprises several process steps. An activity, such as accepting goods in goods receipt or storing them in the pallet warehouse, represents such a process step. PM relates these activities to one another. The activities belonging to each process instance can be identified via an identifying feature; such an ID can be the order number, for example. A process instance is ideally self-sufficient and not fully or partially involved in other process instances. In practice, there are special cases in which individual process instances are merged or divided. An example from warehouse logistics: a pick may take only a subset of the items from a storage unit, or a delivered pallet may be divided into several storage units. The storage unit is then involved in several picks, as Table 2 shows. The process instances are therefore only self-sufficient if one process instance is created per line of Table 2 and the movements from Table 1 are duplicated per process instance.
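The duplication described above can be sketched in a few lines. This is a simplified illustration with made-up data in the spirit of Tables 1 and 2 (movements per storage unit, picks per order item); the field and activity names are assumptions, not the article's actual schema. Each pick becomes its own self-sufficient process instance that carries a copy of the shared storage-unit movements.

```python
# Hypothetical Table 1: goods movements recorded per storage unit.
movements = [
    {"storage_unit": "SU-1", "activity": "goods receipt"},
    {"storage_unit": "SU-1", "activity": "putaway pallet warehouse"},
]
# Hypothetical Table 2: picks, each taking a subset of a storage unit.
picks = [
    {"pick_id": "P-1", "storage_unit": "SU-1"},
    {"pick_id": "P-2", "storage_unit": "SU-1"},
]

event_log = []
for pick in picks:
    # Duplicate the storage unit's movements into this process instance...
    for move in movements:
        if move["storage_unit"] == pick["storage_unit"]:
            event_log.append({"case_id": pick["pick_id"],
                              "activity": move["activity"]})
    # ...and close the instance with the pick itself.
    event_log.append({"case_id": pick["pick_id"], "activity": "picking"})
```

The price of self-sufficiency is redundancy: the two shared movements appear once per pick, which is exactly the duplication the text describes.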
How is Process Mining done?
Once the data is available as an export or system access for data extraction is ready, event logs are generated. These represent the starting point of PM and contain events in the form of the activities of the process instances. The effort required to generate such an event log varies greatly with the type of data model. In the goods movement example, the tables are already very well structured. This does not always have to be the case; it depends on the existing system and the question being considered. If a workflow management system is used, it is often possible to export an event log directly. If, on the other hand, the data is spread across different information systems and files, the effort is greater – especially if the timestamps need to be reconciled. A supposed single event is often composed of entries from several systems, so it is advisable to subdivide it into one event per system.
Most standard tools require at least the following structure of an event log:
- Case ID – The process instance is provided with a unique ID. This is often assigned by the algorithm that creates the event log.
- Activity – The activity performed in this process step. For example, picking articles from a storage unit into an order item.
- Time – The timestamp at which the event occurred.
In addition to these three fields, it is advisable to save other key figures per activity – for example, the user, the information system, or the quantity. If you want to look at the time that elapses between individual activities and how long an event lasted, you can work with start and end times. Which data is considered depends on the questions defined in step 1. This data basis for further analysis is usually presented as a table of structured data, which opens opportunities for descriptive statistics or the use of artificial intelligence.
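A minimal event log with this structure can be sketched as follows. The case ID, activities, and timestamps are invented for illustration; the point is how start and end times yield both activity durations and the waiting time between consecutive events.

```python
from datetime import datetime

# Minimal event log: the three required fields (case ID, activity, time),
# here extended with hypothetical start/end times per activity.
event_log = [
    {"case_id": "4711", "activity": "goods receipt",
     "start": datetime(2023, 5, 2, 8, 0), "end": datetime(2023, 5, 2, 8, 15)},
    {"case_id": "4711", "activity": "putaway",
     "start": datetime(2023, 5, 2, 9, 0), "end": datetime(2023, 5, 2, 9, 10)},
]

# How long each activity lasted, in minutes.
durations = [(e["end"] - e["start"]).total_seconds() / 60 for e in event_log]

# Waiting time between the end of the first and the start of the second event.
waiting = (event_log[1]["start"] - event_log[0]["end"]).total_seconds() / 60
```

Such derived key figures (durations and waiting times) are typically what the questions from step 1, such as "how long have goods been in the warehouse", are answered with.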
Results are now showing. The structured process data (the event log) can be analyzed with free applications. The basic functionality of these tools is process discovery, i.e., the detection of process flows in event logs or system data. The graphical representation as a process model provides a common basis for communicating with domain experts at eye level. They give tips for improvements, point out possible errors, and will certainly be surprised by some details – process sequences they would not have expected – or explain hidden process knowledge. This phase usually produces a lot of feedback for data preparation. In addition, the departments bring new questions to the analysts. A cycle of knowledge emerges.
Automate Insight Generation
A dashboard of the most important process key figures is one way to establish the results achieved in the company. But process mining goes much further: the identification of process deviations can be automated with anomaly detection. Another approach is to use machine learning to answer forecasts on domain-specific questions. Using Explainable AI (XAI), the model's decisions can be analyzed and causes identified. Both approaches are based on the data prepared as an event log.
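One simple form of automated deviation detection is frequency-based: process variants (the ordered activity sequence of a case) that occur rarely are flagged as candidate anomalies for review. The variant counts and the 1% cut-off below are assumptions for the sketch, not values from the article.

```python
from collections import Counter

# Hypothetical variant counts: each key is an ordered activity sequence,
# each value the number of cases that followed it.
variants = Counter({
    ("goods receipt", "quality check", "putaway", "picking", "dispatch"): 950,
    ("goods receipt", "putaway", "picking", "dispatch"): 45,   # QA skipped
    ("goods receipt", "putaway", "putaway", "dispatch"): 5,    # double putaway
})

total = sum(variants.values())
threshold = 0.01  # assumed cut-off: variants below 1% of cases are suspicious

anomalies = [variant for variant, count in variants.items()
             if count / total < threshold]
```

More sophisticated approaches replace the fixed threshold with a learned model, but the input stays the same: the event log prepared in the previous steps.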
To generate insights automatically, the preprocessing steps described above must themselves be automated. The data produced in this way can then be used for forecasting and anomaly detection.