These days, the data analysis and processing market is very hot, and a wide range of tools and methods are available for careful data analysis.
While most users are familiar with the concept of data mining and text mining, some may not be familiar with the science of process mining.
In this article, we will get to know this concept briefly.
What is process mining?
In data mining, models are usually built to predict future samples that resemble past ones. Because current business processes are so complex, accurate predictions are generally unrealistic. Moreover, many data mining and machine learning methods produce predictions like a black box, without being able to trace back or explain why.
The knowledge gained and the deeper insight into the discovered patterns and processes help to resolve this complexity. So, although data mining and process mining have a lot in common, there are fundamental differences between them in what they do and where they are used. Process mining is an emerging discipline that sits between computational intelligence and data mining on the one hand, and the modeling and analysis of an organization's processes on the other.
Process mining aims to discover, monitor, and improve real processes by extracting knowledge from the event data stored in information systems. Process analysis studies processes using event logs.
Classical data mining techniques, such as clustering, classification, association, etc., do not focus on process models and are only used to analyze a specific step in the overall process. Process mining adds a process perspective to data mining.
Process mining techniques use recorded event data to discover, analyze, and improve processes. Each recorded event refers to an activity and is associated with a process instance (a case).
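The relationship between events, activities, and process instances can be sketched in a few lines. The case names and activities below are invented purely for illustration:

```python
from collections import defaultdict

# A minimal, hypothetical event log: each recorded event names the case
# (process instance) it belongs to and the activity that was performed.
events = [
    {"case": "order-1", "activity": "register"},
    {"case": "order-2", "activity": "register"},
    {"case": "order-1", "activity": "check stock"},
    {"case": "order-1", "activity": "ship"},
    {"case": "order-2", "activity": "cancel"},
]

def traces_by_case(events):
    """Group events by process instance, preserving recorded order."""
    traces = defaultdict(list)
    for event in events:
        traces[event["case"]].append(event["activity"])
    return dict(traces)

print(traces_by_case(events))
# {'order-1': ['register', 'check stock', 'ship'], 'order-2': ['register', 'cancel']}
```

Grouping events into per-case traces like this is the usual first step before any of the techniques described below can be applied.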
How is process mining done?
Based on event data, process mining methods are classified into three categories: process discovery, conformance checking, and process enhancement.
For example, the first group, process discovery techniques, takes event data as input and generates a model without any prior information. Conformance checking techniques check whether the actual process running in the organization conforms to the discovered model, and vice versa. The third category deals with whether a process can be improved or extended using event data.
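A toy version of the second category, conformance checking, can be sketched by representing the "model" as a set of allowed directly-follows pairs and flagging anything the model forbids. Real conformance checking (e.g. token replay or alignments) is far richer; the activities here are hypothetical:

```python
# Toy "model": activity a may be immediately followed by activity b.
ALLOWED = {("register", "check stock"), ("check stock", "ship"),
           ("register", "cancel")}

def deviations(trace):
    """Return the directly-follows pairs in a trace that the model forbids."""
    pairs = zip(trace, trace[1:])
    return [p for p in pairs if p not in ALLOWED]

print(deviations(["register", "check stock", "ship"]))  # []
print(deviations(["register", "ship"]))                 # [('register', 'ship')]
```

A trace with an empty deviation list conforms to this simple model; any returned pair marks a point where reality departed from it.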
For example, by using the timestamps in the recorded data, the model can be extended to show bottlenecks, the waiting time to receive a service, and the throughput time. Unlike other analytical methods, process mining is process-oriented rather than data-oriented, but it is closely related to data mining.
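The use of timestamps for enhancement can be illustrated with a minimal throughput-time calculation over one hypothetical case (timestamps invented for illustration):

```python
from datetime import datetime

# Timestamped events for a single case; the gap between the first and
# last event is the case's throughput time.
events = [
    {"case": "order-1", "activity": "register",    "time": "2023-05-01 09:00"},
    {"case": "order-1", "activity": "check stock", "time": "2023-05-01 09:40"},
    {"case": "order-1", "activity": "ship",        "time": "2023-05-02 14:00"},
]

def throughput_hours(case_events):
    """Hours elapsed between the first and last event of a case."""
    times = [datetime.strptime(e["time"], "%Y-%m-%d %H:%M") for e in case_events]
    return (max(times) - min(times)).total_seconds() / 3600

print(round(throughput_hours(events), 1))  # 29.0
```

Aggregating such durations per activity or per transition is what exposes bottlenecks and waiting times in an enhanced model.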
What is the difference between process mining and data mining?
Process mining combines the power of data mining and process modeling: by automatically generating process models from event logs, it creates living models that are easy to keep up to date. Process mining has many points in common with data mining. One thing they share is that both face the challenge of processing large amounts of data.
IT systems collect a lot of data about the business processes they support. These data represent what happened in the real world and can be used to understand and improve the organization.
Unlike data mining, process mining focuses on a process perspective; that is, it looks at a process execution as a flow of several activities. Most data mining techniques extract patterns in a format such as rules or decision trees, whereas process mining builds complete process models and then uses them to identify bottlenecks.
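The contrast can be made concrete with one of the simplest process models, a directly-follows graph: instead of isolated rules, the whole flow is derived at once, and edge counts already hint at the dominant path. The traces are hypothetical:

```python
from collections import Counter

# Example traces (one activity sequence per case).
traces = [
    ["register", "check stock", "ship"],
    ["register", "check stock", "ship"],
    ["register", "cancel"],
]

def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

for (a, b), n in directly_follows(traces).most_common():
    print(f"{a} -> {b}: {n}")
```

Mature discovery algorithms go well beyond this sketch, but most start from exactly this kind of relation between activities.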
In data mining, generalization is important to avoid overfitting: we want to discard data that does not conform to the general rule. In process mining, abstraction is necessary for working with complex processes and understanding the flow of the main activities.
Also, in most cases it is necessary to understand the exceptions in order to discover the points that are inefficient and in need of improvement.
Process mining challenges
Process mining is an essential tool for modern organizations that need to manage their operational processes properly. Despite its applicability, there are still significant challenges to be addressed. On the one hand, we are facing incredible growth in data volume. On the other hand, processes and information must be collected appropriately to meet requirements related to efficiency, compliance, and service. These challenges are discussed below.
In current systems, much effort must be spent on extracting the event data relevant to process mining. Usually, several problems need to be solved in this context.
Some of these problems are:
- The data may be distributed over several sources and must be integrated. This problem becomes more acute when multiple identifiers are used for the same entities. For example, one system uses the name and date of birth to identify people, while another uses the person’s social security number.
- Organizational data is often object-oriented rather than process-oriented. For example, products and containers can have RFID tags that automatically lead to record keeping. This object-oriented information must be integrated and preprocessed to track a customer’s order.
- Event data may be incomplete. One of the most common problems is that events do not explicitly refer to process instances.
- Event data may contain outliers. Outlier data refers to samples that do not follow a general pattern and rarely occur.
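The first of these problems, reconciling multiple identifiers before integration, can be sketched as follows. The mapping table, names, and numbers are all hypothetical:

```python
# System A identifies people by (name, date of birth); system B by a
# social security number. A mapping table links the two so that events
# from both sources can be merged under one identifier.
ssn_by_person = {("Alice Smith", "1990-03-14"): "123-45-6789"}

events_a = [{"name": "Alice Smith", "dob": "1990-03-14", "activity": "apply"}]
events_b = [{"ssn": "123-45-6789", "activity": "approve"}]

def unify(events_a, events_b):
    """Rewrite events from both systems onto a single identifier scheme."""
    unified = []
    for e in events_a:
        key = ssn_by_person.get((e["name"], e["dob"]))
        unified.append({"id": key, "activity": e["activity"]})
    for e in events_b:
        unified.append({"id": e["ssn"], "activity": e["activity"]})
    return unified

print(unify(events_a, events_b))
```

In practice such mapping tables are rarely complete, which is exactly why this extraction step consumes so much effort.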
Better tools and more appropriate methodologies are needed to solve these problems. Additionally, as mentioned earlier, organizations should treat event data as a first-class citizen, not as a by-product.
The second big challenge is dealing with complex event data with varied characteristics. Event logs may differ along many dimensions. Some logs may be too large to handle, and some may be too small to provide reliable results. Existing tools have difficulty dealing with petabytes of data. Besides the number of stored event records, other characteristics matter, such as the average number of events per case, the similarity between cases, the number of unique cases, and the number of unique paths that should be considered.

For example, consider a log file L1 with 1,000 cases and an average of 10 events per case. Suppose a log file L2 contains only 100 cases, but each case contains 100 events and hardly any two cases follow the same path. Analyzing L2 is much harder than analyzing L1, even though both files have the same size.

Since an event log only contains examples, it should not be assumed to be complete. Process mining techniques must deal with this incompleteness by using the “open world assumption”: the fact that a behavior was not observed does not mean that it cannot happen. This makes it difficult to work with small logs that contain a lot of variability.

As mentioned earlier, some logs contain events at a very low level of abstraction. Low-level events are not meaningful to stakeholders; therefore, they are generally aggregated to produce higher-level events. For example, when the diagnosis and treatment process of a group of patients is analyzed, we are probably no longer interested in the results of individual tests. In such cases, organizations currently need to use trial and error to determine whether the data is suitable for process mining; therefore, tools should provide a fast feasibility test for a given dataset.
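A quick feasibility scan over a log can report exactly the characteristics mentioned above: number of cases, average events per case, and number of distinct variants. The traces below are invented for illustration:

```python
from collections import Counter

# Per-case traces of a hypothetical log.
traces = {
    "c1": ["a", "b", "c"],
    "c2": ["a", "b", "c"],
    "c3": ["a", "c"],
}

def log_profile(traces):
    """Summarize log complexity: cases, events per case, distinct variants."""
    variants = Counter(tuple(t) for t in traces.values())
    n_events = sum(len(t) for t in traces.values())
    return {
        "cases": len(traces),
        "avg_events_per_case": n_events / len(traces),
        "variants": len(variants),
    }

print(log_profile(traces))
```

Two logs of equal byte size can produce very different profiles here, which is the point of the L1/L2 comparison: many short, repetitive cases are far easier to mine than few long, unique ones.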
The next challenge is balancing quality criteria such as fitness, simplicity, precision, and generalization. The recorded data is often incomplete. Process models with loops allow an unbounded number of possible process instances, and some instances occur far less often than others; therefore, it is wrong to assume that every possible instance of a process appears in the event log. Building a model that also covers rarely occurring behavior (noise) is problematic; in such cases it is better to leave this behavior to conformance checking. Noise and incompleteness make process discovery one of the most challenging problems. Balancing fitness, simplicity, precision, and generalization is difficult, which is why the most powerful process discovery techniques provide a variety of parameters. New algorithms are needed to better balance these criteria.
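One common, simple way to keep noise out of a discovered model is to drop variants (distinct activity sequences) whose relative frequency falls below a threshold before running discovery. The threshold and traces here are illustrative choices, not a prescribed method:

```python
from collections import Counter

# Hypothetical log: one dominant variant plus two rare ones.
traces = [("a", "b", "c")] * 8 + [("a", "c", "b")] + [("a", "b", "c", "c")]

def filter_variants(traces, min_share=0.2):
    """Keep only traces whose variant covers at least min_share of the log."""
    counts = Counter(traces)
    total = len(traces)
    keep = {v for v, n in counts.items() if n / total >= min_share}
    return [t for t in traces if t in keep]

filtered = filter_variants(traces)
print(len(traces), "->", len(filtered))  # 10 -> 8
```

The discarded rare behavior is not thrown away for good: as the text notes, it is better examined through conformance checking than forced into the discovered model.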
The next challenge is constructing evaluation benchmarks. Process mining is an emerging technology, which explains why good benchmarks are still missing. For example, dozens of process discovery techniques have been proposed, but a detailed comparison of the quality of these methods is not available. Even though these techniques differ considerably in efficiency and performance, evaluating them is a challenging and complex task; therefore, the need for standardized datasets and appropriate quality measures is strongly felt. So far, little work has been done in this field. Among the evaluation criteria, the four measures of fitness, simplicity, precision, and generalization can be mentioned. Some real-life event logs are already publicly available. On the one hand, benchmarks should be based on real data; on the other hand, there is a need to produce synthetic datasets with specific characteristics.
The next challenge is improving the representational bias used in process discovery. A process discovery technique produces a process model in a specific language (BPMN, Petri nets, etc.). However, the representation used internally by the discovery algorithm should be separated from the visualization used to show the results. The choice of target language, for example whether or not it allows concurrency, affects both the class of models the algorithm can work with and how the learned model is represented. Choosing a target language involves several implicit assumptions. These assumptions limit the search space: processes that cannot be represented in the target language cannot be discovered. This so-called representational bias should be a conscious choice and should not be driven (only) by graphical display preferences. For example, consider the figure below.
The next challenge is cross-organizational mining. Traditionally, process mining is applied within a single organization. But with the development of web service technology, supply chain integration, and cloud computing, scenarios arise where data from multiple organizations is available for analysis. There are two scenarios for cross-organizational process mining. In the collaborative setting, different organizations cooperate to achieve a common goal, and process instances flow between them. In this model, the organizations are like pieces of a puzzle: the overall process is split into parts and distributed among the organizations so that each can perform its own task. Analyzing the events recorded in only one of these organizations is not enough; to discover end-to-end processes, the recorded events of the different organizations must be merged, which is not an easy task.
The second scenario is that several organizations use the same infrastructure to execute essentially the same process. For example, consider Salesforce.com, which hosts and manages the sales processes of other companies.
Analyzing the variations between these different organizations is an interesting task. On the one hand, the companies use the same infrastructure; on the other hand, they do not have to follow one fixed process, because the system allows them to configure it to their own needs. These organizations can learn from each other, and service providers may improve their services and offer value-added services based on the results of cross-organizational mining.
Initially, process mining focused on historical data (available in the databases of information systems). But with the development of technology and the growth of online processes, process mining should not be limited to offline methods.
Three types of operational support can be identified: detection, prediction, and recommendation. For example, when a running instance deviates from the expected process, this can be detected and the system can issue a warning.
Historical data can be used to generate predictive models. For example, it is possible to predict the completion time of a running case and make decisions based on that. Using process mining methods in such an online setting creates new challenges regarding computing power and data quality.
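A bare-bones version of such a predictive model estimates the remaining time of a running case from the average time that past cases needed after reaching the same activity. The activity names and durations (in hours) are invented for illustration:

```python
from statistics import mean

# Remaining hours observed in historical cases after reaching each activity.
history = {
    "register":    [30.0, 34.0],
    "check stock": [20.0, 28.0, 24.0],
}

def predict_remaining(current_activity):
    """Predict remaining hours as the historical average; None if unseen."""
    observed = history.get(current_activity)
    return mean(observed) if observed else None

print(predict_remaining("check stock"))  # 24.0
```

Production-grade approaches condition on far more than the current activity, but even this averaging sketch shows how offline history feeds online operational support.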