What Is Process Mining And What Are Its Challenges?

What is process mining?

Process mining sits between computational intelligence and data mining on the one hand, and the modeling and analysis of organizational processes on the other.

The purpose of process mining is to discover, monitor, and improve real processes by extracting knowledge from event data stored in information systems.

Process mining analyzes processes using event logs. Classical data mining techniques such as clustering, classification, and association rule learning do not focus on process models and are typically used to analyze only a single step of the overall process. Process mining adds a process perspective to data mining.

Process mining techniques use recorded event data to discover, analyze, and improve processes. Each recorded event refers to an activity and is related to a process instance, usually called a case.
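To make these notions concrete, here is a minimal sketch in Python. The field names and data are illustrative, not a standard: each event carries a case identifier, an activity name, and a timestamp, and grouping events by case yields one trace per process instance.

```python
from collections import defaultdict

# A minimal event log: every event names an activity, the case
# (process instance) it belongs to, and when it happened.
# Field names and values here are illustrative.
events = [
    {"case": "order-1", "activity": "register", "timestamp": "2023-01-05T09:00"},
    {"case": "order-1", "activity": "check",    "timestamp": "2023-01-05T09:30"},
    {"case": "order-2", "activity": "register", "timestamp": "2023-01-05T10:00"},
    {"case": "order-2", "activity": "reject",   "timestamp": "2023-01-05T11:00"},
    {"case": "order-1", "activity": "ship",     "timestamp": "2023-01-06T08:00"},
]

# Group events by case to obtain one trace (activity sequence) per instance.
traces = defaultdict(list)
for event in sorted(events, key=lambda e: e["timestamp"]):
    traces[event["case"]].append(event["activity"])

print(dict(traces))
# {'order-1': ['register', 'check', 'ship'], 'order-2': ['register', 'reject']}
```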

What are the sub-categories of process mining?

Process mining techniques, which operate on event data, fall into three general categories: process discovery, conformance checking, and process enhancement.

The first group, process discovery techniques, takes event data and produces a model without using any prior information. The second group, conformance checking techniques, examines whether the actual process running in the organization conforms to the discovered model, and vice versa.

The third class of techniques addresses whether a process model can be extended or improved using event data. For example, using the timestamps in the recorded data, the model can be enriched to show bottlenecks, waiting times, and service times.
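As a sketch of such enhancement (illustrative data, not a full technique), the snippet below measures the elapsed time between consecutive activities; averaged over many cases, unusually long gaps point to bottlenecks.

```python
from collections import defaultdict
from datetime import datetime

# One case with timestamped events (illustrative data).
trace = [
    ("register", "2023-01-05T09:00"),
    ("check",    "2023-01-05T09:30"),
    ("ship",     "2023-01-06T08:00"),
]

# Accumulate the elapsed time between each pair of consecutive activities.
gaps = defaultdict(list)
for (a, t1), (b, t2) in zip(trace, trace[1:]):
    delta = datetime.fromisoformat(t2) - datetime.fromisoformat(t1)
    gaps[(a, b)].append(delta.total_seconds() / 3600.0)  # in hours

for (a, b), hours in gaps.items():
    print(f"{a} -> {b}: average wait {sum(hours) / len(hours):.1f} h")
# register -> check: average wait 0.5 h
# check -> ship: average wait 22.5 h   <- candidate bottleneck
```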

Unlike other analytical methods, process mining is process-centric rather than purely data-centric, although it is closely related to data mining.

What is the difference between process mining and data mining?

Process mining combines the power of data mining with process modeling. By automatically generating process models from event logs, it produces living models that are easy to keep up to date.
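As an illustration, the open-source pm4py library can go from a log file to a discovered model in a few lines. The call names below follow its simplified interface in recent releases and "orders.xes" is a hypothetical file, so treat this as a sketch and check the current documentation.

```python
# Sketch using the open-source pm4py library; function names follow
# its simplified interface in recent releases (verify against the
# current docs). "orders.xes" is a hypothetical event log file.
import pm4py

log = pm4py.read_xes("orders.xes")                     # load the event log
net, im, fm = pm4py.discover_petri_net_inductive(log)  # discover a Petri net
pm4py.view_petri_net(net, im, fm)                      # render the live model
```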

Huge amounts of data

Process mining has a lot in common with data mining. For example, both face the challenge of processing large volumes of data. IT systems collect a great deal of data about the business processes they support. This data is a good representation of what has happened in the real world and can be used to understand and improve the organization.

Unlike data mining, process mining takes a process perspective; that is, it looks at a process execution in terms of the activities performed and their ordering.

Most data mining techniques extract patterns in a format such as rules or a decision tree. Process mining, by contrast, constructs complete process models and then uses them to identify bottlenecks.
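One simple building block behind such models is the directly-follows relation: how often one activity is immediately followed by another. A minimal sketch with illustrative traces:

```python
from collections import Counter

# Traces extracted from an event log (illustrative).
traces = [
    ["register", "check", "ship"],
    ["register", "check", "ship"],
    ["register", "reject"],
]

# Count how often each activity is directly followed by another;
# discovery algorithms build process models on relations like this.
dfg = Counter()
for trace in traces:
    for a, b in zip(trace, trace[1:]):
        dfg[(a, b)] += 1

for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")
```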

In data mining, generalization is very important to prevent overfitting. This means we want the model to ignore incidental details that do not fit the general rule.

Process mining also requires generalization when working with complex processes, in order to understand the flow of the main process. At the same time, in many cases, understanding the exceptions is necessary to discover the points that are inefficient and need improvement.
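A simple way to surface such exceptions is to count how often each variant (distinct activity sequence) occurs and flag the rare ones, as in this sketch (the threshold and data are illustrative):

```python
from collections import Counter

# Each case reduces to its variant: the ordered tuple of activities.
traces = [
    ("register", "check", "ship"),
    ("register", "check", "ship"),
    ("register", "check", "ship"),
    ("register", "reject"),
    ("register", "check", "check", "ship"),  # a rare rework loop
]

variants = Counter(traces)
total = sum(variants.values())

# Variants below an (arbitrary) frequency threshold are candidate
# exceptions: possibly noise, possibly inefficiencies worth inspecting.
for variant, n in variants.most_common():
    label = "EXCEPTION" if n / total < 0.25 else "mainstream"
    print(f"{n}x {label}: {' -> '.join(variant)}")
```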

In data mining, models are usually used to predict similar cases in the future. Many data mining and machine learning methods produce such predictions like a black box, with no way to trace back or explain why a prediction was made. And because today's business processes are so complex, accurate predictions are often unrealistic anyway.

The knowledge gained from the discovered patterns and processes, and the deeper insight they provide, help to tame this complexity. Thus, although data mining and process mining have much in common, there are fundamental differences in what they do and where they are used.

What are the challenges of process mining?

Traditionally, process mining takes place within a single organization. But with the advancement of web service technology, supply chain integration, and cloud computing, scenarios arise in which event data from multiple organizations is available for analysis.

In fact, there are two characteristic scenarios for inter-organizational process mining.

In the collaborative scenario, different organizations work together to achieve a shared goal, and process instances flow across these organizations. In this model, the organizations are like pieces of a puzzle.

The overall process is broken into pieces and distributed among the organizations so that each can do its part. Analyzing the events recorded in just one of these organizations is therefore not enough.

To discover end-to-end processes, the event logs of the different organizations must be merged, which is not an easy task.
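A sketch of the basic mechanics, assuming the two organizations already share a common case identifier (in practice, establishing that link is the hard part):

```python
# Two partial logs of the same order, one per organization
# (illustrative data; "order_id" is an assumed shared identifier).
org_a = [
    {"order_id": "o1", "activity": "create order", "timestamp": "2023-01-05T09:00"},
    {"order_id": "o1", "activity": "approve",      "timestamp": "2023-01-05T10:00"},
]
org_b = [
    {"order_id": "o1", "activity": "pick goods",   "timestamp": "2023-01-05T12:00"},
    {"order_id": "o1", "activity": "ship",         "timestamp": "2023-01-06T08:00"},
]

# Interleave both logs by timestamp to reconstruct the end-to-end trace.
merged = sorted(org_a + org_b, key=lambda e: e["timestamp"])
print([e["activity"] for e in merged])
# ['create order', 'approve', 'pick goods', 'ship']
```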

The second scenario is that different organizations run essentially the same process while sharing the same infrastructure.

Take Salesforce.com, for example, which supports the sales processes of many other companies.

On the one hand, these companies use the same shared infrastructure; on the other hand, they do not have to follow exactly the same process, because the system allows them to configure how the process is followed.

Clearly, analyzing these variations across different organizations is an attractive task. Interestingly, the organizations can learn from each other, and service providers may be able to improve their services and offer value-added services based on the results of inter-organizational process mining.

On the one hand, we face an incredible increase in data volume; on the other hand, processes and information must be captured properly to meet requirements for efficiency, compliance, and service.

Despite the practicality of process mining, there are still major challenges that need to be addressed.

The first challenge is finding, collecting, integrating, and cleaning event data. In current systems, a lot of effort must still be spent on extracting event data suitable for process mining.

Typically, several issues need to be addressed. The data may, for example, be distributed over multiple sources and have to be merged, a problem that is exacerbated when the sources use different identifiers: one system may identify individuals by name and date of birth, while another uses a social security number.
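A sketch of what reconciling such identifiers can look like, assuming a mapping table already links the two schemes (building that table is itself a hard record-linkage problem):

```python
# System A keys people by (name, date of birth); system B by social
# security number. The mapping table is assumed to exist (illustrative).
id_map = {("Alice Smith", "1980-02-01"): "ssn-123"}

system_a = [{"name": "Alice Smith", "dob": "1980-02-01", "activity": "apply"}]
system_b = [{"ssn": "ssn-123", "activity": "approve"}]

# Rewrite system A's events onto the shared key before merging.
unified = [
    {"case": id_map[(e["name"], e["dob"])], "activity": e["activity"]}
    for e in system_a
] + [{"case": e["ssn"], "activity": e["activity"]} for e in system_b]

print(unified)
# [{'case': 'ssn-123', 'activity': 'apply'}, {'case': 'ssn-123', 'activity': 'approve'}]
```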

Organizational data is also often object-centric rather than process-centric. Products and containers, for example, can carry RFID tags that automatically generate records. To track a customer order, such object-centric information must be correlated and pre-processed.

Event data may also be incomplete. One of the most common problems is that events do not explicitly refer to the process instance they belong to.

Event data may also contain outliers: instances that did not follow the general pattern and occurred only rarely. And a log may contain information at very different levels of granularity.

Hospital log data, for instance, may record anything from a simple blood test to a complex surgical procedure. Solving these problems requires better tools and more appropriate methodologies. In addition, as noted earlier, organizations should treat event data as a first-class citizen, not as a by-product.

The second major challenge in this area is dealing with complex event data that can have widely varying characteristics.

Event logs can have very different properties. Some logs are so large that they are difficult to handle, while others are so small that no reliable results can be drawn from them. Available tools struggle with petabyte-scale data.

Besides the raw number of stored events, other characteristics should also be considered: the average number of events per case, the similarity between cases, the number of unique events, and the number of unique paths.

For example, consider an event log L1 with the following characteristics: 1,000 cases, an average of 10 events per case, and only a few distinct paths. Suppose a second log L2 contains only 100 cases, but each case contains 100 events and follows a path of its own.

Analyzing L2 is much more difficult than analyzing L1, even though both logs contain the same number of events.
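The sketch below generates two synthetic logs with these shapes and reports the characteristics mentioned above; the point is that equal size can hide very different variability.

```python
import random

def describe(log):
    """Report events, cases, events per case, and distinct variants."""
    variants = {tuple(trace) for trace in log}
    events = sum(len(t) for t in log)
    print(f"{events} events, {len(log)} cases, "
          f"{events // len(log)} events/case, {len(variants)} variants")

# L1: 1,000 cases of 10 events each, drawn from just two paths.
paths = [["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"],
         ["a", "b", "d", "c", "e", "f", "g", "h", "i", "j"]]
log1 = [random.choice(paths) for _ in range(1000)]

# L2: 100 cases of 100 events each; virtually every case is unique.
log2 = [[f"act{random.randrange(50)}" for _ in range(100)] for _ in range(100)]

describe(log1)  # 10000 events, 1000 cases, 10 events/case, 2 variants
describe(log2)  # 10000 events, 100 cases, 100 events/case, ~100 variants
```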

Since an event log contains only sample behavior, it should not be assumed to be complete. Process mining techniques must deal with this incompleteness by adopting an "open world assumption": the fact that something did not happen does not mean it cannot happen. This makes it difficult to work with small event logs that exhibit a lot of variability.

As mentioned earlier, some logs contain events at a very low level of abstraction. Low-level events are not very meaningful to stakeholders; therefore, one generally tries to aggregate low-level events into higher-level activities.

For example, when analyzing the diagnosis and treatment process of a group of patients, we may no longer be interested in the results of individual tests.
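A sketch of such aggregation: map each low-level event to a coarser activity and collapse runs of identical labels (the grouping table is purely illustrative, not a medical standard).

```python
# Illustrative mapping from low-level hospital events to coarser
# activities; the table is an assumption for this example.
abstraction = {
    "blood test":   "diagnostics",
    "x-ray":        "diagnostics",
    "lab analysis": "diagnostics",
    "incision":     "surgery",
    "suture":       "surgery",
}

trace = ["blood test", "x-ray", "incision", "suture", "blood test"]

# Lift each event to its abstract activity and collapse repetitions.
lifted = []
for activity in trace:
    high = abstraction.get(activity, activity)
    if not lifted or lifted[-1] != high:
        lifted.append(high)

print(lifted)  # ['diagnostics', 'surgery', 'diagnostics']
```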

In such cases, organizations currently need trial and error to determine whether their data is suitable for process mining; tools should therefore offer a quick feasibility test for a given data set.

The third major challenge in this area is Concept Drift.

The term concept drift in process mining refers to a situation in which the process changes while it is being analyzed.

For example, at the beginning of an event log two activities may be concurrent, while later in the log they become sequential. Processes may change like this for a variety of reasons.
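As a crude illustration (not a published drift-detection algorithm), one can compare an ordering statistic between the first and second half of a log; a sharp change is a drift signal worth investigating.

```python
# Fraction of cases in which activity a occurs before activity b.
def precedence_ratio(traces, a, b):
    before = after = 0
    for t in traces:
        if a in t and b in t:
            if t.index(a) < t.index(b):
                before += 1
            else:
                after += 1
    return before / max(before + after, 1)

# Synthetic log whose ordering flips halfway through (illustrative).
log = [["a", "b", "x"]] * 50 + [["b", "a", "x"]] * 50
half = len(log) // 2
print(precedence_ratio(log[:half], "a", "b"))  # 1.0 early in the log
print(precedence_ratio(log[half:], "a", "b"))  # 0.0 later: a drift signal
```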

Moreover, the recorded data is often incomplete. Process models typically allow an unbounded number of distinct process instances (in the presence of loops), and some instances occur far less frequently than others; it is therefore a misconception to assume that every possible instance of the process appears in the event log.

To see why complete data is impossible in practice, consider a process consisting of 10 activities that can be executed in parallel, and assume the event log contains 10,000 process instances. The number of possible orderings (permutations) of 10 concurrent activities is 10! = 3,628,800; therefore, these interleavings cannot all appear in an event log containing only 10,000 cases.
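The arithmetic is easy to verify:

```python
import math

# Ten activities that may run in any order admit 10! interleavings.
interleavings = math.factorial(10)
print(interleavings)           # 3628800

# Even if all 10,000 recorded cases were distinct, the log would
# cover well under 1% of the possible interleavings.
print(f"{10_000 / interleavings:.2%}")  # 0.28%
```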

The presence of noise (infrequently occurring behavior) adds to the complexity, because building a model that also covers such rare behavior is very difficult. In such cases, it is better to handle it with conformance checking. Noise and incompleteness make process discovery one of the most challenging problems in process mining. Balancing the quality criteria of fitness, simplicity, precision, and generalization is a difficult task.
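As a toy illustration of the fitness criterion only (real conformance checking uses token replay or alignments, which this sketch does not implement), one can measure the fraction of traces a model's accepted variants can reproduce:

```python
# The variants the (hand-written, illustrative) model accepts.
model_variants = {("register", "check", "ship"), ("register", "reject")}

log = ([("register", "check", "ship")] * 90
       + [("register", "reject")] * 8
       + [("register", "ship")] * 2)  # infrequent, unexpected behavior

# Crude fitness proxy: fraction of traces the model can replay exactly.
fitness = sum(trace in model_variants for trace in log) / len(log)
print(f"fraction of fitting traces: {fitness:.2f}")  # 0.98
```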

For this reason, the most powerful process discovery techniques expose a variety of parameters, and new algorithms are needed to better balance these criteria.
