The project aims to improve the effectiveness of process mining by detecting and repairing imperfections in event logs. Using a patterns-based approach, the project systematically addresses common data quality issues such as timestamp errors, misordered events, and duplication. This enhancement ensures more accurate process mining analyses and helps organizations in sectors like healthcare to monitor, analyze, and improve their internal processes efficiently.
The focus is on creating algorithms and solutions to detect, classify, and repair data imperfections in event logs, ultimately leading to higher quality data and more reliable process mining results. The project utilizes a variety of techniques from process mining, data mining, machine learning, gamification, artificial intelligence, and statistics to develop detection and repair solutions.
Specifically, the project involves the creation of new algorithms, with examples including the use of Large Language Models (LLMs) to enhance detection and repair processes. These techniques improve event log data quality efficiently, yielding more reliable process mining insights.
The Unanchored Event pattern arises when the timestamp format in the event log differs from the tool's expected format, leading to incorrect event ordering. Common issues include confusion between day-month and month-day formats, or inconsistent symbols and timezones.
The project developed a series of algorithms that analyze and repair timestamp values by detecting deviations from the expected standard. Once these algorithms identify and flag problematic timestamps, they initiate a repair process that corrects them, restoring consistent event ordering.
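As an illustration, the format-detection and repair steps could be sketched as follows. This is a minimal sketch, not the project's actual code: the two candidate formats, the canonical output format, and the disambiguation heuristic (a first field greater than 12 can only be a day) are assumptions.

```python
from datetime import datetime

# Candidate interpretations of an ambiguous "xx/yy/…" timestamp (assumed formats).
DAY_FIRST = "%d/%m/%Y %H:%M"
MONTH_FIRST = "%m/%d/%Y %H:%M"

def infer_format(timestamps):
    """Pick the first candidate format under which every timestamp parses.
    A value like 13/02/… only parses day-first, which disambiguates the log."""
    for fmt in (DAY_FIRST, MONTH_FIRST):
        try:
            for ts in timestamps:
                datetime.strptime(ts, fmt)
        except ValueError:
            continue  # at least one timestamp rejects this format
        return fmt
    raise ValueError("no candidate format fits all timestamps")

def repair(timestamps, canonical="%Y-%m-%dT%H:%M:%S"):
    """Reparse every timestamp with the inferred format and re-emit it
    in one canonical format, so the mining tool orders events correctly."""
    fmt = infer_format(timestamps)
    return [datetime.strptime(ts, fmt).strftime(canonical) for ts in timestamps]

log = ["13/02/2023 09:15", "14/02/2023 10:00"]  # 13 > 12, so day-first
print(repair(log))
```

Note that if every first field is 12 or below, the log stays ambiguous and this sketch silently defaults to the first candidate; a production version would flag that case for review.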
The Form-Based Pattern occurs when multiple events are logged from a single form as separate parallel events. This issue causes flattened event orders, unnecessary duplications, and results in overly complex process models. The project implements two distinct approaches to detect and repair this issue:
The first approach involves manually detecting and aggregating events that share the same timestamp and case ID.
Pros: Provides a manual, user-driven process for greater control over event aggregation.
Cons:
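The grouping step behind this approach can be sketched as follows. The event fields ("case_id", "timestamp", "activity") and the join-with-"+" merge rule are illustrative assumptions, not the project's actual implementation.

```python
from collections import defaultdict

def aggregate_form_events(events):
    """Group events that share a case ID and timestamp (i.e. fields logged
    from one form submission) and collapse each group into a single event."""
    groups = defaultdict(list)
    for ev in events:
        groups[(ev["case_id"], ev["timestamp"])].append(ev["activity"])
    return [
        {"case_id": cid, "timestamp": ts, "activity": "+".join(sorted(acts))}
        for (cid, ts), acts in groups.items()
    ]

events = [
    {"case_id": "c1", "timestamp": "2023-02-13T09:15", "activity": "Record BP"},
    {"case_id": "c1", "timestamp": "2023-02-13T09:15", "activity": "Record HR"},
    {"case_id": "c1", "timestamp": "2023-02-13T10:00", "activity": "Discharge"},
]
print(aggregate_form_events(events))
```

The two same-timestamp events collapse into one aggregated event, which removes the spurious parallelism from the discovered process model.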
The second approach utilizes machine learning models to automatically detect and aggregate events based on their timestamps and case IDs.
Pros:
Cons: Requires a well-trained model, which entails upfront setup and tuning costs.
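To make the learned variant concrete, the sketch below "trains" a deliberately simple pairwise model that decides whether two events came from the same form, using the time gap between them as the only feature. The project's actual models and features are not specified here; this is an assumed, minimal stand-in.

```python
def train_gap_threshold(labeled_pairs):
    """labeled_pairs: [(gap_seconds, same_form_bool), ...].
    Learn the largest gap observed among known same-form pairs."""
    same = [gap for gap, y in labeled_pairs if y]
    return max(same) if same else 0.0

def predict_same_form(threshold, gap_seconds):
    """Predict that two events belong to one form if their gap is within
    the learned threshold."""
    return gap_seconds <= threshold

# Toy "training" data: fields of one form submission land within ~2 seconds.
pairs = [(0.0, True), (1.5, True), (2.0, True), (3600.0, False)]
thr = train_gap_threshold(pairs)
print(predict_same_form(thr, 1.0), predict_same_form(thr, 600.0))
```

A real system would use a richer feature set (activity names, resources, form identifiers) and a proper classifier, which is where the setup and tuning cost noted above comes from.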
Inadvertent Time Travel occurs when timestamps are recorded incorrectly, leading to misordering of events in the process log. This pattern is typically caused by human error, such as recording the wrong date for events logged near midnight or mistyping a digit via adjacent key presses. Two distinct experiments were conducted to detect and repair these issues:
The first experiment relies on manually defined rules that are applied to all event logs during program execution.
Pros: Simple and easy to implement for rule-specific scenarios.
Cons: Limited in scope since it requires manual input of rules and is not fully automated.
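One such hand-written rule could look like the sketch below: if an event precedes its predecessor within the same case, and adding one day restores the order, treat it as a wrong-date-near-midnight error. The rule and the tuple layout are assumptions for illustration, not the project's exact rule set.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M"

def repair_midnight_errors(case_events):
    """case_events: list of (activity, timestamp_string) in logging order.
    Shift a backwards timestamp forward one day when that restores order."""
    times = [datetime.strptime(ts, FMT) for _, ts in case_events]
    for i in range(1, len(times)):
        if times[i] < times[i - 1] and times[i] + timedelta(days=1) >= times[i - 1]:
            # Likely logged just after midnight with the previous day's date.
            times[i] += timedelta(days=1)
    return [(act, t.strftime(FMT)) for (act, _), t in zip(case_events, times)]

trace = [("Admit", "2023-03-01T23:58"), ("Triage", "2023-03-01T00:05")]
print(repair_midnight_errors(trace))
```

Rules like this are cheap to run but, as noted above, each new error mode needs its own rule to be written by hand.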
The second experiment leverages a Large Language Model (LLM) to provide a more intelligent and automated solution for detecting and repairing Inadvertent Time Travel patterns.
Pros: Fully automated and able to adapt to varied cases without predefined rules, making it more generalizable.
Cons: The LLM-based approach is more costly due to the need for model fine-tuning and runtime usage costs.
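The shape of such a pipeline could be sketched as follows. Everything here is hypothetical: `call_llm` stands in for whatever chat-completion client the project uses, and the prompt wording and JSON reply schema are assumptions. A canned client is used below purely to show the data flow.

```python
import json

# Assumed prompt template; {{ }} escapes literal braces in the JSON schema.
PROMPT = (
    "The following events belong to one case, in logging order. "
    'Return JSON {{"repairs": [{{"index": i, "timestamp": "..."}}]}} '
    "for any timestamp that causes Inadvertent Time Travel.\n{trace}"
)

def repair_with_llm(events, call_llm):
    """Serialise the trace, ask the model for repairs, apply them."""
    reply = call_llm(PROMPT.format(trace=json.dumps(events)))
    repaired = list(events)
    for fix in json.loads(reply)["repairs"]:
        act, _ = repaired[fix["index"]]
        repaired[fix["index"]] = (act, fix["timestamp"])
    return repaired

# Canned stand-in for a real model, returning one suggested repair.
fake = lambda prompt: '{"repairs": [{"index": 1, "timestamp": "2023-03-02T00:05"}]}'
events = [("Admit", "2023-03-01T23:58"), ("Triage", "2023-03-01T00:05")]
print(repair_with_llm(events, fake))
```

Each repair requires a model call, which is where the runtime usage cost noted above arises.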
Both the rule-based and LLM-based approaches show similar results in terms of detection and repair accuracy. However, there are significant differences: