Extending and Implementing Process Mining Techniques

Objectives

The project aims to improve the effectiveness of process mining by detecting and repairing imperfections in event logs. Using a patterns-based approach, the project systematically addresses common data quality issues such as timestamp errors, misordered events, and duplication. This enhancement ensures more accurate process mining analyses and helps organizations in sectors like healthcare to monitor, analyze, and improve their internal processes efficiently.

The focus is on creating algorithms and solutions to detect, classify, and repair data imperfections in event logs, ultimately leading to higher quality data and more reliable process mining results. The project utilizes a variety of techniques from process mining, data mining, machine learning, gamification, artificial intelligence, and statistics to develop detection and repair solutions.

Specifically, the project creates new algorithms, including ones that use Large Language Models (LLMs) for detection and repair. These techniques aim to improve event log data quality so that process mining results are more reliable.

Implemented Algorithms

Unanchored Event Detection and Repair

The Unanchored Event pattern arises when the timestamp format in the event log differs from the tool's expected format, leading to incorrect event ordering. Common issues include confusion between day-month and month-day formats, or inconsistent symbols and timezones.

Algorithms for Timestamp Analysis and Repair

The project developed a series of algorithms to analyze and repair timestamp values by detecting deviations from the expected standard. The key steps are:

  1. Timestamp Format Identification:
    • Identify the expected timestamp format (e.g., YYYY-MM-DD HH:MM:SS).
    • Develop a parser to recognize timestamps in the expected format.
    • Validate parsed timestamps to ensure they adhere to the expected format.
  2. Handling Format Variations:
    • Account for variations like month-day vs. day-month formats.
    • Handle different symbols for time separation (e.g., colon ":" vs. dot ".").
    • Manage variations in timezone encoding and offsets.
  3. Anomaly Detection and Flagging:
    • Detect anomalies such as ambiguous date formats or misinterpreted timestamp values.
    • Implement logic to flag potential errors in the timestamp.
    • Ensure that all timestamp values fall within expected ranges to avoid time-traveling errors.
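
The three steps above can be sketched as a single classification function. This is a minimal illustration, not the project's implementation: the list of variant formats, the status labels, and the function name `classify_timestamp` are assumptions chosen for the example.

```python
from datetime import datetime

# Expected format from the text: YYYY-MM-DD HH:MM:SS.
EXPECTED_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical variants; a real log would need its own list.
VARIANT_FORMATS = [
    "%d/%m/%Y %H:%M:%S",  # day-month order
    "%m/%d/%Y %H:%M:%S",  # month-day order
    "%Y-%m-%d %H.%M.%S",  # dot instead of colon as time separator
]

def classify_timestamp(raw, earliest, latest):
    """Return (status, parsed) for one raw timestamp string.

    status is one of: 'ok', 'variant', 'ambiguous', 'out_of_range', 'invalid'.
    """
    try:
        parsed = datetime.strptime(raw, EXPECTED_FORMAT)
        status = "ok"
    except ValueError:
        # Try every known variant and collect the distinct readings.
        readings = set()
        for fmt in VARIANT_FORMATS:
            try:
                readings.add(datetime.strptime(raw, fmt))
            except ValueError:
                continue
        if not readings:
            return ("invalid", None)
        if len(readings) > 1:
            # e.g. 04/03/2023 could be 4 March or 3 April
            return ("ambiguous", None)
        parsed = readings.pop()
        status = "variant"
    # Range check: flag values outside the period the log is known to cover.
    if not (earliest <= parsed <= latest):
        return ("out_of_range", parsed)
    return (status, parsed)
```

A value such as `04/03/2023 10:00:00` is flagged as ambiguous because both the day-month and month-day readings are valid dates, whereas `25/03/2023 10:00:00` has only one valid reading.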

Once these algorithms identify and flag issues in the timestamps, they initiate a repair process that corrects the timestamps, ensuring consistent event ordering.
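
One possible repair strategy, sketched below, resolves the day-month ambiguity at the level of the whole log: if any value has a first field greater than 12, that field cannot be a month, so the log is treated as day-first. This heuristic, the slash-separated input format, and the function names are assumptions for illustration; the source does not specify how the repair is performed.

```python
from datetime import datetime

EXPECTED_FORMAT = "%Y-%m-%d %H:%M:%S"

def infer_day_first(raw_values):
    """Guess the field order of slash-separated timestamps.

    Returns True (day-first), False (month-first), or None if every
    value is ambiguous and the decision should be left to a person.
    """
    for raw in raw_values:
        first, second = raw.split("/")[:2]
        if int(first) > 12:
            return True
        if int(second) > 12:
            return False
    return None

def repair_timestamps(raw_values):
    """Rewrite all values in the expected format using the inferred order."""
    day_first = infer_day_first(raw_values)
    if day_first is None:
        raise ValueError("cannot infer day/month order from this log")
    fmt = "%d/%m/%Y %H:%M:%S" if day_first else "%m/%d/%Y %H:%M:%S"
    return [datetime.strptime(raw, fmt).strftime(EXPECTED_FORMAT)
            for raw in raw_values]
```

The sketch assumes that a single log uses one field order throughout; a log that mixes both orders would need per-source or per-case inference instead.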

Form-Based Pattern Detection and Repair

The Form-Based Pattern occurs when multiple events captured on a single form are logged as separate parallel events. This flattens the event order, introduces unnecessary duplication, and produces overly complex process models. The project implements two distinct approaches to detect and repair this issue:

Repairing Approaches

Repairing Approach 1: Manual Aggregation Method

This approach involves manually detecting and aggregating events that have the same timestamps and case IDs.
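
The grouping criterion (same case ID and same timestamp) can be illustrated with a short sketch. The event field names (`case_id`, `timestamp`, `activity`) and the way merged activities are labeled are assumptions for the example; in the manual approach a user would still review which groups to merge.

```python
from collections import defaultdict

def aggregate_form_events(events):
    """Merge events that share a case ID and timestamp into one event."""
    groups = defaultdict(list)
    for event in events:
        key = (event["case_id"], event["timestamp"])
        groups[key].append(event["activity"])

    merged = []
    for (case_id, timestamp), activities in groups.items():
        merged.append({
            "case_id": case_id,
            "timestamp": timestamp,
            # Hypothetical labeling: join the distinct activity names.
            "activity": " + ".join(sorted(set(activities))),
        })
    # Timestamps are assumed to be ISO-style strings, which sort chronologically.
    return sorted(merged, key=lambda e: (e["case_id"], e["timestamp"]))
```

For example, two events "Record weight" and "Record height" logged for the same case at the same instant would collapse into a single event, while an event with a different timestamp stays separate.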

Pros: Provides a manual, user-driven process for greater control over event aggregation.

Cons:

Repairing Approach 2: Machine Learning-Based Method

This approach utilizes machine learning models to automatically detect and aggregate events based on their timestamps and case IDs.

Pros:

Cons: Depends on a well-trained model, which may incur upfront setup and tuning costs.

Inadvertent Time Travel Detection and Repair

Inadvertent Time Travel occurs when timestamps are recorded incorrectly, leading to misordering of events in the process log. This pattern is typically caused by human error, such as entering the wrong date for an event near midnight or pressing an adjacent key. Two distinct experiments were conducted to detect and repair these issues:

Experiment 01: Rule-Based Approach

This approach relies on manually defined rules that are applied during program execution to all event logs.

Rules Used:

Pros: Simple and easy to implement for rule-specific scenarios.

Cons: Limited in scope since it requires manual input of rules and is not fully automated.
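
As an illustration of what a manually defined rule might look like, the sketch below handles the near-midnight case. The rule itself is hypothetical and is not taken from the project's rule set: an event that appears earlier than its predecessor is shifted forward by one day if that places it within a short window after the predecessor. It assumes the recorded order of events within a case reflects their true order.

```python
from datetime import datetime, timedelta

def detect_time_travel(case_events):
    """Return indices of events that precede the previous event in recorded order."""
    return [i for i in range(1, len(case_events))
            if case_events[i] < case_events[i - 1]]

def repair_midnight_rollover(case_events, window_hours=2):
    """Hypothetical rule: fix events whose date was not advanced past midnight."""
    repaired = list(case_events)
    for i in range(1, len(repaired)):
        if repaired[i] < repaired[i - 1]:
            shifted = repaired[i] + timedelta(days=1)
            gap = shifted - repaired[i - 1]
            # Accept the shift only if it restores order within the window.
            if timedelta(0) <= gap <= timedelta(hours=window_hours):
                repaired[i] = shifted
    return repaired
```

Events that remain out of order after the rule is applied are left unchanged, so they can be flagged for review or handled by other rules.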

Experiment 02: LLM-Based Approach

This approach leverages a Large Language Model (LLM) to provide a more intelligent and automated solution for detecting and repairing Inadvertent Time Travel patterns.

Pros: Fully automated, can adapt to various cases without needing predefined rules, making it more generalizable.

Cons: The LLM-based approach is more costly due to the need for model fine-tuning and runtime usage costs.
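
The general shape of an LLM-based repair step can be sketched without committing to a particular model or provider. Everything here is an assumption for illustration: the prompt wording, the JSON reply format, the event field names, and the `ask_model` callable, which stands in for whatever client the project uses. Fine-tuning is not covered.

```python
import json

def build_prompt(case_id, events):
    """Format one case's events as a prompt (wording is hypothetical)."""
    lines = [f"{i}: {e['activity']} @ {e['timestamp']}"
             for i, e in enumerate(events)]
    return (
        "The following events belong to one case, listed in recorded order.\n"
        "Identify events whose timestamp is likely mistyped so that the event "
        "appears out of order, and propose a corrected timestamp.\n"
        'Answer with JSON only: [{"index": <int>, "corrected": "<YYYY-MM-DD HH:MM:SS>"}]\n'
        f"Case: {case_id}\n" + "\n".join(lines)
    )

def apply_llm_repairs(case_id, events, ask_model):
    """Send the prompt through ask_model and apply the proposed corrections.

    ask_model is a caller-supplied function (prompt string -> reply string).
    """
    reply = ask_model(build_prompt(case_id, events))
    repaired = [dict(e) for e in events]  # copy so the input stays unchanged
    for fix in json.loads(reply):
        idx = fix["index"]
        if 0 <= idx < len(repaired):  # ignore indices the model made up
            repaired[idx]["timestamp"] = fix["corrected"]
    return repaired
```

Passing the model call in as a function keeps the sketch testable with a stub and leaves the choice of model open. In practice the reply would also need validation, since a model may return malformed JSON or implausible timestamps.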

Comparison of Experiments

The rule-based and LLM-based approaches achieved similar detection and repair accuracy. They differ mainly in automation and cost: the rule-based approach depends on manually defined rules, whereas the LLM-based approach is fully automated but more costly because of model fine-tuning and runtime usage.