Process mining is a technique in which business processes are reconstructed from the event logs of information systems and then analyzed. It is a business process management practice used to discover previously undocumented processes, compare the existing process with its workflow model, and improve the process. Mining event logs can yield valuable information that may not be obtainable through other methods.
There are three categories of process mining. The first is discovery, so named because it involves uncovering previously unknown or undocumented processes. This type of process mining is used when no model of the workflow exists, or when the existing documentation is known to be faulty. The event logs are mined, and the extracted information is analyzed to reconstruct the process. Documentation for the process is then written based on the data extracted from the event logs.
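One common starting point for discovery is the directly-follows relation: for every pair of activities, how often one immediately follows the other within the same case. Several discovery algorithms build on this relation. The sketch below assumes a hypothetical event log stored as `(case_id, activity, timestamp)` tuples with integer timestamps; the activity names and the helper functions `traces_from_events` and `discover_dfg` are illustrative, not taken from any particular tool.

```python
from collections import Counter, defaultdict

def traces_from_events(events):
    """Group (case_id, activity, timestamp) rows into one time-ordered trace per case."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))
    # Sorting the (timestamp, activity) pairs orders each case's events in time.
    return {case: [act for _, act in sorted(rows)] for case, rows in by_case.items()}

def discover_dfg(traces):
    """Count how often activity a is directly followed by activity b across all cases."""
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

# Hypothetical event log: (case id, activity, timestamp in hours).
events = [
    ("c1", "register", 0), ("c1", "check", 1), ("c1", "approve", 3), ("c1", "pay", 4),
    ("c2", "register", 0), ("c2", "check", 2), ("c2", "reject", 5),
    ("c3", "register", 1), ("c3", "check", 2), ("c3", "approve", 6), ("c3", "pay", 7),
]

dfg = discover_dfg(traces_from_events(events))
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

For this log the loop prints `approve -> pay: 2`, `check -> approve: 2`, `check -> reject: 1` and `register -> check: 3`, which already reveals the undocumented branch between approval and rejection. A full discovery algorithm would go further and turn these counts into a process model with explicit choices and concurrency.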
The second type of process mining is conformance checking. The name derives from its purpose: checking whether the actual workflow conforms to the planned process. The event logs are mined to locate differences between the process as it is executed and the model.
Once such differences have been located, they are analyzed to determine whether they improve the process. Decisions made at process checkpoints are reviewed in light of the information available at each point and the data that influenced each decision. If a deviation proves beneficial, the model is revised to include it. If it proves disadvantageous, the existing process may be changed so that it conforms more closely to the model.
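A minimal way to locate such differences is to replay each recorded trace against the planned model and flag every step the model does not allow. The sketch below is a deliberately simplified check, not token-based replay or alignment-based conformance checking: it assumes the planned process is described only by a hypothetical set of allowed directly-follows pairs plus permitted start and end activities, and all names are illustrative.

```python
def find_deviations(trace, allowed, start, end):
    """Return the ways a single trace departs from a directly-follows model."""
    deviations = []
    if trace and trace[0] not in start:
        deviations.append(("unexpected start", trace[0]))
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in allowed:
            deviations.append(("unexpected step", (a, b)))
    if trace and trace[-1] not in end:
        deviations.append(("unexpected end", trace[-1]))
    return deviations

def trace_fitness(traces, allowed, start, end):
    """Fraction of cases that follow the model without any deviation."""
    fitting = sum(1 for t in traces.values() if not find_deviations(t, allowed, start, end))
    return fitting / len(traces)

# Hypothetical planned process: register, check, then either approve and pay, or reject.
allowed = {("register", "check"), ("check", "approve"), ("check", "reject"), ("approve", "pay")}
start, end = {"register"}, {"pay", "reject"}

traces = {
    "c1": ["register", "check", "approve", "pay"],
    "c2": ["register", "approve", "pay"],  # skips the check step
    "c3": ["register", "check", "reject"],
}

for case, trace in traces.items():
    print(case, find_deviations(trace, allowed, start, end))
print("fitness:", trace_fitness(traces, allowed, start, end))
```

Here only case `c2` deviates, with the unexpected step `register -> approve`, so two of the three cases fit the model. Whether that shortcut is an improvement to adopt into the model or a violation to correct is exactly the judgment the analysis step has to make.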
The third type of process mining is extension, also called enhancement. It seeks to improve an existing model using information from the event logs. The log data are analyzed for possible improvements to the structure of the model; bottlenecks, for example, can be examined for possible alternative routes through the workflow.
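Bottlenecks can be located by enriching the model with timing data from the log, for example the average time elapsed between consecutive activities. The sketch below assumes the same hypothetical `(case_id, activity, timestamp)` format as a discovery log, with timestamps in hours; the function name and data are illustrative.

```python
from collections import defaultdict

def mean_transition_times(events):
    """Average time between consecutive activities, per (from, to) pair, across all cases."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))
    durations = defaultdict(list)
    for rows in by_case.values():
        rows.sort()  # order each case's events by timestamp
        for (t1, a), (t2, b) in zip(rows, rows[1:]):
            durations[(a, b)].append(t2 - t1)
    return {pair: sum(d) / len(d) for pair, d in durations.items()}

# Hypothetical event log: (case id, activity, timestamp in hours).
events = [
    ("c1", "register", 0), ("c1", "check", 2), ("c1", "approve", 30), ("c1", "pay", 32),
    ("c2", "register", 1), ("c2", "check", 4), ("c2", "approve", 40), ("c2", "pay", 41),
]

means = mean_transition_times(events)
bottleneck = max(means, key=means.get)
print(bottleneck, means[bottleneck])  # ('check', 'approve') 32.0
```

The transition from `check` to `approve` averages 32 hours, against a few hours for every other step, so it is the natural place to look for an alternative route. Because this sketch assumes a single timestamp per event, it cannot separate waiting time from working time; logs that record both start and completion events allow that finer distinction.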
Process mining is not without difficulties. Some tasks are inevitably hidden from the event logs because they are never recorded, so they cannot be mined directly. Such tasks can sometimes be reconstructed through careful analysis of the visible tasks, but not always. Conclusions based solely on information extracted from event logs may therefore be unreliable.
Duplicate tasks in the event log also cause problems, because different activities may be recorded under the same task name or category. It can therefore be difficult to tell apart tasks that share a name but serve different functions. Other problems include inadequate data on decision-making, incorporating time into the model, accounting for different perspectives on the process, incorrectly recorded data, and simply insufficient information. Process mining must therefore be tempered with experience and good judgment to overcome such issues.