Keywords: Chemical engineering, computer vision, classification, process flow diagrams
Context
Process flow diagrams (PFDs) are key documents in the chemical process industry. Different kinds of PFDs are used at every engineering stage, from early-stage process development through detailed engineering, construction, and operation to disassembly. These PFDs represent essential information about chemical processes [1], such as process topology, major unit operations, control equipment, and piping information [2], [3]. See below for an example:
PFDs contain three main sources of information: (i) unit operations, (ii) connectivities, and (iii) additional text.
Despite the availability of advanced CAD software, PFDs are still sometimes drawn by hand, especially in early development stages or when existing PFDs need to be adapted. The information in such PFDs often cannot be read out directly by computer programs, so chemical companies hold large amounts of legacy PFDs that are not machine-readable. Digitizing these PFDs requires recognizing all of their hand-drawn symbols, and we use machine learning for this. This challenge concentrates on digitizing hand-drawn unit operations, which are to be classified by type. See below for some examples.
Task description
In this challenge, you will classify hand-drawn unit operation symbols (see Figure 2). The symbols fall into several categories, each representing a specific unit operation used in chemical engineering, such as distillation, filtration, mixing, and reactions. Different unit operations are sometimes depicted in different ways.
The primary goal is to develop robust machine learning models that accurately classify these symbols into their respective categories. Participants will have access to a diverse dataset of hand-drawn unit operation symbols for training and validation, and their models will be evaluated on their ability to correctly classify symbols in a test dataset.
By successfully addressing this challenge, participants will contribute to the digitization of PFDs in the chemical process industry, making essential process information more accessible and efficient for engineers and operators. Join us in this endeavor to bridge the gap between legacy hand-drawn diagrams and modern machine-readable representations, revolutionizing the way we understand and utilize chemical processes.
Dataset
Download link: some link here
Description: We split the whole dataset into training, validation, and test sets containing 2680, 575, and 576 images, respectively. Each subset includes a spreadsheet that describes its metadata. The metadata is organized as follows:
Name | ID | labelID | labelWord |
Datatype | Int | Int | String |
Description | The ID used to retrieve the image | The ID of the image label | Detailed description of the image label (type of unit operation) |
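As a minimal sketch of how this metadata might be read, the example below assumes the spreadsheet has been exported to CSV with the column names ID, labelID, and labelWord. The file format, the sample rows, and the helper name load_metadata are assumptions for illustration only; the actual spreadsheet may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real file format and label values may differ.
SAMPLE_METADATA = """ID,labelID,labelWord
0,3,pump
1,7,heat exchanger
2,3,pump
"""


def load_metadata(handle):
    """Read metadata rows and convert the two integer columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "ID": int(row["ID"]),
            "labelID": int(row["labelID"]),
            "labelWord": row["labelWord"].strip(),
        })
    return rows


rows = load_metadata(io.StringIO(SAMPLE_METADATA))

# Map each numeric label to its human-readable name.
label_names = {r["labelID"]: r["labelWord"] for r in rows}

# Count images per class to spot class imbalance early.
class_counts = Counter(r["labelID"] for r in rows)
```

To read from disk instead, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.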
License
Evaluation methods
Goal: Predict the type of unit operation
Loss function: Cross-entropy loss
Submission format: The file should be in CSV format and contain two columns: “ID” and “Prediction”.
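The loss and the submission file described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the organizers' evaluation code: it assumes the loss is the mean negative natural-log probability of the true class, that class indices are 0-based, and that the "Prediction" column holds the predicted labelID. The function names and the example scores are hypothetical.

```python
import csv
import io
import math


def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_batch, labels):
    """Mean negative log-probability assigned to the true class."""
    losses = [-math.log(softmax(logits)[label])
              for logits, label in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


def write_submission(handle, predictions):
    """Write one row per image under the required two-column header."""
    writer = csv.writer(handle)
    writer.writerow(["ID", "Prediction"])
    for image_id in sorted(predictions):
        writer.writerow([image_id, predictions[image_id]])


# Hypothetical scores for two images over three classes.
logits_batch = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
true_labels = [0, 2]
loss = cross_entropy(logits_batch, true_labels)

# Predicted class = index of the highest score (first index wins ties).
predictions = {image_id: max(range(len(scores)), key=scores.__getitem__)
               for image_id, scores in enumerate(logits_batch)}

buffer = io.StringIO()
write_submission(buffer, predictions)
submission_text = buffer.getvalue()
```

To produce an actual file, replace the buffer with a handle opened with `newline=""` so that the csv module controls line endings.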
Tutorials
Python tutorial: Python’s versatility, extensive library support, readability, and active community make it a foundational language for machine learning and contribute to its widespread adoption in the field. Its role in machine learning is expected to keep growing as the field evolves and new tools and techniques emerge.
PyTorch tutorial: PyTorch is an open-source deep learning framework. It is designed to provide a flexible and dynamic platform for building and training artificial neural networks.
References
[1] G. Nasby, “Using process flowsheets as communication tools,” Chem Eng Prog, vol. 108, no. 10, pp. 36–44, Oct. 2012.
[2] L. S. Balhorn, Q. Gao, D. Goldstein, and A. M. Schweidtmann, “Flowsheet Recognition using Deep Convolutional Neural Networks,” Computer Aided Chemical Engineering, vol. 49, pp. 1567–1572, Jan. 2022, doi: 10.1016/B978-0-323-85159-6.50261-X.
[3] M. F. Theisen, K. N. Flores, L. Schulze Balhorn, and A. M. Schweidtmann, “Digitization of chemical process flow diagrams using deep convolutional neural networks,” Digital Chemical Engineering, vol. 6, p. 100072, Mar. 2023, doi: 10.1016/j.dche.2022.100072.