Trumpf

Context

Classifying whether a cut part can be successfully removed is extremely important for companies in sheet metal processing. Accurate classification makes it possible to recognize faulty parts at an early stage and route them to post-processing rather than attempting extraction. In this way, companies can guarantee high product quality. In sheet metal production, the first step is cutting the parts out of the sheet and removing them, whereby the parts to be removed can vary in complexity. The subsequent removal of the parts has traditionally been a bottleneck, and classification can help relieve it. Machine learning methods can improve the removal process: they are able to analyze and learn from large amounts of data, so a high classification accuracy can be achieved and as many parts as possible can be extracted successfully.

Task Description:

The aim of this challenge is to develop a machine learning model that predicts whether a particular part can be extracted with a given probability of success. The dataset contains sheet-metal production data from the company TRUMPF, a renowned, globally leading supplier of technologies and solutions for sheet metal processing. The company, whose history dates back to 1923, specializes in the development of advanced laser cutting, bending, punching, and welding systems that enable precise, efficient, and high-quality production processes. The training data contains 15,293 data points with 84 anonymized features, which describe the part geometry (such as perimeter and area), the position and number of points, the material properties, and the cutting technology. The dataset consists of numerical and categorical features; the categorical features are suffixed with __nom. The task is to produce a binary classification that predicts the success of an extraction on a given test dataset.
The task includes:

  • Creating the binary label for the label “id_13013_RGT_erfolgreich_1try_mean”
  • Pre-processing the data, for example encoding and scaling
  • Selecting an appropriate optimization metric
  • Applying and validating models

For model development, please split the provided training data into separate training, validation, and test sets.
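The steps above can be sketched in Python with pandas and scikit-learn. This is a minimal sketch, not the official baseline: the 0.8 threshold comes from the evaluation section, the __nom suffix convention comes from the task description, and the label polarity (1 = extractable), the 60/20/20 split ratios, and the function names are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

TARGET = "id_13013_RGT_erfolgreich_1try_mean"

def prepare(df: pd.DataFrame):
    # 1) Binary label: 1 if the mean success probability is >= 0.8
    #    (polarity is an assumption; the metric section suggests you may
    #    want to treat failures as the positive class instead)
    y = (df[TARGET] >= 0.8).astype(int)
    X = df.drop(columns=[TARGET])

    # 2) Encoding: categorical columns carry the "__nom" suffix
    cat_cols = [c for c in X.columns if c.endswith("__nom")]
    X = pd.get_dummies(X, columns=cat_cols)

    # 3) Scaling the now fully numeric feature matrix
    X = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)
    return X, y

def split(X, y, seed=0):
    # 4) Stratified 60% train / 20% validation / 20% test split
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return (X_tr, y_tr), (X_val, y_val), (X_te, y_te)
```

In a real pipeline the scaler should be fit on the training split only and then applied to the validation and test splits, to avoid leakage; it is fit on the full frame here only to keep the sketch short.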

Dataset:

https://www.kaggle.com/competitions/trumpf/data

1. Sample from the data

unnamed (Index)   id_00004_masch__nom   id_00006_framework__nom   id_13013_RGT_erfolgreich_1try_mean
5359              6                     V3-1                      1.0
8192              9                     V3-1                      1.0
5372              6                     V3-1                      1.0
4351              10                    V3-1                      1.0
1534              9                     V3-1                      1.0


2. Table of descriptions (variables, definitions, datatype)

Variable                              Definition
id_13013_RGT_erfolgreich_1try_mean    This variable states the success probability

Evaluation Method

Goal: Precisely predict whether the piece can be extracted with a success probability 𝑝 ≥ 0.8.

Metric: The classification result will be evaluated using the recall score. This metric is used because correctly predicting extraction failure is much more important than correctly predicting extraction success.

$$ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} $$

Where TP is the number of examples correctly classified as positive and FN is the number of positive examples that were incorrectly classified as negative. The recall score will be calculated based on your submission.
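A toy recall computation with scikit-learn; since the evaluation rationale stresses catching extraction failures, this sketch assumes the failure class (label 0) is treated as the positive class, which `pos_label` makes explicit. The label values are made up for illustration.

```python
from sklearn.metrics import recall_score

y_true = [1, 1, 0, 1, 0, 0]  # 0 = extraction failure (assumed positive class)
y_pred = [1, 0, 0, 1, 1, 0]

# Recall w.r.t. failures: TP = 2 correctly flagged failures,
# FN = 1 failure predicted as success -> 2 / 3
print(recall_score(y_true, y_pred, pos_label=0))  # ~0.6667
```

If the competition instead scores recall on the success class, drop `pos_label=0` (the default positive label is 1).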

Tutorial and related study material

Micro lectures: The Micro Lectures offer a comprehensive insight into various topics, conveying the most important information in a compact format. Each Micro Lecture focuses on a specific topic:

  • Clustering
  • Data Preparation (for tabular data)
  • Data Visualization
  • Installing Python
  • Introduction to ML
  • Overfitting

License

The data is published under the CC0 (Creative Commons Zero) license.