Context
Sales forecasts are indispensable for companies, especially in the manufacturing industry. They enable well-founded decisions and therefore often serve as the basis for planning processes along the supply chain: they help to plan production capacities, optimize stock levels and avoid supply bottlenecks. Because planning errors follow directly from forecast errors, the accuracy of sales forecasts is of great importance. However, demand patterns can be very complex. They depend on various factors, such as the shape of the time series (e.g. trend, seasonal or sporadic demand), the company's own marketing and the current purchasing power of customers.

In addition, more and more data is available for creating forecasts. Machine learning algorithms can exploit these large volumes of data to generate accurate forecasts.
Task description
The aim of this challenge is to produce a sales forecast for a company using machine learning methods. The dataset contains the sales of a manufacturing company located in the Ruhr area in Germany. The training data document the sales of five products from 1 December 2012 to 30 June 2018. The task is to create monthly sales forecasts for July, August and September 2018. For model development, please split the training data into a proper training set, a validation set and a test set.
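One possible way to prepare the data for this task is to aggregate the individual sales records to monthly totals per product and then split them chronologically, so that validation and test data always lie after the training data. The following is a minimal sketch, not a prescribed procedure. It assumes a pandas DataFrame with a parsed `Date` column, the product in `X1` and the quantity in a column named `SalesQuantity`; the function names and the choice of three validation and three test months are illustrative assumptions.

```python
import pandas as pd


def monthly_sales(df: pd.DataFrame, quantity_col: str = "SalesQuantity") -> pd.DataFrame:
    """Sum the sales quantity per product (X1) and calendar month.

    Assumes df["Date"] already has a datetime dtype. The column name in
    quantity_col is an assumption and may differ in the actual file.
    """
    out = df.copy()
    # Map every date to the first day of its month.
    out["Month"] = out["Date"].dt.to_period("M").dt.to_timestamp()
    return out.groupby(["X1", "Month"], as_index=False)[quantity_col].sum()


def chronological_split(monthly: pd.DataFrame, n_val: int = 3, n_test: int = 3):
    """Split monthly data by time: oldest months train, newest months test."""
    if n_val < 1 or n_test < 1:
        raise ValueError("n_val and n_test must be at least 1")
    months = pd.DatetimeIndex(monthly["Month"].unique()).sort_values()
    if len(months) <= n_val + n_test:
        raise ValueError("not enough months for the requested split")
    test_months = months[-n_test:]
    val_months = months[-(n_val + n_test):-n_test]
    train = monthly[monthly["Month"] < val_months[0]]
    val = monthly[monthly["Month"].isin(val_months)]
    test = monthly[monthly["Month"].isin(test_months)]
    return train, val, test
```

A random split would leak future information into training, which is why the sketch splits by month instead. Holding out three months each mirrors the three-month forecast horizon, but other window sizes are equally valid.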
Dataset
https://www.kaggle.com/competitions/sales-forecasting-competition/data
Sample from the data
| Date | X1 | X2 | X3 | X4 | X5 | X6 | X7 | Sales Quantity | Sales Unit |
|---|---|---|---|---|---|---|---|---|---|
| 01.12.2012 | P1 | S1 | D1 | O1 | MD1 | MG1 | MC1 | 7992.0 | PC |
| 06.12.2012 | P2 | S1 | D1 | O2 | MD3 | MG2 | MC2 | 79923.0 | PC |
| 06.12.2012 | P3 | S1 | D1 | O3 | MD4 | MG3 | MC3 | 287724.0 | M |
| 02.12.2012 | P2 | S1 | D1 | O4 | MD5 | MG4 | MC4 | 47954.0 | PC |
| 08.12.2012 | P4 | S1 | D2 | O5 | MD3 | MG5 | MC5 | 47954.0 | PC |
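The dates in the sample are written day first (`DD.MM.YYYY`), so a value such as `06.12.2012` means 6 December, not 12 June. The sketch below shows one way to parse them explicitly with pandas. The two data rows are taken from the sample above; the comma-separated layout and the header spellings `SalesQuantity` and `SalesUnit` are assumptions about the file, which may use a different delimiter or header names.

```python
import io

import pandas as pd

# Two rows from the sample table, written as CSV text for illustration.
# The delimiter and header names are assumptions about the real file.
sample_csv = """Date,X1,X2,X3,X4,X5,X6,X7,SalesQuantity,SalesUnit
01.12.2012,P1,S1,D1,O1,MD1,MG1,MC1,7992.0,PC
06.12.2012,P2,S1,D1,O2,MD3,MG2,MC2,79923.0,PC
"""

df = pd.read_csv(io.StringIO(sample_csv))

# An explicit format avoids day/month confusion for ambiguous dates.
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")
```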
Description
| Variable | Type | Values | Description |
|---|---|---|---|
| Date | Datetime | Dec 2012 to Oct 2018 | Date/time of sale (shifted for privacy) |
| X1 | Categorical | P1–P5 | Product identifier |
| X2–X7 | Categorical | X2: S1–S6; X3: D1–D13; X4: O1–O53; X5: MD1–MD5; X6: MG1–MG208; X7: MC1–MC2032 | Anonymized categorical features |
| SalesQuantity | Numerical | 376.0 to 500,000,000.0 | Number of units sold (may be transformed) |
| SalesUnit | Categorical | PC (piece), M (meter), M² (square meter), KG (kilogram) | Unit of sale |
Evaluation method
Submissions are evaluated using the root mean squared error (RMSE):
$$ \mathrm{RMSE} := \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} $$
where \(y_i\) is the actual value, \(\hat{y}_i\) is the predicted value and \(N\) is the number of forecast values. Please additionally report the RMSE on your training data.
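The RMSE formula can be computed directly with NumPy, for example to report the training error. This is a minimal sketch; the function name `rmse` is an illustrative choice, not part of any official evaluation code.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Square the errors, average them, then take the square root.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Because the errors are squared before averaging, products with large sales quantities dominate the score, which is worth keeping in mind given the wide range of `SalesQuantity`.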
Submission format: Please submit a CSV file containing the following variables in this exact order:
Date, X1, Forecast, SalesUnit
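A small helper can enforce the required column order before writing the file. The sketch below assumes the forecasts are held in a pandas DataFrame that already contains the four columns; the helper name `build_submission` and the file name `submission.csv` are hypothetical, and the expected date format of the `Date` column is not specified here.

```python
import pandas as pd

# Column order required by the submission format.
SUBMISSION_COLUMNS = ["Date", "X1", "Forecast", "SalesUnit"]


def build_submission(forecasts: pd.DataFrame) -> pd.DataFrame:
    """Return only the required columns, in the required order."""
    missing = [c for c in SUBMISSION_COLUMNS if c not in forecasts.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return forecasts[SUBMISSION_COLUMNS]


# Usage (file name is an assumption):
# build_submission(my_forecasts).to_csv("submission.csv", index=False)
```

Passing `index=False` keeps the DataFrame index out of the file, so the header line contains exactly the four required names.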
Tutorials and related study material
Micro lectures: The micro lectures convey the most important information on a range of topics in a compact format. Each micro lecture focuses on a specific topic:
- Clustering
- Data Preparation (for tabular data)
- Data Visualization
- Installing Python
- Introduction to ML
- Overfitting

