Sales Forecasting


Context

Sales forecasts are indispensable for companies, especially in the manufacturing industry. They enable well-founded decisions and therefore often serve as the basis for planning processes along the supply chain: they help to plan production capacities, optimize stock levels and avoid supply bottlenecks. Accurate forecasts are therefore essential to avoid planning errors. However, demand patterns can be very complex. They depend on various factors, such as the shape of the time series (for example trend, seasonal or sporadic demand), the company’s own marketing activities and the current purchasing power of customers.

In addition, more and more data is becoming available for creating forecasts. Machine learning algorithms can generate precise forecasts from large volumes of data.

Task description

The aim of this challenge is to calculate a sales forecast for a company using machine learning methods. The dataset provides information about the sales of a production company located in the Ruhr area in Germany. The training data document the sales of five products from 1 December 2012 to 30 June 2018. The task is to create monthly sales forecasts for July, August and September 2018. To develop and assess your model, please divide the provided data into training, validation and test sets.
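
The monthly aggregation and the division into training, validation and test data could be sketched as follows. This is a minimal sketch using only the Python standard library, not a prescribed approach: the function names `monthly_totals` and `chronological_split` are hypothetical, and the choice of three validation months and three test months is an assumption for illustration. The split is chronological rather than random, so that the model is always assessed on months that lie after the ones it was trained on.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(rows):
    """Sum sales per (product, year, month) from (date, product, quantity) rows.

    Dates are expected as DD.MM.YYYY strings, as in the data sample.
    """
    totals = defaultdict(float)
    for date_str, product, quantity in rows:
        day = datetime.strptime(date_str, "%d.%m.%Y")
        totals[(product, day.year, day.month)] += quantity
    return dict(totals)


def chronological_split(months, n_val=3, n_test=3):
    """Split (year, month) pairs by time, without shuffling.

    The most recent n_test months form the test set, the n_val months
    before them the validation set, and everything earlier the training set.
    The default sizes are assumptions, not part of the challenge.
    """
    months = sorted(months)
    train_end = len(months) - n_val - n_test
    if train_end <= 0:
        raise ValueError("not enough months for the requested split")
    train = months[:train_end]
    val = months[train_end:train_end + n_val]
    test = months[train_end + n_val:]
    return train, val, test


# Three rows from the data sample, reduced to (Date, X1, SalesQuantity).
rows = [
    ("01.12.2012", "P1", 7992.0),
    ("06.12.2012", "P2", 79923.0),
    ("02.12.2012", "P2", 47954.0),
]
totals = monthly_totals(rows)

# All months from December 2012 to June 2018, the period of the training data.
all_months = [(y, m) for y in range(2012, 2019) for m in range(1, 13)
              if (2012, 12) <= (y, m) <= (2018, 6)]
train, val, test = chronological_split(all_months)
```

With these defaults, the test set covers April to June 2018 and the validation set January to March 2018. Note that the sketch sums quantities per product only; since sales are recorded in different units, it may be necessary to group by sales unit as well.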

Dataset

https://www.kaggle.com/competitions/sales-forecasting-competition/data

Sample from the data

Date        X1  X2  X3  X4  X5   X6   X7   Sales Quantity  Sales Unit
01.12.2012  P1  S1  D1  O1  MD1  MG1  MC1  7992.0          PC
06.12.2012  P2  S1  D1  O2  MD3  MG2  MC2  79923.0         PC
06.12.2012  P3  S1  D1  O3  MD4  MG3  MC3  287724.0        M
02.12.2012  P2  S1  D1  O4  MD5  MG4  MC4  47954.0         PC
08.12.2012  P4  S1  D2  O5  MD3  MG5  MC5  47954.0         PC

Description

Variable       Type         Values                  Description
Date           Datetime     Dec 2012 to Oct 2018    Date/time of sale (shifted for privacy)
X1             Categorical  P1–P5                   Product identifier
X2 – X7        Categorical  X2: S1–S6               Anonymized categorical features
                            X3: D1–D13
                            X4: O1–O53
                            X5: MD1–MD5
                            X6: MG1–MG208
                            X7: MC1–MC2032
SalesQuantity  Numerical    376.0 to 500,000,000.0  Number of units sold (may be transformed)
SalesUnit      Categorical  PC (piece)              Unit of sale
                            M (meter)
                            M² (square meter)
                            KG (kg)

Evaluation method

Goal: Precisely forecast the product sales for July, August and September 2018.
Metric: The forecasts will be evaluated using the root mean square error (RMSE). This metric is particularly well suited to sales forecasting because it penalizes large deviations from the actual value more strongly than, for example, absolute error measures. The RMSE will be calculated based on your submission.
$$ \mathrm{RMSE} := \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} $$ where \(y_i\) is the actual value, \(\hat{y}_i\) is the predicted value and \(N\) is the number of forecasts.

Please additionally report the RMSE on your training data.
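
To check your own results before submitting, the RMSE can be computed directly from its definition. The following is a minimal sketch in plain Python; the function names `rmse` and `mae` and all numbers are made up for illustration. The mean absolute error is included only to show that, for the same total deviation, the RMSE is larger when the deviation is concentrated in a single large miss.

```python
import math


def rmse(actual, predicted):
    """Root mean square error of two equally long, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error, shown only for comparison."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


# Made-up numbers: both forecasts miss by 40 units in total.
actual = [100.0, 100.0, 100.0, 100.0]
spread_out = [110.0, 110.0, 110.0, 110.0]   # four small errors of 10
one_outlier = [100.0, 100.0, 100.0, 140.0]  # one large error of 40

print(mae(actual, spread_out), rmse(actual, spread_out))    # 10.0 10.0
print(mae(actual, one_outlier), rmse(actual, one_outlier))  # 10.0 20.0
```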

Submission Format: Please submit a CSV file containing the following columns in this exact order:
Date, X1, Forecast, SalesUnit
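
A file with this column order could be written as sketched below. This is only an illustration with the standard `csv` module: the helper name `write_submission` is hypothetical, and the date format of the forecast rows, the sales unit and the zero forecast values are placeholders that are not specified here.

```python
import csv
import io

# Column order required by the submission format.
COLUMNS = ["Date", "X1", "Forecast", "SalesUnit"]


def write_submission(forecasts, handle):
    """Write forecast rows (dicts keyed by COLUMNS) as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(forecasts)


# Placeholder rows: date format, unit and forecast values are assumptions.
forecasts = [
    {"Date": "01.07.2018", "X1": "P1", "Forecast": 0.0, "SalesUnit": "PC"},
    {"Date": "01.08.2018", "X1": "P1", "Forecast": 0.0, "SalesUnit": "PC"},
    {"Date": "01.09.2018", "X1": "P1", "Forecast": 0.0, "SalesUnit": "PC"},
]

buffer = io.StringIO()
write_submission(forecasts, buffer)
submission_text = buffer.getvalue()
```

To produce an actual file, the same function can be called with a handle opened via `open(path, "w", newline="")`, where `path` is a file name of your choice.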


Tutorials and related study material

Micro lectures: The Micro Lectures offer a comprehensive insight into various topics in order to convey the most important information in a compact format. Each Micro Lecture focuses on a specific topic:

  • Clustering
  • Data Preparation (for tabular data)
  • Data Visualization
  • Installing Python
  • Introduction to ML
  • Overfitting