Sales Forecasting

Keywords: Time Series Forecasting, Sales Prediction, Manufacturing Industry, Supply Chain Optimization, Demand Planning

Context

Sales forecasts are indispensable for companies, especially in the manufacturing industry. They support well-founded decisions and therefore often serve as the basis for planning processes along the supply chain: they help to plan production capacities, optimize stock levels and avoid supply bottlenecks. Because inaccurate forecasts lead to planning errors, forecast accuracy is of great importance. However, demand patterns can be very complex. They depend on various factors, such as the shape of the time series (for example trend, seasonal or sporadic demand), the company's own marketing, and the current purchasing power of customers.

In addition, more and more data is now available for creating forecasts. Machine learning algorithms can generate precise forecasts from large volumes of data.

Task description

The aim of this challenge is to calculate a sales forecast for a company using machine learning methods. The dataset provides information about the sales of a production company located in the Ruhr area in Germany. The training data document the sales of five products from 1 December 2012 to 30 June 2018. The task is to create monthly sales forecasts for July, August and September 2018. To build and assess your model, please divide the provided data into training, validation and test sets.
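The split described above should respect time order, since a random split would let the model see the future. The following is a minimal sketch, not a prescribed method: it assumes dates in the DD.MM.YYYY format of the sample data and aggregates per X1 value, because the submission is keyed by X1. The function names `monthly_totals` and `chronological_split`, and the default of three validation and three test months, are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(rows):
    """Sum the sales quantity per (product, year, month).

    `rows` is an iterable of (date_string, product, quantity) tuples,
    with dates in DD.MM.YYYY format (assumed from the sample data).
    """
    totals = defaultdict(float)
    for date_str, product, quantity in rows:
        d = datetime.strptime(date_str, "%d.%m.%Y")
        totals[(product, d.year, d.month)] += quantity
    return dict(totals)


def chronological_split(totals, val_months=3, test_months=3):
    """Split monthly totals by time.

    The latest `test_months` months form the test set, the `val_months`
    months before them the validation set, and everything earlier the
    training set. Both arguments must be at least 1.
    """
    months = sorted({(year, month) for _, year, month in totals})
    test = set(months[-test_months:])
    val = set(months[-(test_months + val_months):-test_months])
    split = {"train": {}, "val": {}, "test": {}}
    for key, value in totals.items():
        month = key[1:]  # (year, month) part of the key
        part = "test" if month in test else "val" if month in val else "train"
        split[part][key] = value
    return split


# Tiny illustrative input (made-up quantities, not taken from the dataset).
rows = [
    ("01.12.2012", "P1", 10.0),
    ("15.12.2012", "P1", 5.0),
    ("03.01.2013", "P1", 7.0),
    ("10.02.2013", "P1", 8.0),
    ("05.03.2013", "P1", 9.0),
]
parts = chronological_split(monthly_totals(rows), val_months=1, test_months=1)
```

If every month up to June 2018 is present in the data, the defaults would hold out April to June 2018 as the test set and January to March 2018 as the validation set, which mirrors the three-month forecast horizon of the task.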

To improve your sales prediction, we suggest first applying clustering methods to group the time series.
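One possible way to group the time series is sketched below. The challenge does not prescribe a clustering method; this example uses plain k-means on z-normalized monthly series, so that series are grouped by shape rather than by sales volume. The function names, the farthest-point initialization and the fixed iteration count are assumptions made for illustration, and all series are assumed to have the same length.

```python
import math


def z_normalize(series):
    """Rescale a series to mean 0 and standard deviation 1."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)  # constant series carry no shape information
    return [(x - mean) / std for x in series]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans_labels(series_list, k, iterations=20):
    """Cluster equally long series with k-means and return one label per series."""
    if not 1 <= k <= len(series_list):
        raise ValueError("k must be between 1 and the number of series")
    points = [z_normalize(s) for s in series_list]

    # Deterministic farthest-point initialization: start with the first
    # series, then repeatedly add the series farthest from all centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels


# Two rising and two falling series at different scales (made-up values).
series = [[1, 2, 3, 4], [4, 3, 2, 1], [10, 20, 30, 40], [40, 30, 20, 10]]
labels = kmeans_labels(series, k=2)
```

Because of the normalization, the two rising series end up in one cluster and the two falling series in the other, regardless of their scale. A separate forecasting model could then be fitted per cluster.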

Dataset

https://www.kaggle.com/competitions/sales-forecasting-new/data

Sample from the data

Date        X1  X2  X3  X4  X5   X6   X7   Sales Quantity  Sales Unit
01.12.2012  P1  S1  D1  O1  MD1  MG1  MC1  7992.0          PC
06.12.2012  P2  S1  D1  O2  MD3  MG2  MC2  79923.0         PC
06.12.2012  P3  S1  D1  O3  MD4  MG3  MC3  287724.0        M
02.12.2012  P2  S1  D1  O4  MD5  MG4  MC4  47954.0         PC
08.12.2012  P4  S1  D2  O5  MD3  MG5  MC5  47954.0         PC

Description

Variable       Type         Values                  Description
Date           Datetime     Dec 2012 to June 2018   Date/time of sale (shifted for privacy)
X7             Categorical  MC1–MC2032              Product identifier
X1 – X6        Categorical  X1: P1–P5               Hierarchical anonymized product features:
                            X2: S1–S6               • X7 represents the exact product designation.
                            X3: D1–D13              • Each value of X7 maps uniquely to one value of X6.
                            X4: O1–O53              • Each value of X6 maps uniquely to one value of X5.
                            X5: MD1–MD5             • X4 represents subcategories of X2.
                            X6: MG1–MG208
SalesQuantity  Numerical    376.0 to 500,000,000.0  Number of units sold (may be transformed)
SalesUnit      Categorical  PC (piece)              Unit of sale
                            M (meter)
                            M² (square meter)
                            KG (kilogram)
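The hierarchy stated in the description (each X7 value maps to exactly one X6 value, and each X6 value to exactly one X5 value) can be verified before it is used for feature engineering. The sketch below is one way to do so. It assumes the rows have been loaded as dictionaries keyed by column name, and the helper name `mapping_violations` is hypothetical.

```python
from collections import defaultdict


def mapping_violations(rows, child, parent):
    """Return the child values that occur with more than one parent value.

    An empty result means every `child` value maps to exactly one
    `parent` value in the given rows.
    """
    parents = defaultdict(set)
    for row in rows:
        parents[row[child]].add(row[parent])
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# Feature values from three sample rows, reduced to the hierarchy columns.
rows = [
    {"X5": "MD1", "X6": "MG1", "X7": "MC1"},
    {"X5": "MD3", "X6": "MG2", "X7": "MC2"},
    {"X5": "MD3", "X6": "MG5", "X7": "MC5"},
]
x7_to_x6 = mapping_violations(rows, child="X7", parent="X6")
x6_to_x5 = mapping_violations(rows, child="X6", parent="X5")
```

Both results are empty for these rows. Any non-empty result on the full data would list the offending values together with the conflicting parent values.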

Evaluation method

Goal: Precisely forecast the future product sales for July, August and September of 2018
Metric: The forecasts will be evaluated using the root mean square error (RMSE). This metric is particularly suited to sales forecasting because it penalizes large deviations from the actual value more strongly than, for example, absolute error measures. The RMSE will be calculated based on your submission.
$$ \mathrm{RMSE} := \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} $$

where yᵢ is the actual value, ŷᵢ is the predicted value and N is the number of forecast values.

Please additionally report the RMSE on your training data.
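For reporting the RMSE on your own training, validation and test sets, the formula can be computed directly. The following is a minimal sketch using only the standard library; the function name `rmse` and the example values are illustrative and not taken from the dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [100.0, 200.0, 300.0]

# Both forecasts have the same mean absolute error (10), but the second
# concentrates the whole error in a single month.
evenly_off = [110.0, 210.0, 310.0]
one_big_miss = [100.0, 200.0, 330.0]

rmse_even = rmse(actual, evenly_off)    # sqrt(300 / 3) = 10.0
rmse_miss = rmse(actual, one_big_miss)  # sqrt(900 / 3), roughly 17.32
```

The comparison illustrates why the metric punishes extreme deviations: a single error of 30 yields a higher RMSE than three errors of 10, although the absolute errors sum to the same amount.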

Submission Format: Please submit a CSV file containing the following variables in this exact order:
Date, X1, Forecast, SalesUnit
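A submission file with this column order could be written as sketched below. The column names come from the list above. The function name `write_submission`, the file name, the date representation and the forecast values are placeholders, since the challenge text does not specify the expected date format. Please check the sample submission on the competition page if one is provided.

```python
import csv

# Column order as required by the submission format.
COLUMNS = ["Date", "X1", "Forecast", "SalesUnit"]


def write_submission(path, forecasts):
    """Write forecast rows (dicts keyed by COLUMNS) to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for row in forecasts:
            writer.writerow([row[name] for name in COLUMNS])


# Placeholder rows: the date format and the numbers are assumptions.
example_forecasts = [
    {"Date": "01.07.2018", "X1": "P1", "Forecast": 1234.5, "SalesUnit": "PC"},
    {"Date": "01.08.2018", "X1": "P1", "Forecast": 1300.0, "SalesUnit": "PC"},
]
# write_submission("submission.csv", example_forecasts)
```

Writing the header from a single `COLUMNS` list keeps the header and the row values in the same order, which avoids silent column mix-ups.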

Tutorials and related study material

Micro lectures: The Micro Lectures convey the most important information on a range of topics in a compact format. Each Micro Lecture focuses on one specific topic:

  • Installing Python
  • Introduction to ML
  • Data Visualization
  • Preparation
  • Clustering
  • Feature Extraction
  • Time Series Forecasting
  • Forecasting Methods
  • Hyperparameter Tuning
  • Overfitting
  • Imbalanced Datasets
  • Random Forest
  • XGBoost
  • LightGBM
  • Evaluation metrics