Predict final tender prices across countries using diverse procurement data reflecting real market and regulatory variability.
Context
Public procurement involves formal purchasing processes used by governments and public institutions.
In each procurement procedure, several suppliers submit bids, and one of the most important variables is the final bidding price.
The objective of modern procurement analytics is to understand and predict this price using historical data.
The dataset used in this notebook contains historical public procurement offers collected from multiple countries.
These datasets vary significantly due to:
- differences in procurement legislation and rules,
- differences in macroeconomic conditions and market structures,
- differences in bidder strategies and competitive intensity.
This variability makes the prediction task realistic and complex, and therefore an ideal introduction to applied machine learning.
The target variable in this notebook is Final_Price_EUR, which represents the final awarded price of each procurement procedure, expressed in EUR.
All prices in the dataset are already converted to a common currency, ensuring that the modelling task is based on standardized and comparable monetary values.
Task description
We formulate the problem as a supervised regression task:
Objective
To develop a machine learning model capable of predicting the lowest final bidding price in a public procurement tender using historical observations.
Scope
The prediction model should be applicable across:
- multiple countries,
- various procurement categories (e.g., CPV codes),
- different economic environments and bidder behaviours.
The model should remain robust and generalizable, not tailored to a single country or a specific procurement sector.
Limitations
- The underlying dataset may be large, heterogeneous, and contain missing or inconsistent values.
- Data formats and structures may differ across country sources.
- Economic and regulatory differences make the prediction problem non-trivial and require careful preprocessing.
- The final model should generalize across contexts rather than overfit to a narrow subset of the data.
- The dataset is historical and may not fully capture future policy or market changes.
- The model is intended for educational and research purposes, not for fully automated decision-making.
Dataset
https://www.kaggle.com/competitions/bid-2-win-tender-outcome-prediction-challenge/data
The dataset includes information on public procurement procedures collected from multiple countries. It contains categorical and numerical features describing each tender, including tender characteristics (country, size, supply type, procedure type, CPV classification, and year), procedure and funding indicators (EU-funded status, framework agreements, GPA applicability, lots, electronic auctions), buyer-related attributes (buyer type, NUTS region, country), as well as estimated and final prices.
Files
- train.csv – contains all input variables as well as the target variable that we want to predict.
- test.csv – contains the same input variables but does not include the target.
- sample.csv – a sample submission file in the correct format.
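In a notebook, these files are read with pandas and the target is separated from the input variables. The sketch below uses a tiny synthetic stand-in for train.csv (the real data is downloaded from the Kaggle page), with an illustrative subset of columns:

```python
import pandas as pd

# Tiny synthetic stand-in for train.csv; the real file comes from Kaggle.
train = pd.DataFrame({
    "Procurement_ID": ["a", "b", "c"],
    "tender_country": ["SK", "CZ", "HU"],
    "tender_year": [2019, 2020, 2021],
    "Final_Price_EUR": [393002.0, 313035.0, 203146.0],
})

# In a real notebook: train = pd.read_csv("train.csv")
X = train.drop(columns=["Final_Price_EUR"])  # input features
y = train["Final_Price_EUR"]                 # regression target

print(X.shape, y.shape)
```

The same `drop` is unnecessary for test.csv, since it does not contain the target column.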
Columns
- Procurement_ID – Internal persistent ID of the tender, generated by DIGIWHIST.
- tender_country – Country of the tender.
- tender_size – Whether the tender is below or above the EU publication threshold. It is based either on the original publication (if it explicitly states this information) or on an ex post estimation using supply type, buyer type, and the estimated and final contract values.
- tender_supplyType – The type of the purchase. It can have the following values: supplies, services, public works.
- tender_procedureType – Procedure type mapped to DIGIWHIST standard. It is based on the original procedure type published on the source publication that we recategorized to a standard enumeration. The DIGIWHIST categories are the following: Open, Restricted, Restricted with publication, Negotiated without publication, Competitive dialog, Design contest, Minitender, DPS purchase, Outright award, Approaching bidders, Public contest, Negotiated, Innovation Partnership, Concession, Other (national type)
- tender_nationalProcedureType – Procedure type as it is published in the source publication. Therefore, it contains jurisdiction specific procedure types that might not be possible to standardize in tender_procedureType.
- tender_mainCpv – Main product code of the tender. It is based on the Common Procurement Vocabulary (CPV) codes published on the source publication – https://simap.ted.europa.eu/cpv
- tender_addressOfImplementation_nuts – Regional code of the tender implementation. (These are published NUTS codes from the source publication – https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics)
- tender_addressOfImplementation_country – Country where the tender is to be implemented.
- tender_year – Year of the tender.
- tender_fundingProgrammes – Funding programs associated with the tender.
- tender_npwp_reasons – Reasons for negotiated procedure without publication.
- tender_awardDeadline – The latest date by which the tender is to be awarded. Bidders usually need to be bound by their bids until this day, but this can differ across jurisdictions.
- tender_contractSignatureDate – The date of contract signature if the tender only has one lot or all lots have the same signature date.
- tender_awardDecisionDate – The award decision date.
- tender_bidDeadline – The final deadline until when companies can submit a bid. It is based on the latest call for tender document published.
- tender_estimatedStartDate – Estimated start date of the contract.
- tender_estimatedCompletionDate – Estimated completion date of the contract.
- tender_estimatedDurationInYears – Estimated length of the contract in years.
- tender_estimatedDurationInMonths – Estimated length of the contract in months.
- tender_estimatedDurationInDays – Estimated length of the contract in days.
- tender_isEUFunded – Whether the tender has EU funding.
- tender_isDps – Whether the tender is a dynamic purchasing system. (This purchasing mode is similar to framework agreements.)
- tender_isElectronicAuction – Whether the tender uses an electronic auction.
- tender_isAwarded – Whether the tender is awarded or not.
- tender_isCentralProcurement – Whether the purchase is a centralized procurement.
- tender_isJointProcurement – Whether the purchase is a joint procurement (multiple public bodies purchase something jointly, e.g. to benefit from economies of scale).
- tender_isOnBehalfOf – Whether the purchase is made on behalf of another public body.
- tender_isFrameworkAgreement – Whether the tender is a framework agreement.
- tender_isCoveredByGpa – Whether the tender is covered by the WTO Government Procurement Agreement (GPA). See more: https://www.wto.org/english/tratop_e/gproc_e/gp_gpa_e.htm
- tender_hasLots – Whether the tender has multiple lots.
- tender_estimatedPrice_currency – Estimated price currency of the tender.
- tender_estimatedPrice_EUR – Estimated price of the tender in EUR.
- tender_finalPrice_currency – Final price’s currency.
- Final_Price_EUR – Final price of the tender in EUR.
- tender_description – Detailed description of the tender.
- tender_description_length – Length of the tender description (number of characters).
- tender_personalRequirements_length – Length of the personal requirements set out for participation (number of characters).
- tender_economicRequirements_length – Length of the economic requirements set out for participation (number of characters).
- tender_technicalRequirements_length – Length of the technical requirements set out for participation (number of characters).
- tender_documents_count – Number of documents related to the tender (for example, tender plans or specifications).
- tender_awardCriteria_count – Number of award criteria used in evaluating the bids.
- tender_corrections_count – Number of corrections related to the tender.
- tender_onBehalfOf_count – Number of buyers on whose behalf the tender is managed.
- tender_lots_count – Number of lots in the tender.
- tender_publications_count – Number of publications related to the tender.
- buyer_buyerType – Type of the buyer.
- buyer_mainActivities – Main activity of the buyer.
- buyer_nuts – Regional code of the buyer. (These are published NUTS codes from the source publication – https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics)
- buyer_country – Country of the buyer.
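The columns mix categorical codes (countries, supply types, CPV and procedure types) with numeric counts and prices, many of which can be missing. A minimal preprocessing sketch on a made-up sample, assuming simple choices (sentinel-fill categoricals, median-fill numerics, one-hot encode) rather than any officially recommended pipeline:

```python
import pandas as pd

# Made-up sample mixing categorical and numeric tender features;
# column names follow the dataset, values are illustrative only.
df = pd.DataFrame({
    "tender_country": ["SK", "CZ", None],
    "tender_supplyType": ["supplies", "services", "public works"],
    "tender_year": [2019, None, 2021],
    "tender_estimatedPrice_EUR": [100000.0, 250000.0, None],
})

cat_cols = df.select_dtypes(include="object").columns
num_cols = df.select_dtypes(include="number").columns

# Fill missing categories with a sentinel, missing numerics with the
# column median, then one-hot encode the categorical columns.
df[cat_cols] = df[cat_cols].fillna("missing")
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
encoded = pd.get_dummies(df, columns=list(cat_cols))

print(encoded.isna().sum().sum())  # 0 – no missing values remain
```

For high-cardinality columns such as tender_mainCpv or buyer_nuts, coarser groupings (e.g. CPV divisions) or target-aware encoders are common alternatives to one-hot encoding.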
Evaluation method
The model’s predictive performance will be evaluated using standard regression metrics, with R-squared (R²) as the primary evaluation measure. R² indicates how well the model explains the variation in Final_Price_EUR. Higher values mean the model captures underlying patterns in procurement pricing more effectively. As a secondary metric, Mean Absolute Error (MAE) measures the average absolute difference between the predicted and actual final prices. It shows the typical error the model makes in euros and is easy to interpret in practice. Root Mean Squared Error (RMSE) may also be considered for additional insight, as it penalizes larger errors more strongly, which is relevant for high-value procurement contracts where occasional large deviations can be impactful. In summary, higher R² indicates better explanatory power, while lower MAE and RMSE reflect greater prediction accuracy across individual tenders.
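These metrics can be computed directly from predicted and actual prices. The numbers below are made-up illustrations; the formulas match scikit-learn's r2_score, mean_absolute_error, and root mean squared error:

```python
import numpy as np

# Illustrative actual vs. predicted final prices (values are made up).
y_true = np.array([393002.0, 313035.0, 203146.0, 223754.83])
y_pred = np.array([380000.0, 320000.0, 210000.0, 240000.0])

mae = np.mean(np.abs(y_true - y_pred))            # average error in EUR
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # penalizes large errors
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot                          # explained variation

print(f"R2={r2:.3f}  MAE={mae:.0f}  RMSE={rmse:.0f}")
```

Note that RMSE is always at least as large as MAE; a large gap between the two signals occasional big misses on individual tenders.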
Submission File
The submission should be a CSV file with two columns: Procurement_ID and Final_Price_EUR. Ensure that the file follows this structure for consistency and ease of evaluation.
Procurement_ID, Final_Price_EUR
953d-760eac333f8d-4b1fb201-8698-4ebd, 393002.0
a20c-8c74ef6f0fca-012edc4e-403d-4dad, 313035.0
a16a-c06bd42dfb25-014837eb-6ffa-4a4c, 203146.0
8c80-ccf45598af41-04c4f67e-3f86-440f, 223754.83
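A submission frame in this two-column format can be assembled and written with pandas. The IDs below are taken from the sample rows above; the prices are placeholder values, not model output:

```python
import io

import pandas as pd

# Two required columns: tender ID and predicted final price.
submission = pd.DataFrame({
    "Procurement_ID": [
        "953d-760eac333f8d-4b1fb201-8698-4ebd",
        "a20c-8c74ef6f0fca-012edc4e-403d-4dad",
    ],
    "Final_Price_EUR": [393002.0, 313035.0],
})

# In a notebook: submission.to_csv("submission.csv", index=False)
buf = io.StringIO()
submission.to_csv(buf, index=False)
print(buf.getvalue())
```

Passing index=False is important: an extra unnamed index column would break the expected format.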
Citation
Miroslava Barkóciová. Bid2Win: Tender Outcome Prediction Challenge. https://kaggle.com/competitions/bid-2-win-tender-outcome-prediction-challenge, 2025. Kaggle.

