Fiber content in Wheat and Barley Grains Crops


Context

Dietary fiber is an essential part of a healthy human diet. One of the most important fibers is arabinoxylan (AX), which is found in the grains of wheat and barley, two major cereal crops grown for human consumption. According to Korge et al. (2023), arabinoxylan levels are seasonal: winter and spring conditions may influence the final quality of the fiber in both crops (Figure 1).

Figure 1. The AX content as an average of N treatments in winter wheat and spring barley flours in different years.

However, Korge et al. (2023) found that not only does temperature correlate with higher arabinoxylan levels, but the type of fertilizer treatment and the cropping system also influence them. This opens up an opportunity to improve crop quality and benefit consumers’ health.

Goal: To help farmers produce high-quality crops, we require an AI model to predict the level of arabinoxylan using a 4-year dataset related to fertilizer treatment, crop system, and temperature.

Task description

In this challenge, you will be tasked with predicting the arabinoxylan level using data from different sources. The dataset provides information about fertilizer treatment, crop system, and seasonal temperature from 2014 to 2021.

The dataset contains the following columns:

  • YEAR: Year of the crop.
  • CROPPING_SYSTEM: farming system used for the crop.
  • TREATMENT: fertilizer treatment for the crop.
  • AVR_APRIL_TEMP: weather temperature in April in °C.
  • AVR_MAY_TEMP: weather temperature in May in °C.
  • AVR_JUNE_TEMP: weather temperature in June in °C.
  • AVR_JULY_TEMP: weather temperature in July in °C.
  • AVR_AUGUST_TEMP: weather temperature in August in °C.
  • TOTAL_PRECIPITATION: total rainfall between April and August in mm.
  • PRECIPITATION_APRIL: rainfall in April in mm.
  • PRECIPITATION_MAY: rainfall in May in mm.
  • PRECIPITATION_JUNE: rainfall in June in mm.
  • PRECIPITATION_JULY: rainfall in July in mm.
  • PRECIPITATION_AUGUST: rainfall in August in mm.
  • BARLEY_YIELD: total amount of barley produced in tons/ha.
  • AX_CONTENT: Arabinoxylan content in g/100g.

The primary goal is to build a robust machine learning regression model that accurately predicts the arabinoxylan content (AX_CONTENT) from the variables listed above.
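Before fitting a real regressor, it can help to have a naive reference point. The sketch below is one possible baseline, not the intended solution: it predicts AX_CONTENT as the mean of the training rows that share the same CROPPING_SYSTEM and TREATMENT, and falls back to the overall mean for unseen combinations. The function names, the category labels, and the AX values are invented for illustration and are not taken from the competition data.

```python
from collections import defaultdict
from statistics import mean

GROUP_KEYS = ("CROPPING_SYSTEM", "TREATMENT")


def fit_group_means(rows, keys=GROUP_KEYS, target="AX_CONTENT"):
    """Return (per-group mean of the target, overall mean) from dict rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(float(row[target]))
    group_means = {group: mean(values) for group, values in groups.items()}
    overall = mean(float(row[target]) for row in rows)
    return group_means, overall


def predict(row, group_means, overall, keys=GROUP_KEYS):
    """Predict the group mean, or the overall mean for an unseen group."""
    return group_means.get(tuple(row[k] for k in keys), overall)


# Illustrative rows only: labels and AX values are made up.
train_rows = [
    {"CROPPING_SYSTEM": "organic", "TREATMENT": "N0", "AX_CONTENT": "4.0"},
    {"CROPPING_SYSTEM": "organic", "TREATMENT": "N0", "AX_CONTENT": "5.0"},
    {"CROPPING_SYSTEM": "conventional", "TREATMENT": "N1", "AX_CONTENT": "6.0"},
]
group_means, overall = fit_group_means(train_rows)

# A combination seen in training uses its group mean.
print(predict({"CROPPING_SYSTEM": "organic", "TREATMENT": "N0"}, group_means, overall))
# An unseen combination falls back to the overall mean.
print(predict({"CROPPING_SYSTEM": "organic", "TREATMENT": "N9"}, group_means, overall))
```

Any model that uses the temperature and precipitation columns should at least beat this kind of baseline on held-out data to justify its added complexity.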


Dataset

https://www.kaggle.com/competitions/fiber-content-in-wheat-and-barley-grains-cropss/data

Files

  • train.csv – the training set
  • test.csv – the test set
  • sample_submission.csv – a sample submission file in the correct format
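A quick way to get oriented is to list the row count and header of each file. The sketch below uses only the standard library; the `data` directory is an assumed download location, and `describe` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def describe(path):
    """Return (row count, column names) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return len(rows), list(reader.fieldnames or [])


# DATA_DIR is an assumption; point it at wherever the files were downloaded.
DATA_DIR = Path("data")
for name in ("train.csv", "test.csv", "sample_submission.csv"):
    path = DATA_DIR / name
    if path.exists():
        n_rows, columns = describe(path)
        print(f"{name}: {n_rows} rows, columns={columns}")
    else:
        print(f"{name}: not found in {DATA_DIR}")
```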

Columns

  • YEAR: Year of the crop.
  • CROPPING_SYSTEM: farming system used for the crop.
  • TREATMENT: fertilizer treatment for the crop.
  • AVR_APRIL_TEMP: weather temperature in April in °C.
  • AVR_MAY_TEMP: weather temperature in May in °C.
  • AVR_JUNE_TEMP: weather temperature in June in °C.
  • AVR_JULY_TEMP: weather temperature in July in °C.
  • AVR_AUGUST_TEMP: weather temperature in August in °C.
  • TOTAL_PRECIPITATION: total rainfall between April and August in mm.
  • PRECIPITATION_APRIL: rainfall in April in mm.
  • PRECIPITATION_MAY: rainfall in May in mm.
  • PRECIPITATION_JUNE: rainfall in June in mm.
  • PRECIPITATION_JULY: rainfall in July in mm.
  • PRECIPITATION_AUGUST: rainfall in August in mm.
  • BARLEY_YIELD: total amount of barley produced in tons/ha.
  • AX_CONTENT: Arabinoxylan content in g/100g.
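Since TOTAL_PRECIPITATION is described as the rainfall between April and August, one plausible sanity check is to compare it with the sum of the five monthly columns. Whether the two match exactly in the actual data is an assumption based on the column descriptions; the 0.5 mm tolerance, the helper names, and the sample values below are illustrative only.

```python
MONTHLY = [
    "PRECIPITATION_APRIL",
    "PRECIPITATION_MAY",
    "PRECIPITATION_JUNE",
    "PRECIPITATION_JULY",
    "PRECIPITATION_AUGUST",
]


def precipitation_gap(row):
    """Return TOTAL_PRECIPITATION minus the sum of the monthly values, in mm."""
    monthly_sum = sum(float(row[name]) for name in MONTHLY)
    return float(row["TOTAL_PRECIPITATION"]) - monthly_sum


def inconsistent_rows(rows, tolerance=0.5):
    """Indices of rows whose total differs from the monthly sum by more than tolerance."""
    return [i for i, row in enumerate(rows) if abs(precipitation_gap(row)) > tolerance]


# Invented values for illustration; not taken from the competition data.
sample = [
    {"TOTAL_PRECIPITATION": "300", "PRECIPITATION_APRIL": "40", "PRECIPITATION_MAY": "50",
     "PRECIPITATION_JUNE": "60", "PRECIPITATION_JULY": "70", "PRECIPITATION_AUGUST": "80"},
    {"TOTAL_PRECIPITATION": "250", "PRECIPITATION_APRIL": "40", "PRECIPITATION_MAY": "50",
     "PRECIPITATION_JUNE": "60", "PRECIPITATION_JULY": "70", "PRECIPITATION_AUGUST": "80"},
]
print(inconsistent_rows(sample))  # the second row is off by 50 mm
```

If the check passes on the real data, the total is redundant with the monthly columns, which may matter for linear models that are sensitive to collinear features.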

Evaluation

Submissions are evaluated on area under the ROC curve between the predicted probability and the observed target.

Submission File

For each ID in the test set, you must predict a probability for the TARGET variable. The file should contain a header and have the following format:

ID,TARGET
2,0
5,0
6,0
etc.
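A file in the ID,TARGET layout shown above can be produced with the standard library. This is a minimal sketch: `write_submission` is a hypothetical helper, and the predictions are placeholders copied from the sample rows rather than model output.

```python
import csv
import io


def write_submission(handle, ids, predictions):
    """Write a header plus one ID,TARGET row per prediction to a text handle."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ID", "TARGET"])
    for row_id, value in zip(ids, predictions):
        writer.writerow([row_id, value])


# Demo with the IDs from the sample above and placeholder predictions.
buffer = io.StringIO()
write_submission(buffer, [2, 5, 6], [0, 0, 0])
print(buffer.getvalue())
```

To write to disk instead, pass a handle opened with `open("submission.csv", "w", newline="")`; the file name here is only an example.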

Citation

Erick Fiestas S. Fiber content in Wheat and Barley Grains Crops. https://kaggle.com/competitions/fiber-content-in-wheat-and-barley-grains-cropss, 2025. Kaggle.