mcp / DATASET_CARD.md
Tracy André
Add comprehensive model cards and metadata
metadata
license: cc-by-4.0
task_categories:
  - tabular-regression
  - time-series-forecasting
language:
  - fr
tags:
  - agriculture
  - herbicides
  - weed-pressure
  - crop-rotation
  - france
  - bretagne
  - sustainability
  - precision-agriculture
  - ift
  - treatment-frequency-index
size_categories:
  - 1K<n<10K
pretty_name: Station Expérimentale de Kerguéhennec - Agricultural Interventions
configs:
  - config_name: default
    data_files:
      - split: train
        path: '*.csv'

🚜 Station Expérimentale de Kerguéhennec - Agricultural Interventions Dataset

Dataset Description

This dataset contains agricultural intervention records from the Station Expérimentale de Kerguéhennec in Brittany, France, covering 2014 to 2024. It documents crop rotations, herbicide treatments, and field management operations across 100 plots.

Dataset Summary

  • Source: Station Expérimentale de Kerguéhennec, Brittany, France
  • Time Period: 2014-2024 (10 years of data; 2017 is missing)
  • Location: Brittany (Bretagne), France
  • Records: 4,663 intervention records
  • Plots: 100 unique agricultural parcels
  • Crops: 42 different crop types
  • Format: CSV exports from farm management system
  • Language: French (field names and crop types)

Primary Use Cases

This dataset is particularly valuable for:

  1. 🌿 Weed Pressure Analysis: Calculate and predict Treatment Frequency Index (IFT) for herbicides
  2. 🔄 Crop Rotation Optimization: Analyze the impact of different crop sequences on pest pressure
  3. 🌱 Sustainable Agriculture: Support reduction of herbicide use while maintaining productivity
  4. 🎯 Precision Agriculture: Identify suitable plots for sensitive crops (peas, beans)
  5. 📊 Agricultural Research: Study relationships between farming practices and outcomes
  6. 🤖 Machine Learning: Train models for agricultural prediction and decision support

Data Structure

Core Fields

| Field | Description | Type | Example |
|-------|-------------|------|---------|
| millesime | Year of intervention | Integer | 2024 |
| nomparc | Plot/field name | String | "Etang Milieu" |
| surfparc | Plot surface area (hectares) | Float | 2.28 |
| libelleusag | Crop type/usage | String | "pois de conserve" |
| datedebut | Intervention start date | Date | "20/2/24" |
| datefin | Intervention end date | Date | "20/2/24" |
| libevenem | Intervention type | String | "Semis classique" |
| familleprod | Product family | String | "Herbicides" |
| produit | Specific product used | String | "CALLISTO" |
| quantitetot | Total quantity applied | Float | 1.5 |
| unite | Unit of measurement | String | "L" |

Derived Fields (Added During Processing)

| Field | Description | Type |
|-------|-------------|------|
| year | Standardized year | Integer |
| crop_type | Standardized crop classification | String |
| is_herbicide | Boolean flag for herbicide treatments | Boolean |
| is_fungicide | Boolean flag for fungicide treatments | Boolean |
| is_insecticide | Boolean flag for insecticide treatments | Boolean |
| plot_name | Standardized plot name | String |
| intervention_type | Standardized intervention classification | String |
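
The card does not document how the derived fields were produced. The sketch below shows one plausible derivation from the core fields. The normalisation rules (trimming, casing) and the French family labels other than "Herbicides" are assumptions for illustration, and the sample rows are made up.

```python
import pandas as pd

# Tiny hand-made sample mimicking the core fields (illustrative, not real records).
raw = pd.DataFrame({
    "millesime": [2024, 2024, 2023],
    "nomparc": [" Etang Milieu", "Etang Milieu", "etang milieu "],
    "libelleusag": ["pois de conserve", "pois de conserve", "blé tendre hiver"],
    "libevenem": ["Semis classique", "Traitement", "Traitement"],
    "familleprod": [None, "Herbicides", "Fongicides"],
})


def add_derived_fields(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with derived columns (assumed rules, not the official pipeline)."""
    out = df.copy()
    out["year"] = out["millesime"].astype(int)
    # Assumed normalisation: trim whitespace and unify casing.
    out["plot_name"] = out["nomparc"].str.strip().str.title()
    out["crop_type"] = out["libelleusag"].str.strip().str.lower()
    out["intervention_type"] = out["libevenem"].str.strip().str.lower()
    # Assumed family labels; rows without a product family are flagged False.
    family = out["familleprod"].fillna("").str.lower()
    out["is_herbicide"] = family.str.contains("herbicide")
    out["is_fungicide"] = family.str.contains("fongicide")
    out["is_insecticide"] = family.str.contains("insecticide")
    return out


derived = add_derived_fields(raw)
print(derived[["year", "plot_name", "is_herbicide", "is_fungicide"]])
```

Matching on a lowercase substring rather than an exact label keeps the flags robust to small spelling variations in the export, at the cost of possible false positives on unusual family names.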

Key Statistics

Temporal Coverage

  • Years: 2014-2024 (missing 2017 due to data format issues)
  • Seasons: All agricultural seasons represented
  • Frequency: Multiple interventions per plot per year

Spatial Coverage

  • Plots: 100 unique agricultural parcels
  • Surface: Variable plot sizes (0.43 to 5+ hectares)
  • Location: Single experimental station (controlled conditions)

Intervention Types

  • Herbicide applications: 800+ treatments
  • Total interventions: 4,663 records
  • Product families: Herbicides, Fungicides, Insecticides, Fertilizers
  • Most common crops: Wheat, Corn, Rapeseed

Treatment Frequency Index (IFT)

Definition

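In this dataset, the IFT (Indice de Fréquence de Traitement) is computed with a simplified formula:

IFT = Number of herbicide applications / Plot surface area (ha)

This is a proxy rather than the official French indicator, which sums, for each treatment, the ratio of the applied dose to the product's reference dose, weighted by the fraction of the plot that was treated.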

Interpretation

  • IFT < 1.0: Low weed pressure (suitable for sensitive crops)
  • IFT 1.0-2.0: Moderate pressure (monitoring required)
  • IFT > 2.0: High pressure (intervention needed)
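
As a minimal sketch of the definition and the three bands above, the helpers below compute the applications-per-hectare value and map it to a pressure level. The function names are hypothetical, and the treatment of the boundary values (1.0 and 2.0 counted as moderate) is an assumption, since the card does not say which band they belong to.

```python
def simple_ift(n_applications: int, surface_ha: float) -> float:
    """Applications per hectare, following the formula given in this card."""
    if surface_ha <= 0:
        raise ValueError("surface_ha must be positive")
    return n_applications / surface_ha


def pressure_level(ift: float) -> str:
    """Map an IFT value to the card's three interpretation bands."""
    if ift < 1.0:
        return "low"
    if ift <= 2.0:  # assumption: both boundaries fall in the moderate band
        return "moderate"
    return "high"


# Illustrative numbers: 3 herbicide applications on a 2.28 ha plot.
ift = simple_ift(3, 2.28)
print(f"IFT = {ift:.2f} -> {pressure_level(ift)}")  # IFT = 1.32 -> moderate
```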

Dataset Statistics

  • Mean IFT: 1.93 (moderate pressure)
  • Range: 0.14 - 6.67
  • Trend: Decreasing from 2.91 (2014) to 1.74 (2024)
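
To put the trend in relative terms, the two endpoint values quoted above can be compared directly. This is plain arithmetic on the card's own figures; the helper name is hypothetical.

```python
def relative_change(start: float, end: float) -> float:
    """Relative change from start to end, as a fraction of start."""
    return (end - start) / start


# Mean IFT reported for 2014 and 2024 in this card.
change = relative_change(2.91, 1.74)
print(f"{change:.1%}")  # -40.2%
```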

Data Quality

Completeness

  • Core fields: 95%+ completeness for essential variables
  • Date fields: Well-formatted and consistent
  • Numeric fields: Validated ranges and units
  • Geographic data: Anonymized but consistent plot identifiers
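
The completeness figure can be checked on any loaded copy of the data. The sketch below computes the share of non-missing values per column on a made-up frame; the helper name is hypothetical, and with the real data the loaded DataFrame would be passed instead.

```python
import pandas as pd


def completeness(df: pd.DataFrame, columns: list[str]) -> pd.Series:
    """Share of non-missing values per column (0 to 1), least complete first."""
    return df[columns].notna().mean().sort_values()


# Illustrative frame with deliberately missing values.
sample = pd.DataFrame({
    "millesime": [2023, 2024, 2024, 2024],
    "nomparc": ["A", "B", None, "B"],
    "produit": ["CALLISTO", None, None, "CALLISTO"],
})

report = completeness(sample, ["millesime", "nomparc", "produit"])
print(report)
```

Note that `produit` is expected to be empty for interventions that apply no product (for example sowing), so a low share there does not necessarily indicate a quality problem.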

Validation

  • Cross-references: Product codes validated against official databases
  • Temporal consistency: Logical intervention sequences
  • Agronomic validity: Realistic crop rotations and treatment patterns
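
One simple temporal consistency check is that no intervention ends before it starts. The sketch below assumes the day/month/two-digit-year format suggested by the example value "20/2/24"; the helper names and the sample records are illustrative.

```python
from datetime import date, datetime

DATE_FORMAT = "%d/%m/%y"  # assumed from the example value "20/2/24"


def parse_date(text: str) -> date:
    """Parse a date such as '20/2/24' into a date object."""
    return datetime.strptime(text, DATE_FORMAT).date()


def find_inverted_ranges(records: list[dict]) -> list[dict]:
    """Return the records whose end date precedes their start date."""
    bad = []
    for rec in records:
        if parse_date(rec["datefin"]) < parse_date(rec["datedebut"]):
            bad.append(rec)
    return bad


records = [
    {"nomparc": "Etang Milieu", "datedebut": "20/2/24", "datefin": "20/2/24"},
    # Inverted on purpose to show what the check reports.
    {"nomparc": "Etang Milieu", "datedebut": "15/4/24", "datefin": "12/4/24"},
]

bad = find_inverted_ranges(records)
print(f"{len(bad)} record(s) with datefin before datedebut")
```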

Limitations

  • Geographic scope: Single experimental station (limited geographic diversity)
  • Weather data: Not included (external source required)
  • Economic data: Treatment costs not provided
  • Soil characteristics: Limited soil type information

Ethical Considerations

Privacy Protection

  • Location data: Generalized to protect farm location
  • Personal information: All farmer identifying data removed
  • Commercial sensitivity: Product usage patterns aggregated when appropriate

Bias Considerations

  • Geographic bias: Limited to Brittany region
  • Temporal bias: Recent years may have different practices
  • Selection bias: Experimental station may not represent typical farms
  • Technology bias: Practices may reflect research station capabilities

Applications

1. Weed Pressure Prediction

Use machine learning models to predict future IFT values based on:

  • Historical treatment patterns
  • Crop rotation sequences
  • Environmental factors
  • Plot characteristics

Example Model Performance:

  • Random Forest Regressor: R² = 0.65-0.85
  • Features: Year, plot surface, previous IFT, crop type, rotation sequence
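
The "previous IFT" and "rotation sequence" features listed above are not columns of the dataset, so they have to be built first. The sketch below shows one way to derive them with a per-plot shift. The input frame is made up, the column names `prev_ift`, `prev_crop` and `rotation` are hypothetical, and "previous" means the previous available year for that plot, which may not be the calendar year before.

```python
import pandas as pd

# Illustrative per-plot, per-year IFT table (made-up values).
ift_data = pd.DataFrame({
    "plot_name": ["A", "A", "A", "B", "B"],
    "year": [2022, 2023, 2024, 2023, 2024],
    "crop_type": ["blé", "maïs", "colza", "maïs", "blé"],
    "surfparc": [2.0, 2.0, 2.0, 1.5, 1.5],
    "ift": [1.5, 2.0, 1.0, 0.7, 1.3],
})


def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the previous observation's IFT and crop for each plot."""
    out = df.sort_values(["plot_name", "year"]).copy()
    grouped = out.groupby("plot_name")
    out["prev_ift"] = grouped["ift"].shift(1)
    out["prev_crop"] = grouped["crop_type"].shift(1)
    # The first observation of each plot has no history, so it is dropped.
    out = out.dropna(subset=["prev_ift"]).reset_index(drop=True)
    out["rotation"] = out["prev_crop"] + " → " + out["crop_type"]
    return out


features = add_lag_features(ift_data)
print(features[["plot_name", "year", "prev_ift", "rotation", "ift"]])
```

The resulting frame can then be passed to a regressor such as scikit-learn's `RandomForestRegressor`, after encoding the categorical columns (`crop_type`, `rotation`) numerically.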

2. Sustainable Plot Selection

Identify plots suitable for sensitive crops (peas, beans) by:

  • Analyzing historical IFT trends
  • Evaluating rotation impacts
  • Assessing risk levels for future years
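
The selection steps above can be sketched as a filter on recent IFT history. The three-year window and the 1.0 threshold (the card's "low pressure" band) are assumptions that should be tuned, the function name is hypothetical, and the input is assumed to hold one row per plot and year.

```python
import pandas as pd


def suitable_plots(
    ift_data: pd.DataFrame, last_n_years: int = 3, threshold: float = 1.0
) -> pd.DataFrame:
    """Plots whose mean IFT over their most recent years is below the threshold."""
    # Keep the last_n_years most recent rows of each plot.
    recent = ift_data.sort_values("year").groupby("plot_name").tail(last_n_years)
    summary = recent.groupby("plot_name")["ift"].agg(mean_ift="mean", max_ift="max")
    return summary[summary["mean_ift"] < threshold].sort_values("mean_ift")


# Illustrative values, not taken from the dataset.
ift_data = pd.DataFrame({
    "plot_name": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "year": [2021, 2022, 2023, 2024, 2022, 2023, 2024, 2023, 2024],
    "ift": [2.5, 0.8, 0.9, 0.7, 1.5, 1.2, 0.9, 0.5, 0.6],
})

candidates = suitable_plots(ift_data)
print(candidates)
```

Reporting the maximum alongside the mean helps spot plots whose average is low but which had a single high-pressure year inside the window.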

3. Crop Rotation Optimization

Optimize rotation sequences through:

  • Impact analysis of different crop sequences
  • Identification of beneficial rotations
  • Risk assessment for specific transitions

Best Rotations (Lowest IFT):

  1. Peas → Rapeseed: IFT 0.62
  2. Winter Barley → Rapeseed: IFT 0.64
  3. Corn → Spring Barley: IFT 0.69

4. Herbicide Alternative Analysis

Support reduction strategies through:

  • Product usage pattern analysis
  • Temporal trend identification
  • Alternative strategy development

Code Examples

Loading the Dataset

from datasets import load_dataset
import pandas as pd

# Load the dataset from the Hugging Face Hub
dataset = load_dataset("HackathonCRA/2024")

# Convert the train split to pandas for analysis
df = dataset["train"].to_pandas()

print(f"Loaded {len(df)} intervention records")
print(f"Covering {df['year'].nunique()} years")

Calculate IFT

# Calculate IFT for herbicide applications (uses df from the loading step)
herbicides = df[df['familleprod'].str.contains('Herbicides', na=False)]

# One row per plot, year and crop
ift_data = herbicides.groupby(['plot_name', 'year', 'crop_type']).agg(
    quantitetot=('quantitetot', 'sum'),
    n_applications=('produit', 'count'),  # Number of applications
    surfparc=('surfparc', 'first'),
).reset_index()

# IFT as defined in this card: applications per hectare
ift_data['ift'] = ift_data['n_applications'] / ift_data['surfparc']

Analyze Crop Rotations

# Create rotation sequences (previous crop → next crop) for each plot
rotations = []
for plot, plot_data in df.groupby('plot_name'):
    # One crop per plot and year, in chronological order
    crops = plot_data.sort_values('year').groupby('year')['crop_type'].first()

    for i in range(len(crops) - 1):
        rotations.append({
            'plot': plot,
            'year_from': crops.index[i],
            'year_to': crops.index[i + 1],  # may skip a year (2017 is missing)
            'rotation': f"{crops.iloc[i]} → {crops.iloc[i + 1]}",
        })

rotation_df = pd.DataFrame(rotations)
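
To rank rotations by IFT, the transitions can be joined with the per-plot IFT values. The sketch below uses made-up stand-ins shaped like `rotation_df` and `ift_data`, and assumes that a rotation is scored by the IFT observed in its arrival year (`year_to`); the card does not state how its rotation figures were computed.

```python
import pandas as pd

# Made-up stand-ins with the same columns as rotation_df and ift_data.
rotation_df = pd.DataFrame({
    "plot": ["A", "B", "C"],
    "year_from": [2022, 2022, 2023],
    "year_to": [2023, 2023, 2024],
    "rotation": ["pois → colza", "pois → colza", "maïs → blé"],
})
ift_data = pd.DataFrame({
    "plot_name": ["A", "B", "C"],
    "year": [2023, 2023, 2024],
    "ift": [0.5, 0.7, 1.8],
})

# Attach the IFT observed in the arrival year of each transition.
merged = rotation_df.merge(
    ift_data,
    left_on=["plot", "year_to"],
    right_on=["plot_name", "year"],
    how="inner",
)

# Mean IFT and number of observations per rotation, lowest IFT first.
rotation_ift = (
    merged.groupby("rotation")["ift"]
    .agg(mean_ift="mean", n_obs="count")
    .sort_values("mean_ift")
)
print(rotation_ift)
```

Keeping `n_obs` next to the mean makes it easy to discard rotations that were observed only once or twice before drawing conclusions.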

Related Datasets

  • Weather Data: Consider integrating with Météo-France data for enhanced analysis
  • Soil Data: European Soil Database for soil type information
  • Economic Data: Agricultural input cost databases
  • Regulatory Data: AMM (Marketing Authorization) product databases

Citation

If you use this dataset in your research, please cite:

@dataset{hackathon_cra_2024,
  title={Station Expérimentale de Kerguéhennec Agricultural Interventions Dataset},
  author={Hackathon CRA Team},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/HackathonCRA/2024},
  note={Agricultural intervention data from Brittany, France (2014-2024)}
}

License

This dataset is released under the CC-BY-4.0 license, which permits both commercial and research use with proper attribution.

Updates and Versioning

  • Version 1.0: Initial release with 2014-2024 data
  • Future versions: May include additional years or enhanced metadata
  • Quality improvements: Ongoing validation and cleaning

Contact

For questions about this dataset, collaboration opportunities, or data corrections, please use the Hugging Face dataset discussion feature or contact the research team through the repository.


Keywords: agriculture, herbicides, crop rotation, sustainable farming, France, Brittany, IFT, weed management, precision agriculture, time series, regression, treatment frequency