loodvanniekerkginkgo committed on
Commit
d8b25f9
·
1 Parent(s): 9a87acd

Added more explainers

Files changed (1)
  1. about.py +16 -11
about.py CHANGED
@@ -42,7 +42,9 @@ Here we invite the community to submit and develop better predictors, which will
  #### 🏆 Prizes

  For each of the 5 properties in the competition, there is a prize for the model with the highest performance for that property on the private test set.
- There is also an 'open-source' prize for the best model trained on the GDPa1 dataset of monoclonal antibodies (reporting cross-validation results) and assessed on the private test set where authors provide all training code and data.
  For each of these 6 prizes, participants have the choice between
  - **$10 000 in data generation credits** with [Ginkgo Datapoints](https://datapoints.ginkgo.bio/), or
  - A **$2000 cash prize**.
@@ -124,7 +126,7 @@ FAQS = {
  "No, there are no requirements to submit code / methods and submitted predictions remain private. "
  "We also have an optional field for including a short model description. "
  "Top performing participants will be requested to identify themselves at the end of the tournament. "
- "There will be one prize for the best open-source model, which will require code / methods to be available."
  ),
  "How exactly can I evaluate my model?": (
  "You can easily calculate the Spearman correlation coefficient on the GDPa1 dataset yourself before uploading to the leaderboard. "
@@ -172,25 +174,28 @@ SUBMIT_INSTRUCTIONS = f"""
  You do **not** need to predict all 5 properties - each property has its own leaderboard and prize.

  ## Instructions
- 1. **Upload both CSV files**:
- - **GDPa1 Cross-Validation predictions** (using cross-validation folds)
- - **Private Test Set predictions** (final test submission)
  2. Each CSV should contain `antibody_name` + one column per property you are predicting (e.g. `"antibody_name,Titer,PR_CHO"` if your model predicts Titer and Polyreactivity).
  - List of valid property names: `{', '.join(ASSAY_LIST)}`.
- 3. Submit as many times as you like, and the latest submission will be used for the leaderboard (and test set scoring at the end of the competition).

  The GDPa1 results should appear on the leaderboard within a minute, and can also be calculated manually using average Spearman rank correlation across the 5 folds.

  ## Cross-validation

- For the GDPa1 cross-validation predictions, use the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column to split the dataset into folds and make predictions for each of the folds.
- Submit a CSV file in the same format but also containing the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column.
- Check out our tutorial on training an antibody developability prediction model with cross-validation [here]({TUTORIAL_URL}).

  ## Test set

- The **private test set results will not appear on the leaderboards at first**, and will be used to determine the winners at the close of the competition.
- 🗓️ There will be a test set scoring on **October 13th** (which will score all the latest test set submissions at that point).

  Submissions close on **1 November 2025**.
  """
 
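The cross-validation instructions above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the organisers' code: the toy rows, the `feature` column, and the `fit_line` stand-in model are hypothetical, and only `antibody_name`, `Titer`, and the fold column name are taken from the instructions. A real submission would load GDPa1 and plug in an actual model; the linked tutorial is the authoritative example.

```python
import csv
import io

FOLD_COL = "hierarchical_cluster_IgG_isotype_stratified_fold"


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (hypothetical stand-in for a real model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def cross_val_predict(rows, feature, target):
    """Train on every fold but one, predict the held-out fold, collect all folds."""
    collected = []
    for fold in sorted({row[FOLD_COL] for row in rows}):
        train = [row for row in rows if row[FOLD_COL] != fold]
        held_out = [row for row in rows if row[FOLD_COL] == fold]
        slope, intercept = fit_line(
            [row[feature] for row in train], [row[target] for row in train]
        )
        for row in held_out:
            collected.append(
                {
                    "antibody_name": row["antibody_name"],
                    target: slope * row[feature] + intercept,
                    FOLD_COL: fold,
                }
            )
    return collected


def to_csv(predictions, target):
    """Serialise predictions with the fold column included, as the instructions ask."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["antibody_name", target, FOLD_COL], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(predictions)
    return buffer.getvalue()


# Toy data (made up): 10 antibodies spread over 5 folds.
rows = [
    {"antibody_name": f"ab_{i}", "feature": float(i), "Titer": 2.0 * i + 1.0, FOLD_COL: i % 5}
    for i in range(10)
]
predictions = cross_val_predict(rows, "feature", "Titer")
csv_text = to_csv(predictions, "Titer")
```

Every antibody ends up with exactly one prediction, made by a model that never saw its fold during training, which is what makes the per-fold scores meaningful.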
  #### 🏆 Prizes

  For each of the 5 properties in the competition, there is a prize for the model with the highest performance for that property on the private test set.
+ There is also an 'open-source' prize for the best reproducible model: one that is trained on the GDPa1 dataset (reporting cross-validation results) and assessed on the private test set where authors provide all training code and data.
+ This will be judged by a panel (i.e. by default the model with the highest average Spearman correlation across all properties will be selected, but a really good model on just one property may be better for the community).
+
  For each of these 6 prizes, participants have the choice between
  - **$10 000 in data generation credits** with [Ginkgo Datapoints](https://datapoints.ginkgo.bio/), or
  - A **$2000 cash prize**.
 
  "No, there are no requirements to submit code / methods and submitted predictions remain private. "
  "We also have an optional field for including a short model description. "
  "Top performing participants will be requested to identify themselves at the end of the tournament. "
+ "There will be one prize for the best open-source reproducible model, which will require code / methods to be available."
  ),
  "How exactly can I evaluate my model?": (
  "You can easily calculate the Spearman correlation coefficient on the GDPa1 dataset yourself before uploading to the leaderboard. "
 
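As a rough illustration of the self-evaluation mentioned in the FAQ text above, the sketch below computes a Spearman correlation per fold and averages the results. It is written with the standard library only so that it runs anywhere; in practice `scipy.stats.spearmanr` is the usual choice. The function names and toy values are hypothetical, and the leaderboard's exact handling of ties or missing values is not stated in the source, so treat the average-rank tie handling here as an assumption.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mean_fold_spearman(folds, y_true, y_pred):
    """Score each fold separately, then average the per-fold correlations."""
    grouped = {}
    for fold, truth, pred in zip(folds, y_true, y_pred):
        truths, preds = grouped.setdefault(fold, ([], []))
        truths.append(truth)
        preds.append(pred)
    return mean(spearman(truths, preds) for truths, preds in grouped.values())


# Toy example (made-up values): one perfectly ranked fold, one perfectly reversed fold.
folds = [0, 0, 0, 1, 1, 1]
y_true = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 0.2, 0.3, 0.3, 0.2, 0.1]
score = mean_fold_spearman(folds, y_true, y_pred)
```

Averaging per-fold correlations is not the same as one correlation over the pooled predictions, so a local score should follow the per-fold recipe to be comparable with the leaderboard.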
  You do **not** need to predict all 5 properties - each property has its own leaderboard and prize.

  ## Instructions
+ 1. **Upload two CSV files**: one with GDPa1 cross-validation predictions, and one with private test set predictions
  2. Each CSV should contain `antibody_name` + one column per property you are predicting (e.g. `"antibody_name,Titer,PR_CHO"` if your model predicts Titer and Polyreactivity).
  - List of valid property names: `{', '.join(ASSAY_LIST)}`.
+ - Include the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column if submitting cross-validation predictions.
+ 3. You can resubmit as often as you like; only your latest submission will count for both the leaderboard and final test set scoring.

  The GDPa1 results should appear on the leaderboard within a minute, and can also be calculated manually using average Spearman rank correlation across the 5 folds.

  ## Cross-validation

+ For the GDPa1 cross-validation predictions:
+ 1. Split the dataset using the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column
+ 2. Train on 4 folds and predict on the held-out fold
+ 3. Collect held-out predictions for all 5 folds into one dataframe
+ 4. Write this dataframe to a .csv file and submit as your GDPa1 cross-validation predictions
+
+ The leaderboard will show the average Spearman rank correlation across the 5 folds. For a code example, check out our tutorial on training an antibody developability prediction model with cross-validation [here]({TUTORIAL_URL}).

  ## Test set

+ The **private test set submissions will not be scored automatically**, to avoid test set hacking. They will be evaluated after submissions close to determine winners.
+ 🗓️ We will release one interim scoring of the latest private test set submissions on **October 13th**. Use this opportunity to see how your model is performing on the heldout test set and refine accordingly.

  Submissions close on **1 November 2025**.
  """