loodvanniekerkginkgo committed
Commit 767c884 · 1 Parent(s): 094a347

Updating documentation (h/t Diya Mohan)

about.py CHANGED
@@ -6,6 +6,7 @@ from constants import (
  FAQ_TAB_NAME,
  SLACK_URL,
  TUTORIAL_URL,
  )

  WEBSITE_HEADER = f"""
@@ -59,7 +60,7 @@ ABOUT_TEXT = f"""

  1. **Create a Hugging Face account** [here](https://huggingface.co/join) if you don't have one yet (this is used to track unique submissions and to access the GDPa1 dataset).
  2. **Register your team** on the [Competition Registration](https://datapoints.ginkgo.bio/ai-competitions/2025-abdev-competition) page.
- 3. **Build a model** using cross-validation on the [GDPa1](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1) dataset, using the `hierarchical_cluster_IgG_isotype_stratified_fold` column to split the dataset into folds, and write out all cross-validation predictions to a CSV file.
  4. **Use your model to make predictions** on the private test set (download the 80 private test set sequences from the {SUBMIT_TAB_NAME} tab).
  5. **Submit your training and test set predictions** on the {SUBMIT_TAB_NAME} tab by uploading both your cross-validation and private test set CSV files.

@@ -69,6 +70,13 @@ Check out our introductory tutorial on training an antibody developability predi

  ---

  #### Acknowledgements

  We gratefully acknowledge [Tamarind Bio](https://www.tamarind.bio/)'s help in running the following models which are on the leaderboard:
@@ -84,11 +92,14 @@ We're working on getting more public models added, so that participants have mor

  #### How to contribute?

- We'd like to add more existing developability models to the leaderboard. Some examples of models we'd like to add:
  - Absolute folding stability models (for Thermostability)
  - PROPERMAB
  - AbMelt (requires GROMACS for MD simulations)

  If you would like to form a team or discuss ideas, join the [Slack community]({SLACK_URL}) co-hosted by Bits in Bio.
  """

@@ -131,7 +142,7 @@ FAQS = {
  ),
  "Do I need to submit my code / methods in order to participate?": (
  "No, there are no requirements to submit code / methods and submitted predictions remain private. "
- "We also have an optional field for including a short model description. "
  "Top performing participants will be requested to identify themselves at the end of the tournament. "
  "There will be one prize for the best open-source reproducible model, which will require code / methods to be available."
  ),
@@ -153,10 +164,13 @@ FAQS = {
  "We reserve the right to award the open-source prize to a predictor with competitive results for a subset of properties (e.g. a top polyreactivity model)."
  ),
  "How does the open-source prize work?": (
- "Participants who open-source their training code and methods will be eligible for the open-source prize (as well as the other prizes)."
  ),
  "Can I use proprietary tools like AlphaFold3 for the open-source prize?": (
- "Yes, using tools that have published their inference code under proprietary licenses is allowed (like AlphaFold3 and PROPERMAB), as long as code is available and fully reproducible."
  ),
  "What do I need to submit?": (
  'There is a tab on the Hugging Face competition page to upload predictions for datasets - for each dataset participants need to submit a CSV containing a column for each property they would like to predict (e.g. called "HIC"), '
 
  FAQ_TAB_NAME,
  SLACK_URL,
  TUTORIAL_URL,
+ GITHUB_URL,
  )

  WEBSITE_HEADER = f"""
 

  1. **Create a Hugging Face account** [here](https://huggingface.co/join) if you don't have one yet (this is used to track unique submissions and to access the GDPa1 dataset).
  2. **Register your team** on the [Competition Registration](https://datapoints.ginkgo.bio/ai-competitions/2025-abdev-competition) page.
+ 3. **Build a model** using cross-validation on the [GDPa1](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1) dataset, using the `hierarchical_cluster_IgG_isotype_stratified_fold` column to split the dataset into folds, and write out all cross-validation predictions to a CSV file. You may also use outside datasets, but still need to report these cross-validation predictions.
  4. **Use your model to make predictions** on the private test set (download the 80 private test set sequences from the {SUBMIT_TAB_NAME} tab).
  5. **Submit your training and test set predictions** on the {SUBMIT_TAB_NAME} tab by uploading both your cross-validation and private test set CSV files.

 

  ---

+ #### Data and models
+
+ You may use any data and models you like for the main competition, since all code/methods can be kept private and you just submit predictions.
+ For the open-source prize, you must train on the GDPa1 dataset using cross-validation and must use all public models/data.
+
+ ---
+
  #### Acknowledgements

  We gratefully acknowledge [Tamarind Bio](https://www.tamarind.bio/)'s help in running the following models which are on the leaderboard:
 

  #### How to contribute?

+ Check out the GitHub repository ({GITHUB_URL}) for a bunch of runnable models and Jupyter notebooks to get started, or to contribute your own models.
+
+ We'd like to add more existing developability models to the leaderboard. Some examples of models we'd like to onboard (also tracked in the GitHub repository):
  - Absolute folding stability models (for Thermostability)
  - PROPERMAB
  - AbMelt (requires GROMACS for MD simulations)

+
  If you would like to form a team or discuss ideas, join the [Slack community]({SLACK_URL}) co-hosted by Bits in Bio.
  """

 
  ),
  "Do I need to submit my code / methods in order to participate?": (
  "No, there are no requirements to submit code / methods and submitted predictions remain private. "
+ "We have an optional field for including a short model description in the submission tab. "
  "Top performing participants will be requested to identify themselves at the end of the tournament. "
  "There will be one prize for the best open-source reproducible model, which will require code / methods to be available."
  ),
 
  "We reserve the right to award the open-source prize to a predictor with competitive results for a subset of properties (e.g. a top polyreactivity model)."
  ),
  "How does the open-source prize work?": (
+ "Participants who train on GDPa1 and open-source their training code and methods and have reproducible results will be eligible for the open-source prize (as well as the other prizes)."
  ),
  "Can I use proprietary tools like AlphaFold3 for the open-source prize?": (
+ "Yes, using tools that have published their inference code under proprietary licenses is allowed (like AlphaFold3 and PROPERMAB), as long as code is available and fully reproducible. Although fully open models (open to commercial use) are highly preferred though. For other prizes, you can use any private models/data you like."
+ ),
+ "Can I train on other public/private datasets?": (
+ "Yes, you can use any private models/data you like for the 5 main assay prizes, since all code/methods can be kept private and you just submit predictions. For the open-source prize, you must train on the GDPa1 dataset using cross-validation and must use all public models/data. Models with proprietary licenses but open code are allowed, but fully open models are highly preferred."
  ),
  "What do I need to submit?": (
  'There is a tab on the Hugging Face competition page to upload predictions for datasets - for each dataset participants need to submit a CSV containing a column for each property they would like to predict (e.g. called "HIC"), '
app.py CHANGED
@@ -120,7 +120,7 @@ with gr.Blocks(theme=gr.themes.Default(text_size=sizes.text_lg)) as demo:
  with gr.TabItem(ABOUT_TAB_NAME, elem_id="abdev-benchmark-tab-table"):
  gr.Markdown(ABOUT_INTRO)
  gr.Image(
- value="./assets/prediction_explainer_cv.png",
  show_label=False,
  show_download_button=False,
  show_share_button=False,
 
  with gr.TabItem(ABOUT_TAB_NAME, elem_id="abdev-benchmark-tab-table"):
  gr.Markdown(ABOUT_INTRO)
  gr.Image(
+ value="./assets/prediction_explainer_v3.png",
  show_label=False,
  show_download_button=False,
  show_share_button=False,
assets/prediction_explainer_cv.png DELETED

Git LFS Details

  • SHA256: 1028b5a4034bbeb403b6a015f831dd5715baaca4698ced2b4fff85da00116297
  • Pointer size: 130 Bytes
  • Size of remote file: 79.6 kB
assets/{prediction_explainer.png → prediction_explainer_v3.png} RENAMED
File without changes
constants.py CHANGED
@@ -44,6 +44,7 @@ REGISTRATION_CODE = os.environ.get("REGISTRATION_CODE")
  TERMS_URL = "https://euphsfcyogalqiqsawbo.supabase.co/storage/v1/object/public/gdpweb/pdfs/2025%20Ginkgo%20Antibody%20Developability%20Prediction%20Competition%202025-08-28-v2.pdf"
  SLACK_URL = "https://join.slack.com/t/bitsinbio/shared_invite/zt-3dqigle2b-e0dEkfPPzzWL055j_8N_eQ"
  TUTORIAL_URL = "https://huggingface.co/blog/ginkgo-datapoints/making-antibody-embeddings-and-predictions"

  # Input CSV file requirements
  REQUIRED_COLUMNS: list[str] = [
 
  TERMS_URL = "https://euphsfcyogalqiqsawbo.supabase.co/storage/v1/object/public/gdpweb/pdfs/2025%20Ginkgo%20Antibody%20Developability%20Prediction%20Competition%202025-08-28-v2.pdf"
  SLACK_URL = "https://join.slack.com/t/bitsinbio/shared_invite/zt-3dqigle2b-e0dEkfPPzzWL055j_8N_eQ"
  TUTORIAL_URL = "https://huggingface.co/blog/ginkgo-datapoints/making-antibody-embeddings-and-predictions"
+ GITHUB_URL = "https://github.com/ginkgobioworks/abdev-benchmark"

  # Input CSV file requirements
  REQUIRED_COLUMNS: list[str] = [
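
Step 3 of the updated `ABOUT_TEXT` asks participants to split GDPa1 by the `hierarchical_cluster_IgG_isotype_stratified_fold` column and write out every cross-validation prediction to a CSV file. A minimal sketch of that loop follows, using only the standard library. The `antibody_name` ID column, the toy rows, and the mean-of-training-folds baseline are assumptions for illustration, not the competition's required schema or a recommended model; `HIC` is the example property name used in the FAQ.

```python
import csv
import io
from statistics import mean

# Fold column named in the About text.
FOLD_COLUMN = "hierarchical_cluster_IgG_isotype_stratified_fold"


def cross_validated_predictions(rows, target, id_column="antibody_name"):
    """Return one out-of-fold prediction per labelled row.

    `id_column` is a hypothetical identifier column; check the dataset
    card and the submission tab for the real required columns.
    """
    labelled = [r for r in rows if r.get(target) not in (None, "")]
    folds = sorted({r[FOLD_COLUMN] for r in labelled})
    predictions = []
    for held_out in folds:
        # "Train" on every fold except the held-out one.
        train = [float(r[target]) for r in labelled if r[FOLD_COLUMN] != held_out]
        baseline = mean(train)  # placeholder model: mean of the training folds
        for r in labelled:
            if r[FOLD_COLUMN] == held_out:
                predictions.append(
                    {id_column: r[id_column], FOLD_COLUMN: held_out, target: baseline}
                )
    return predictions


def write_predictions(predictions, handle):
    """Write the predictions as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=list(predictions[0].keys()))
    writer.writeheader()
    writer.writerows(predictions)


if __name__ == "__main__":
    # Toy rows standing in for the real dataset (values are made up).
    toy_rows = [
        {"antibody_name": "a1", FOLD_COLUMN: "0", "HIC": "1.0"},
        {"antibody_name": "a2", FOLD_COLUMN: "0", "HIC": "3.0"},
        {"antibody_name": "a3", FOLD_COLUMN: "1", "HIC": "5.0"},
        {"antibody_name": "a4", FOLD_COLUMN: "1", "HIC": "7.0"},
    ]
    buffer = io.StringIO()
    write_predictions(cross_validated_predictions(toy_rows, "HIC"), buffer)
    print(buffer.getvalue())
```

Each row is predicted only by a model that never saw its fold, which is what makes the reported cross-validation numbers comparable across submissions. A real entry would replace the baseline with an actual model fitted inside the loop.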