Commit · 767c884
Parent(s): 094a347
Updating documentation (h/t Diya Mohan)
- about.py +19 -5
- app.py +1 -1
- assets/prediction_explainer_cv.png +0 -3
- assets/{prediction_explainer.png → prediction_explainer_v3.png} +2 -2
- constants.py +1 -0
about.py
CHANGED
|
@@ -6,6 +6,7 @@ from constants import (
|
|
| 6 |
FAQ_TAB_NAME,
|
| 7 |
SLACK_URL,
|
| 8 |
TUTORIAL_URL,
|
| 9 |
)
|
| 10 |
|
| 11 |
WEBSITE_HEADER = f"""
|
|
@@ -59,7 +60,7 @@ ABOUT_TEXT = f"""
|
|
| 59 |
|
| 60 |
1. **Create a Hugging Face account** [here](https://huggingface.co/join) if you don't have one yet (this is used to track unique submissions and to access the GDPa1 dataset).
|
| 61 |
2. **Register your team** on the [Competition Registration](https://datapoints.ginkgo.bio/ai-competitions/2025-abdev-competition) page.
|
| 62 |
-
3. **Build a model** using cross-validation on the [GDPa1](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1) dataset, using the `hierarchical_cluster_IgG_isotype_stratified_fold` column to split the dataset into folds, and write out all cross-validation predictions to a CSV file.
|
| 63 |
4. **Use your model to make predictions** on the private test set (download the 80 private test set sequences from the {SUBMIT_TAB_NAME} tab).
|
| 64 |
5. **Submit your training and test set predictions** on the {SUBMIT_TAB_NAME} tab by uploading both your cross-validation and private test set CSV files.
|
| 65 |
|
|
@@ -69,6 +70,13 @@ Check out our introductory tutorial on training an antibody developability predi
|
|
| 69 |
|
| 70 |
---
|
| 71 |
|
| 72 |
#### Acknowledgements
|
| 73 |
|
| 74 |
We gratefully acknowledge [Tamarind Bio](https://www.tamarind.bio/)'s help in running the following models which are on the leaderboard:
|
|
@@ -84,11 +92,14 @@ We're working on getting more public models added, so that participants have mor
|
|
| 84 |
|
| 85 |
#### How to contribute?
|
| 86 |
|
| 87 |
-
|
| 88 |
- Absolute folding stability models (for Thermostability)
|
| 89 |
- PROPERMAB
|
| 90 |
- AbMelt (requires GROMACS for MD simulations)
|
| 91 |
|
| 92 |
If you would like to form a team or discuss ideas, join the [Slack community]({SLACK_URL}) co-hosted by Bits in Bio.
|
| 93 |
"""
|
| 94 |
|
|
@@ -131,7 +142,7 @@ FAQS = {
|
|
| 131 |
),
|
| 132 |
"Do I need to submit my code / methods in order to participate?": (
|
| 133 |
"No, there are no requirements to submit code / methods and submitted predictions remain private. "
|
| 134 |
-
"We
|
| 135 |
"Top performing participants will be requested to identify themselves at the end of the tournament. "
|
| 136 |
"There will be one prize for the best open-source reproducible model, which will require code / methods to be available."
|
| 137 |
),
|
|
@@ -153,10 +164,13 @@ FAQS = {
|
|
| 153 |
"We reserve the right to award the open-source prize to a predictor with competitive results for a subset of properties (e.g. a top polyreactivity model)."
|
| 154 |
),
|
| 155 |
"How does the open-source prize work?": (
|
| 156 |
-
"Participants who open-source their training code and methods will be eligible for the open-source prize (as well as the other prizes)."
|
| 157 |
),
|
| 158 |
"Can I use proprietary tools like AlphaFold3 for the open-source prize?": (
|
| 159 |
-
"Yes, using tools that have published their inference code under proprietary licenses is allowed (like AlphaFold3 and PROPERMAB), as long as code is available and fully reproducible."
|
| 160 |
),
|
| 161 |
"What do I need to submit?": (
|
| 162 |
'There is a tab on the Hugging Face competition page to upload predictions for datasets - for each dataset participants need to submit a CSV containing a column for each property they would like to predict (e.g. called "HIC"), '
|
|
|
|
| 6 |
FAQ_TAB_NAME,
|
| 7 |
SLACK_URL,
|
| 8 |
TUTORIAL_URL,
|
| 9 |
+
GITHUB_URL,
|
| 10 |
)
|
| 11 |
|
| 12 |
WEBSITE_HEADER = f"""
|
|
|
|
| 60 |
|
| 61 |
1. **Create a Hugging Face account** [here](https://huggingface.co/join) if you don't have one yet (this is used to track unique submissions and to access the GDPa1 dataset).
|
| 62 |
2. **Register your team** on the [Competition Registration](https://datapoints.ginkgo.bio/ai-competitions/2025-abdev-competition) page.
|
| 63 |
+
3. **Build a model** using cross-validation on the [GDPa1](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1) dataset, using the `hierarchical_cluster_IgG_isotype_stratified_fold` column to split the dataset into folds, and write out all cross-validation predictions to a CSV file. You may also use outside datasets, but still need to report these cross-validation predictions.
|
| 64 |
4. **Use your model to make predictions** on the private test set (download the 80 private test set sequences from the {SUBMIT_TAB_NAME} tab).
|
| 65 |
5. **Submit your training and test set predictions** on the {SUBMIT_TAB_NAME} tab by uploading both your cross-validation and private test set CSV files.
|
| 66 |
|
|
|
|
| 70 |
|
| 71 |
---
|
| 72 |
|
| 73 |
+
#### Data and models
|
| 74 |
+
|
| 75 |
+
You may use any data and models you like for the main competition, since all code/methods can be kept private and you just submit predictions.
|
| 76 |
+
For the open-source prize, you must train on the GDPa1 dataset using cross-validation and must use all public models/data.
|
| 77 |
+
|
| 78 |
+
---
|
| 79 |
+
|
| 80 |
#### Acknowledgements
|
| 81 |
|
| 82 |
We gratefully acknowledge [Tamarind Bio](https://www.tamarind.bio/)'s help in running the following models which are on the leaderboard:
|
|
|
|
| 92 |
|
| 93 |
#### How to contribute?
|
| 94 |
|
| 95 |
+
Check out the GitHub repository ({GITHUB_URL}) for a bunch of runnable models and Jupyter notebooks to get started, or to contribute your own models.
|
| 96 |
+
|
| 97 |
+
We'd like to add more existing developability models to the leaderboard. Some examples of models we'd like to onboard (also tracked in the GitHub repository):
|
| 98 |
- Absolute folding stability models (for Thermostability)
|
| 99 |
- PROPERMAB
|
| 100 |
- AbMelt (requires GROMACS for MD simulations)
|
| 101 |
|
| 102 |
+
|
| 103 |
If you would like to form a team or discuss ideas, join the [Slack community]({SLACK_URL}) co-hosted by Bits in Bio.
|
| 104 |
"""
|
| 105 |
|
|
|
|
| 142 |
),
|
| 143 |
"Do I need to submit my code / methods in order to participate?": (
|
| 144 |
"No, there are no requirements to submit code / methods and submitted predictions remain private. "
|
| 145 |
+
"We have an optional field for including a short model description in the submission tab. "
|
| 146 |
"Top performing participants will be requested to identify themselves at the end of the tournament. "
|
| 147 |
"There will be one prize for the best open-source reproducible model, which will require code / methods to be available."
|
| 148 |
),
|
|
|
|
| 164 |
"We reserve the right to award the open-source prize to a predictor with competitive results for a subset of properties (e.g. a top polyreactivity model)."
|
| 165 |
),
|
| 166 |
"How does the open-source prize work?": (
|
| 167 |
+
"Participants who train on GDPa1 and open-source their training code and methods and have reproducible results will be eligible for the open-source prize (as well as the other prizes)."
|
| 168 |
),
|
| 169 |
"Can I use proprietary tools like AlphaFold3 for the open-source prize?": (
|
| 170 |
+
"Yes, using tools that have published their inference code under proprietary licenses is allowed (like AlphaFold3 and PROPERMAB), as long as code is available and fully reproducible. Although fully open models (open to commercial use) are highly preferred though. For other prizes, you can use any private models/data you like."
|
| 171 |
+
),
|
| 172 |
+
"Can I train on other public/private datasets?": (
|
| 173 |
+
"Yes, you can use any private models/data you like for the 5 main assay prizes, since all code/methods can be kept private and you just submit predictions. For the open-source prize, you must train on the GDPa1 dataset using cross-validation and must use all public models/data. Models with proprietary licenses but open code are allowed, but fully open models are highly preferred."
|
| 174 |
),
|
| 175 |
"What do I need to submit?": (
|
| 176 |
'There is a tab on the Hugging Face competition page to upload predictions for datasets - for each dataset participants need to submit a CSV containing a column for each property they would like to predict (e.g. called "HIC"), '
|
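The cross-validation step described in the about.py text (split on the `hierarchical_cluster_IgG_isotype_stratified_fold` column, then write every out-of-fold prediction to a CSV with one column per property, e.g. "HIC") could look roughly like the sketch below. It is only an illustration under stated assumptions: the `antibody_name` identifier column, the toy rows, and the mean-of-training-folds "model" are hypothetical placeholders and are not taken from this commit or from the competition's required CSV schema.

```python
import csv
import io
from statistics import mean

# Fold column named in the about text.
FOLD_COLUMN = "hierarchical_cluster_IgG_isotype_stratified_fold"

# Hypothetical toy data; the real GDPa1 dataset has many more rows and columns.
EXAMPLE_ROWS = [
    {"antibody_name": "ab1", FOLD_COLUMN: 0, "HIC": 2.0},
    {"antibody_name": "ab2", FOLD_COLUMN: 0, "HIC": 4.0},
    {"antibody_name": "ab3", FOLD_COLUMN: 1, "HIC": 6.0},
    {"antibody_name": "ab4", FOLD_COLUMN: 1, "HIC": 8.0},
]


def cross_validated_predictions(rows, target="HIC", id_column="antibody_name"):
    """Return one out-of-fold prediction per row.

    Placeholder model: predict the mean target value of the training folds.
    `id_column` is an assumed identifier name, not confirmed by the source.
    """
    folds = sorted({row[FOLD_COLUMN] for row in rows})
    predictions = []
    for held_out in folds:
        # Train only on rows outside the held-out fold.
        train_values = [float(r[target]) for r in rows if r[FOLD_COLUMN] != held_out]
        baseline = mean(train_values)
        # Predict for every row inside the held-out fold.
        for r in rows:
            if r[FOLD_COLUMN] == held_out:
                predictions.append(
                    {id_column: r[id_column], FOLD_COLUMN: held_out, target: baseline}
                )
    return predictions


def write_predictions_csv(predictions, handle):
    """Write predictions to an open text handle, one column per key."""
    writer = csv.DictWriter(handle, fieldnames=list(predictions[0].keys()))
    writer.writeheader()
    writer.writerows(predictions)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_predictions_csv(cross_validated_predictions(EXAMPLE_ROWS), buffer)
    print(buffer.getvalue())
```

Swapping the placeholder baseline for a real regressor only changes the body of the fold loop. The key property, that no row is predicted by a model that saw its own fold, stays the same.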
app.py
CHANGED
|
@@ -120,7 +120,7 @@ with gr.Blocks(theme=gr.themes.Default(text_size=sizes.text_lg)) as demo:
|
|
| 120 |
with gr.TabItem(ABOUT_TAB_NAME, elem_id="abdev-benchmark-tab-table"):
|
| 121 |
gr.Markdown(ABOUT_INTRO)
|
| 122 |
gr.Image(
|
| 123 |
-
value="./assets/
|
| 124 |
show_label=False,
|
| 125 |
show_download_button=False,
|
| 126 |
show_share_button=False,
|
|
|
|
| 120 |
with gr.TabItem(ABOUT_TAB_NAME, elem_id="abdev-benchmark-tab-table"):
|
| 121 |
gr.Markdown(ABOUT_INTRO)
|
| 122 |
gr.Image(
|
| 123 |
+
value="./assets/prediction_explainer_v3.png",
|
| 124 |
show_label=False,
|
| 125 |
show_download_button=False,
|
| 126 |
show_share_button=False,
|
assets/prediction_explainer_cv.png
DELETED
|
assets/{prediction_explainer.png → prediction_explainer_v3.png}
RENAMED
|
File without changes
|
constants.py
CHANGED
|
@@ -44,6 +44,7 @@ REGISTRATION_CODE = os.environ.get("REGISTRATION_CODE")
|
|
| 44 |
TERMS_URL = "https://euphsfcyogalqiqsawbo.supabase.co/storage/v1/object/public/gdpweb/pdfs/2025%20Ginkgo%20Antibody%20Developability%20Prediction%20Competition%202025-08-28-v2.pdf"
|
| 45 |
SLACK_URL = "https://join.slack.com/t/bitsinbio/shared_invite/zt-3dqigle2b-e0dEkfPPzzWL055j_8N_eQ"
|
| 46 |
TUTORIAL_URL = "https://huggingface.co/blog/ginkgo-datapoints/making-antibody-embeddings-and-predictions"
|
| 47 |
|
| 48 |
# Input CSV file requirements
|
| 49 |
REQUIRED_COLUMNS: list[str] = [
|
|
|
|
| 44 |
TERMS_URL = "https://euphsfcyogalqiqsawbo.supabase.co/storage/v1/object/public/gdpweb/pdfs/2025%20Ginkgo%20Antibody%20Developability%20Prediction%20Competition%202025-08-28-v2.pdf"
|
| 45 |
SLACK_URL = "https://join.slack.com/t/bitsinbio/shared_invite/zt-3dqigle2b-e0dEkfPPzzWL055j_8N_eQ"
|
| 46 |
TUTORIAL_URL = "https://huggingface.co/blog/ginkgo-datapoints/making-antibody-embeddings-and-predictions"
|
| 47 |
+
GITHUB_URL = "https://github.com/ginkgobioworks/abdev-benchmark"
|
| 48 |
|
| 49 |
# Input CSV file requirements
|
| 50 |
REQUIRED_COLUMNS: list[str] = [
|