Update src/tasks_content.py
src/tasks_content.py (+6 −2)
```diff
@@ -24,9 +24,13 @@ TASKS_DESCRIPTIONS = {
 
     "ci_builds_repair": """# CI builds repair\n
 
-Our CI
+Our CI Builds Repair benchmark 🤗 [JetBrains-Research/lca-ci-builds-repair](https://huggingface.co/datasets/JetBrains-Research/lca-ci-builds-repair) includes 77 data points.
 
-We use
+We use Pass@1 metric for CI repair.
+Models can be evaluated in three task types:
+* `full` — *no* ground truth diffs are used for model evaluation;
+* `oracle: files` — ground truth diffs are used to select files that should be corrected to fix the issue;
+* `oracle: files, lines` — ground truth diffs are used to select files and code blocks that should be corrected to fix the issue;
 
 For further details on the dataset and the baselines from the 🏟️ Long Code Arena team, refer to the `ci-builds-repair` directory in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines).
 """,
```
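The added description references the Pass@1 metric. As a minimal sketch (a hypothetical helper, not part of this commit or the baselines repository): with one generated fix per data point, Pass@1 is simply the fraction of benchmark data points whose single proposed fix makes the CI build pass.

```python
def pass_at_1(build_passed: list[bool]) -> float:
    """Pass@1 with one sample per problem: the fraction of data points
    where the model's single proposed fix made the CI build pass."""
    if not build_passed:
        raise ValueError("no evaluation results provided")
    return sum(build_passed) / len(build_passed)


# Example: 3 of 4 repaired builds passed CI.
score = pass_at_1([True, False, True, True])  # 0.75
```

With a single sample per problem the unbiased pass@k estimator reduces to this plain success rate, which is why a simple average suffices here.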