Spaces: JetBrains-Research/long-code-arena
Areyde committed on Jun 5, 2024

Commit 4b54f71 (verified) · 1 parent: 2765cea

Update src/tasks_content.py

Files changed (1)

src/tasks_content.py

@@ -24,11 +24,11 @@ TASKS_DESCRIPTIONS = {
     "ci_builds_repair": """# CI builds repair\n
-        Our CI Builds Repair benchmark 🤗 [JetBrains-Research/lca-ci-builds-repair](https://huggingface.co/datasets/JetBrains-Research/lca-ci-builds-repair)
-        includes manually curated and assessed 77 data points coming from 32 Python repositories.
-        We use Pass@1 metric for CI repair.
-        Models can be evaluated in three task types:
+        Our CI builds repair benchmark 🤗 [JetBrains-Research/lca-ci-builds-repair](https://huggingface.co/datasets/JetBrains-Research/lca-ci-builds-repair)
+        includes 77 manually curated and assessed data points coming from 32 Python repositories, which are used to make a model fix a failed build.
+        We use the `Pass@1` metric for CI repair.
+        Models can be evaluated in three types of tasks:
         * `full` – **no** ground truth diffs are used for model evaluation;
         * `oracle: files` – ground truth diffs are used to select files that should be corrected to fix the issue;
         * `oracle: files, lines` – ground truth diffs are used to select files and code blocks that should be corrected to fix the issue;
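
The description above names `Pass@1` without spelling out how it is computed. A minimal sketch, assuming one generated fix per data point and a boolean CI outcome for each: under that assumption `Pass@1` reduces to the share of data points whose build passes after the fix. The function name `pass_at_1` and the sample outcomes are hypothetical and are not taken from the benchmark code.

```python
def pass_at_1(build_passed: list[bool]) -> float:
    """Share of data points whose single generated fix made the CI build pass.

    Assumption: exactly one attempt per data point, so Pass@1 is a plain mean.
    """
    if not build_passed:
        raise ValueError("no results to score")
    return sum(build_passed) / len(build_passed)


# Hypothetical outcomes for four data points (True = build passed after the fix).
results = [True, False, False, True]
score = pass_at_1(results)
print(f"Pass@1 = {score:.2f}")
```

The same scoring would apply separately to each of the three task types (`full`, `oracle: files`, `oracle: files, lines`), giving one `Pass@1` value per setting.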