Rename SE-Arena to SWE-Arena in README
README.md CHANGED

@@ -1,19 +1,19 @@
 ---
-title: SE-Arena
+title: SWE-Arena
 emoji: 🛠️
 colorFrom: blue
 colorTo: purple
 sdk: gradio
-sdk_version: 5.
+sdk_version: 5.49.1
 app_file: app.py
 hf_oauth: true
 pinned: false
 short_description: The chatbot arena for software engineering
 ---

-# SWE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering
+# SWE-Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering

-Welcome to **SWE Arena**, an open-source platform designed for evaluating software engineering-focused foundation models (FMs), particularly large language models (LLMs). SWE Arena benchmarks models in iterative, context-rich workflows that are characteristic of software engineering (SE) tasks.
+Welcome to **SWE-Arena**, an open-source platform designed for evaluating software engineering-focused foundation models (FMs), particularly large language models (LLMs). SWE-Arena benchmarks models in iterative, context-rich workflows that are characteristic of software engineering (SE) tasks.

 ## Key Features

@@ -26,9 +26,9 @@ Welcome to **SWE Arena**, an open-source platform designed for evaluating softwa
 - **Consistency score**: Quantify model determinism and reliability through self-play matches
 - **Transparent, Open-Source Leaderboard**: View real-time model rankings across diverse SE workflows with full transparency.

-## Why SWE Arena?
+## Why SWE-Arena?

-Existing evaluation frameworks (like Chatbot Arena, WebDev Arena, and Copilot Arena) often don't address the complex, iterative nature of SE tasks. SWE Arena fills critical gaps by:
+Existing evaluation frameworks (like Chatbot Arena, WebDev Arena, and Copilot Arena) often don't address the complex, iterative nature of SE tasks. SWE-Arena fills critical gaps by:

 - Supporting context-rich, multi-turn evaluations to capture iterative workflows
 - Integrating repository-level context through RepoChat to simulate real-world development scenarios

@@ -51,7 +51,7 @@ Existing evaluation frameworks (like Chatbot Arena, WebDev Arena, and Copilot Ar

 ### Usage

-1. Navigate to the [SWE Arena platform](https://huggingface.co/spaces/SE-Arena/Software-Engineering-Arena)
+1. Navigate to the [SWE-Arena platform](https://huggingface.co/spaces/SE-Arena/Software-Engineering-Arena)
 2. Sign in with your Hugging Face account
 3. Enter your SE task prompt (optionally include a repository URL for RepoChat)
 4. Engage in multi-round interactions and vote on model performance

@@ -66,7 +66,7 @@ We welcome contributions from the community! Here's how you can help:

 ## Privacy Policy

-Your interactions are anonymized and used solely for improving SWE Arena and FM benchmarking. By using SWE Arena, you agree to our Terms of Service.
+Your interactions are anonymized and used solely for improving SWE-Arena and FM benchmarking. By using SWE-Arena, you agree to our Terms of Service.

 ## Future Plans
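
The front-matter block in this diff is the full Hugging Face Spaces configuration: `sdk: gradio` plus `hf_oauth: true` is what provisions the OAuth flow behind step 2 of the Usage section, and `app_file: app.py` names the entry point that consumes it. `app.py` itself is not part of this commit; the following is a minimal sketch of how a Gradio app typically picks up that OAuth session (the `greet` handler and its messages are invented for illustration):

```python
import gradio as gr

def greet(profile: gr.OAuthProfile | None) -> str:
    # Gradio injects the signed-in user's profile into any handler that
    # declares a parameter typed gr.OAuthProfile; it stays None until the
    # visitor completes the Hugging Face sign-in.
    if profile is None:
        return "Please sign in with Hugging Face to start a match."
    return f"Welcome, {profile.name}!"

with gr.Blocks() as demo:
    gr.LoginButton()   # renders the "Sign in with Hugging Face" button
    status = gr.Markdown()
    demo.load(greet, inputs=None, outputs=status)  # refresh greeting on page load

if __name__ == "__main__":
    demo.launch()
```

On a Space with `hf_oauth: true`, the login button completes the OAuth handshake with credentials the platform injects automatically, so no client ID or secret has to be checked into `app.py`.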
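
The **Consistency score** bullet passes through this diff unchanged, and the README does not spell out its formula. Purely as a reading aid, here is one plausible interpretation of "quantify determinism through self-play matches": treat the score as a model's tie rate when it is anonymously paired against itself. The function name and match-record layout below are invented for illustration:

```python
from collections import defaultdict

def consistency_scores(matches: list[dict]) -> dict[str, float]:
    """Illustrative sketch: consistency = tie rate over self-play matches.

    Each record is assumed to look like
    {"model_a": "m1", "model_b": "m1", "winner": "tie" | "model_a" | "model_b"}.
    """
    ties: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for m in matches:
        if m["model_a"] != m["model_b"]:
            continue  # only self-play matches count toward consistency
        total[m["model_a"]] += 1
        if m["winner"] == "tie":
            ties[m["model_a"]] += 1
    # A fully deterministic model should always tie with itself.
    return {model: ties[model] / total[model] for model in total}
```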