zhiminy committed on
Commit 0f07314 · 1 Parent(s): da05d30

use 20b as guardrail

Files changed (2)
  1. README.md +1 -0
  2. app.py +2 -2
README.md CHANGED
@@ -25,6 +25,7 @@ Welcome to **SWE-Model-Arena**, an open-source platform designed for evaluating
 - Community detection: Newman modularity score
 - Consistency score: Quantify model determinism and reliability through self-play matches
 - **Transparent, Open-Source Leaderboard**: View real-time model rankings across diverse SE workflows with full transparency.
+- **Intelligent Request Filtering**: Employs `GPT-OSS-20B` as a guardrail to automatically filter out non-software-engineering-related requests, ensuring focused and relevant evaluations.
 
 ## Why SWE-Model-Arena?
 
app.py CHANGED
@@ -866,7 +866,7 @@ with gr.Blocks(js=clickable_links_js) as app:
 
     def guardrail_check_se_relevance(user_input):
         """
-        Use gpt-5-nano to check if the user input is SE-related.
+        Use gpt-oss-20b to check if the user input is SE-related.
         Return True if it is SE-related, otherwise False.
         """
         # Example instructions for classification — adjust to your needs
@@ -883,7 +883,7 @@ with gr.Blocks(js=clickable_links_js) as app:
         try:
             # Make the chat completion call
             response = openai_client.chat.completions.create(
-                model="gpt-5-nano", messages=[system_message, user_message]
+                model="gpt-oss-20b", messages=[system_message, user_message]
            )
             classification = response.choices[0].message.content.strip().lower()
             # Check if the LLM responded with 'Yes'
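For context, the following is a minimal, self-contained sketch of how the patched guardrail fits together. The `openai_client` construction, the classification prompt wording, the yes/no parsing, and the error handling are assumptions for illustration; only the function signature, docstring, completion call, and the changed `model="gpt-oss-20b"` argument appear in the diff above.

```python
from openai import OpenAI

# Hypothetical client setup: app.py builds `openai_client` elsewhere, and the
# base URL / API key of the endpoint serving gpt-oss-20b are assumptions here.
openai_client = OpenAI()


def guardrail_check_se_relevance(user_input):
    """
    Use gpt-oss-20b to check if the user input is SE-related.
    Return True if it is SE-related, otherwise False.
    """
    # Illustrative prompt; the exact wording used in app.py is not shown in the diff.
    system_message = {
        "role": "system",
        "content": (
            "You are a classifier. Answer 'Yes' if the user's request is about "
            "software engineering (code, bugs, builds, reviews, tooling, etc.), "
            "otherwise answer 'No'. Reply with a single word."
        ),
    }
    user_message = {"role": "user", "content": user_input}

    try:
        # Make the chat completion call
        response = openai_client.chat.completions.create(
            model="gpt-oss-20b", messages=[system_message, user_message]
        )
        classification = response.choices[0].message.content.strip().lower()
        # Check if the LLM responded with 'Yes'
        return classification.startswith("yes")
    except Exception:
        # Failing open on errors is an assumed design choice, not shown in the diff.
        return True
```

Under this sketch, any reply that does not start with "yes" is treated as out of scope, matching the "Check if the LLM responded with 'Yes'" comment in the hunk above.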