Spaces: DontPlanToEnd/UGI-Leaderboard

Add agentic and tool usage tests

#388

by newdoria88 - opened 24 days ago

newdoria88

24 days ago

I think a test of how well a model performs on agentic tasks would be relevant, since that's another major use of LLMs beyond chatbots and RP.

Censored models sometimes even refuse to google something if it triggers their safety guardrails. On the other hand, many abliterated models that weren't fine-tuned afterwards to heal their lobotomy perform worse than their censored counterparts at tool usage and general agentic tasks. So it'd be useful to see not only how well a model does on the W/10 test but also how well it handles tool calling and agentic tasks when required.
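To make the request more concrete, here is a minimal sketch of how a single tool-calling check might score responses. It is not part of the leaderboard. It assumes a hypothetical setup where the model is prompted to answer with a JSON tool call such as `{"name": ..., "arguments": {...}}`. The tool name `web_search`, its `query` argument, and the result labels are all illustrative. The point is to keep the two failure modes described above apart: a plain-text refusal (`no_call`, typical of censored models) versus a broken or wrong call (`malformed`, `wrong_tool`, typical of damaged abliterated models).

```python
import json
from typing import Any

# Hypothetical test case: the prompt is designed so the model should call a search tool.
EXPECTED = {"name": "web_search", "required_args": {"query"}}


def score_tool_call(raw_output: str, expected: dict[str, Any] = EXPECTED) -> str:
    """Classify one response as 'correct', 'wrong_tool', 'malformed', or 'no_call'.

    Assumes (hypothetically) that tool calls are emitted as a bare JSON object
    of the form {"name": ..., "arguments": {...}}.
    """
    text = raw_output.strip()
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        # Looks like an attempted JSON call that failed to parse -> malformed;
        # otherwise plain text (e.g. a refusal) -> no tool call at all.
        return "malformed" if text.startswith("{") else "no_call"
    if not isinstance(call, dict) or "name" not in call:
        return "no_call"
    if call["name"] != expected["name"]:
        return "wrong_tool"
    args = call.get("arguments")
    # Right tool, but arguments missing or lacking a required key.
    if not isinstance(args, dict) or not expected["required_args"].issubset(args):
        return "malformed"
    return "correct"


responses = [
    '{"name": "web_search", "arguments": {"query": "weather in Paris"}}',
    "I'm sorry, but I can't search for that.",
    '{"name": "web_search", "arguments": {}}',
    '{"name": "calculator", "arguments": {"expr": "2+2"}}',
]
print([score_tool_call(r) for r in responses])
# → ['correct', 'no_call', 'malformed', 'wrong_tool']
```

Aggregating these labels over many prompts would give separate refusal and breakage rates, which could sit next to the existing W/10 score rather than being folded into it.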

newdoria88 changed discussion title from Add agentic performance tests to Add agentic and tool usage performance tests 24 days ago

newdoria88 changed discussion title from Add agentic and tool usage performance tests to Add agentic and tool usage tests 24 days ago