Add agentic and tool usage tests

#388
by newdoria88 - opened

I think a test of how a model performs on agentic tasks would be relevant, since that's another major use of LLMs beyond chatbots and RP.

Censored models sometimes refuse even to google something if it triggers their safety guardrails. But many abliterated models that weren't fine-tuned to heal their lobotomy perform worse than their censored counterparts at tool usage and general agentic tasks. So it'd be useful to see not only how well a model does in the W/10 test but also how well it can handle tool calling and agentic tasks when required.
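
To make the idea concrete, here is a rough sketch of what a single tool-calling check could look like. Everything in it is an assumption for illustration: the function name `score_tool_call`, the JSON shape with `name` and `arguments` keys, and the keyword-based refusal check are all hypothetical and not part of the existing benchmark.

```python
import json

def score_tool_call(raw_output, expected_name, expected_args):
    """Classify one model reply as 'correct', 'wrong', 'refused', or 'malformed'.

    Assumes (hypothetically) that the model was asked to answer with a JSON
    object of the form {"name": ..., "arguments": {...}}.
    """
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        # Not JSON at all: use a crude keyword check to separate refusals
        # from replies that simply failed to follow the format.
        lowered = raw_output.lower()
        if "i can't" in lowered or "i cannot" in lowered:
            return "refused"
        return "malformed"
    if not isinstance(call, dict):
        return "malformed"
    # Valid JSON object: both the tool name and the arguments must match.
    if call.get("name") != expected_name:
        return "wrong"
    if call.get("arguments") != expected_args:
        return "wrong"
    return "correct"
```

Counting the four outcomes separately would show whether a model fails because it refuses (the censored case) or because it can't produce a usable call (the damaged abliterated case).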

newdoria88 changed discussion title from Add agentic performance tests to Add agentic and tool usage performance tests
newdoria88 changed discussion title from Add agentic and tool usage performance tests to Add agentic and tool usage tests
