WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Paper | GitHub | HuggingFace | X | Discussion | Version: V2 | # Models: {model_num} | Updated: {LAST_UPDATED}

