MMLU-Pro Leaderboard
More advanced and challenging multi-task evaluation
This collection exists to reference the evaluation benchmarks commonly seen in traditional LLM papers.