LLM Evaluation Benchmarks - an Alanox Collection

LLM Evaluation Benchmarks

updated Apr 7, 2025

This collection gathers the evaluation benchmarks commonly referenced in traditional LLM papers.

MMLU-Pro Leaderboard

More advanced and challenging multi-task evaluation

GAIA Leaderboard

Submit and evaluate models on the GAIA leaderboard