Alanox's Collections
LLM Evaluation Benchmarks

updated Apr 7, 2025

This collection references the evaluation benchmarks commonly seen in traditional LLM papers.

  • MMLU-Pro Leaderboard 🥇
    More advanced and challenging multi-task evaluation

  • GAIA Leaderboard 🦾
    Submit and evaluate models on GAIA leaderboard
