Lighteval
🤗 Lighteval is your all-in-one toolkit for evaluating Large Language Models (LLMs) across multiple backends with ease. Dive deep into your model’s performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.
Key Features
🚀 Multi-Backend Support
Evaluate your models using the most popular and efficient inference backends:
- transformers: Evaluate models on CPU or one or more GPUs using 🤗 Accelerate
- nanotron: Evaluate models in distributed settings using ⚡️ Nanotron
- vllm: Evaluate models on one or more GPUs using 🚀 VLLM
- custom: Evaluate custom models (can be anything)
- sglang: Evaluate models using SGLang as backend
- inference-endpoint: Evaluate models using Hugging Face’s Inference Endpoints API
- tgi: Evaluate models using 🔗 Text Generation Inference running locally
- litellm: Evaluate models on any compatible API using LiteLLM
- inference-providers: Evaluate models using Hugging Face’s inference providers as backend
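Each backend corresponds to a CLI subcommand, so switching inference engines is mostly a matter of changing the first argument. A minimal sketch, assuming the vllm subcommand mirrors the accelerate command shown in the Quick Start below:
# Same task, served through the vLLM backend instead of Transformers
lighteval vllm \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0"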
📊 Comprehensive Evaluation
- Extensive Task Library: Thousands of pre-built evaluation tasks
- Custom Task Creation: Build your own evaluation tasks
- Flexible Metrics: Support for custom metrics and scoring
- Detailed Analysis: Sample-by-sample results for deep insights
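The per-sample results mentioned above can be written out alongside the aggregate scores. A minimal sketch, assuming a --save-details flag is available in your version of the CLI:
# Dump per-sample predictions in addition to aggregate metrics (flag name assumed; check lighteval accelerate --help)
lighteval accelerate \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0" \
--save-details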
🔧 Easy Customization
Customization at your fingertips: create new tasks, metrics, or models tailored to your needs, or browse all our existing tasks and metrics.
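New tasks written as a Python module can usually be passed straight to the CLI. A sketch, assuming the --custom-tasks option and a hypothetical my_tasks.py module defining a community|my_task entry:
# Evaluate on a user-defined task (module path and task name are illustrative)
lighteval accelerate \
"model_name=openai-community/gpt2" \
"community|my_task|0" \
--custom-tasks ./my_tasks.py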
☁️ Seamless Integration
Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.
Quick Start
Installation
pip install lighteval
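Some backends require optional dependencies at install time; a sketch, assuming the package exposes extras named after the backends (verify the exact names in the installation docs):
# Install with backend-specific extras (extra name assumed)
pip install lighteval[vllm]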
Basic Usage
# Evaluate a model using Transformers backend
lighteval accelerate \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0"
Save Results
# Save locally
lighteval accelerate \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0" \
--output-dir ./results
# Push to Hugging Face Hub
lighteval accelerate \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0" \
--push-to-hub \
--results-org your-username