TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents (Paper • 2602.07274 • Published 5 days ago)
Hi @naufalso! Lighteval now supports inspect-ai as a backend, so everything supported by Inspect is integrated in Lighteval 🔥
Community Evals: Because we're done trusting black-box leaderboards over the community (Article • 8 days ago)