CyberOps AI: Red, Blue, Purple & Black Hat Defense Collection • A cutting-edge collection of AI-driven models, datasets, and spaces dedicated to advancing the full spectrum of cybersecurity operations. • 6 items • Updated Feb 2, 2025 • 3
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 48
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published Dec 15, 2025 • 52
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment Paper • 2512.12692 • Published Dec 14, 2025 • 14