Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding • Paper • 2507.15028 • Published Jul 20, 2025
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos • Paper • 2501.13826 • Published Jan 23, 2025
MMBench: Is Your Multi-modal Model an All-around Player? • Paper • 2307.06281 • Published Jul 12, 2023
VBench: Comprehensive Benchmark Suite for Video Generative Models • Paper • 2311.17982 • Published Nov 29, 2023
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward • Paper • 2404.01258 • Published Apr 1, 2024
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy • Paper • 2203.07845 • Published Mar 15, 2022
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models • Paper • 2407.12772 • Published Jul 17, 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models • Paper • 2407.07895 • Published Jul 10, 2024
Octopus: Embodied Vision-Language Programmer from Environmental Feedback • Paper • 2310.08588 • Published Oct 12, 2023
Otter: A Multi-Modal Model with In-Context Instruction Tuning • Paper • 2305.03726 • Published May 5, 2023