TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published 24 days ago • 66
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions Paper • 2510.10666 • Published Oct 12 • 27
UniVideo: Unified Understanding, Generation, and Editing for Videos Paper • 2510.08377 • Published Oct 9 • 70
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing Paper • 2509.26346 • Published Sep 30 • 18
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning Paper • 2509.22824 • Published Sep 26 • 20
VideoScore2: Think before You Score in Generative Video Evaluation Paper • 2509.22799 • Published Sep 26 • 25
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 76
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Paper • 2505.16175 • Published May 22 • 41
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21 • 53
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations Paper • 2504.00824 • Published Apr 1 • 43
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 37