HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 4 days ago • 81
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 4 days ago • 137
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published 11 days ago • 109
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published 17 days ago • 93
GEMS: Agent-Native Multimodal Generation with Memory and Skills Paper • 2603.28088 • Published 20 days ago • 85
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 21 days ago • 143
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published Mar 12 • 91
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing Paper • 2603.03143 • Published Mar 3 • 145
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence Paper • 2603.07660 • Published Mar 8 • 86
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation Paper • 2603.08652 • Published Mar 9 • 40
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios Paper • 2602.22638 • Published Feb 26 • 107
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published Mar 3 • 87
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published Mar 3 • 103
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 64
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models Paper • 2602.03392 • Published Feb 3 • 59