UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^{128} for Unified Multimodal Large Language Model • Paper • 2602.14178 • Published 1 day ago • 2
BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents • Paper • 2602.12876 • Published 4 days ago • 2
LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models • Paper • 2602.14147 • Published 1 day ago • 1
REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents • Paper • 2602.14234 • Published 1 day ago • 4
Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings • Paper • 2602.13823 • Published 3 days ago • 3
RISE: Self-Improving Robot Policy with Compositional World Model • Paper • 2602.11075 • Published 5 days ago • 27
Thinking with Drafting: Optical Decompression via Logical Reconstruction • Paper • 2602.11731 • Published 5 days ago • 32
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models • Paper • 2602.13191 • Published 3 days ago • 21
RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models • Paper • 2602.12628 • Published 4 days ago • 9
What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis • Paper • 2602.12395 • Published 4 days ago • 13
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs • Paper • 2602.10388 • Published 6 days ago • 202
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents • Paper • 2602.12984 • Published 4 days ago • 4
ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning • Paper • 2602.11236 • Published 6 days ago • 10
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence • Paper • 2602.08683 • Published 8 days ago • 40
Light4D: Training-Free Extreme Viewpoint 4D Video Relighting • Paper • 2602.11769 • Published 5 days ago • 2
Code2Worlds: Empowering Coding LLMs for 4D World Generation • Paper • 2602.11757 • Published 5 days ago • 3
GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning • Paper • 2602.04315 • Published 13 days ago • 1
On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs • Paper • 2602.12506 • Published 4 days ago • 3
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions • Paper • 2602.13013 • Published 4 days ago • 7
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution • Paper • 2602.12684 • Published 4 days ago • 3