FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance Paper • 2603.12146 • Published 27 days ago • 5
Can Vision-Language Models Solve the Shell Game? Paper • 2603.08436 • Published about 1 month ago • 39
WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing Paper • 2603.11593 • Published 28 days ago • 25
CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization Paper • 2603.06449 • Published Mar 6 • 6
VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding Paper • 2601.07290 • Published Jan 12 • 7
VideoLoom Collection Model Zoo for VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding • 3 items • Updated Jan 13 • 1