LuxDiT: Lighting Estimation with Video Diffusion Transformer Paper • 2509.03680 • Published 5 days ago • 6
Article Seq vs Seq: the Ettin Suite of Paired Encoders and Decoders By orionweller and 5 others • Jul 16 • 67
Article Welcome EmbeddingGemma, Google's new efficient embedding model By tomaarsen and 5 others • 5 days ago • 174
VibeVoice Collection Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ • 5 items • Updated 8 days ago • 108
Article Make your ZeroGPU Spaces go brrr with PyTorch ahead-of-time compilation By cbensimon and 3 others • 7 days ago • 44
Qwen Image Edit Accelerated Inference Collection Creative applications and accelerated demos with QwenImageEdit • 10 items • Updated 4 days ago • 4
MobileCLIP2 Collection MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B • 31 items • Updated 6 days ago • 47
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels Paper • 2508.17437 • Published 19 days ago • 35
CineScale: Free Lunch in High-Resolution Cinematic Visual Generation Paper • 2508.15774 • Published 18 days ago • 20
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space Paper • 2508.19247 • Published 13 days ago • 39
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation Paper • 2508.17472 • Published 15 days ago • 26
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation Paper • 2508.18032 • Published 14 days ago • 40