Visual Autoregressive Modeling for Instruction-Guided Image Editing Paper • 2508.15772 • Published 18 days ago
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22
Making Images Real Again: A Comprehensive Survey on Deep Image Composition Paper • 2106.14490 • Published Jun 28, 2021
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs Paper • 2502.18461 • Published Feb 25
UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation Paper • 2508.05399 • Published Aug 7
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models Paper • 2409.04701 • Published Sep 7, 2024
ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions? Paper • 2508.13680 • Published 21 days ago
From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models Paper • 2508.13491 • Published 21 days ago
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory Paper • 2508.09736 • Published 27 days ago
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning Paper • 2505.14362 • Published May 20
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14, 2024
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published 18 days ago
ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification Paper • 2502.14565 • Published Feb 20