AI-paper - a shankars Collection


AI-paper

updated 3 days ago

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published 25 days ago • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published 24 days ago • 17
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published 19 days ago • 48
MultiRef: Controllable Image Generation with Multiple Visual References

Paper • 2508.06905 • Published 29 days ago • 21
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos

Paper • 2508.14041 • Published 19 days ago • 57
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6 • 123
Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Paper • 2508.12800 • Published 20 days ago • 5
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends

Paper • 2508.11548 • Published 23 days ago • 5
Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge

Paper • 2508.08777 • Published 26 days ago • 15
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Paper • 2508.09131 • Published 26 days ago • 16
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published 18 days ago • 42
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

Paper • 2508.14111 • Published 21 days ago • 32
RynnEC: Bringing MLLMs into Embodied World

Paper • 2508.14160 • Published 19 days ago • 18
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 186
Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 116
A Survey on Large Language Model Benchmarks

Paper • 2508.15361 • Published 17 days ago • 18
Deep Think with Confidence

Paper • 2508.15260 • Published 18 days ago • 81
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Paper • 2501.05452 • Published Jan 9 • 15
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Paper • 2504.15279 • Published Apr 21 • 75
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

Paper • 2406.14562 • Published Jun 20, 2024 • 29
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published Jan 10 • 66
Thinking with Generated Images

Paper • 2505.22525 • Published May 28 • 15
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

Paper • 2505.13444 • Published May 19 • 16
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1, 2024 • 82
ComposeAnything: Composite Object Priors for Text-to-Image Generation

Paper • 2505.24086 • Published May 30 • 5
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 86
Visual Planning: Let's Think Only with Images

Paper • 2505.11409 • Published May 16 • 57
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9, 2024 • 48
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

Paper • 2403.12884 • Published Mar 19, 2024 • 1
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography

Paper • 2504.10090 • Published Apr 14
Visual Programming: Compositional visual reasoning without training

Paper • 2211.11559 • Published Nov 18, 2022 • 1
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning

Paper • 2408.02210 • Published Aug 5, 2024 • 9
MMFactory: A Universal Solution Search Engine for Vision-Language Tasks

Paper • 2412.18072 • Published Dec 24, 2024 • 20
Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published 17 days ago • 243
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Paper • 2504.06261 • Published Apr 8 • 111
Star Attention: Efficient LLM Inference over Long Sequences

Paper • 2411.17116 • Published Nov 26, 2024 • 56
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
LLM Inference Unveiled: Survey and Roofline Model Insights

Paper • 2402.16363 • Published Feb 26, 2024 • 2
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures

Paper • 2504.11750 • Published Apr 16
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

Paper • 2410.11795 • Published Oct 15, 2024 • 18
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions

Paper • 2504.19056 • Published Apr 27 • 18
Personalized Image Generation with Deep Generative Models: A Decade Survey

Paper • 2502.13081 • Published Feb 18
Diffusion Models: A Comprehensive Survey of Methods and Applications

Paper • 2209.00796 • Published Sep 2, 2022
An Empirical Study of GPT-4o Image Generation Capabilities

Paper • 2504.05979 • Published Apr 8 • 64
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Paper • 2502.09411 • Published Feb 13 • 21
A survey of Generative AI Applications

Paper • 2306.02781 • Published Jun 5, 2023
Text-to-image Diffusion Models in Generative AI: A Survey

Paper • 2303.07909 • Published Mar 14, 2023
Multi-Agent Collaboration Mechanisms: A Survey of LLMs

Paper • 2501.06322 • Published Jan 10 • 1
Multi-Agent Collaboration via Evolving Orchestration

Paper • 2505.19591 • Published May 26 • 1
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration

Paper • 2412.04440 • Published Dec 5, 2024 • 22
AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving

Paper • 2506.12508 • Published Jun 14 • 1
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

Paper • 2407.07061 • Published Jul 9, 2024 • 28
VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published Jun 6, 2024 • 26
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Paper • 2407.14505 • Published Jul 19, 2024 • 27
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

Paper • 2411.16657 • Published Nov 25, 2024 • 20
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations

Paper • 2411.10818 • Published Nov 16, 2024 • 27
VideoPoet: A Large Language Model for Zero-Shot Video Generation

Paper • 2312.14125 • Published Dec 21, 2023 • 47
PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices

Paper • 2504.03664 • Published Mar 15
FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference

Paper • 2503.03777 • Published Mar 4
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs

Paper • 2503.16163 • Published Mar 20
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Paper • 2502.12574 • Published Feb 18 • 12
Seesaw: High-throughput LLM Inference via Model Re-sharding

Paper • 2503.06433 • Published Mar 9
MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints

Paper • 2504.09345 • Published Apr 12
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 286
MV-RAG: Retrieval Augmented Multiview Diffusion

Paper • 2508.16577 • Published 16 days ago • 36
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation

Paper • 2508.18032 • Published 13 days ago • 40
PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs

Paper • 2508.17188 • Published 15 days ago • 15
Explain Before You Answer: A Survey on Compositional Visual Reasoning

Paper • 2508.17298 • Published 14 days ago • 4
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published 16 days ago • 132
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

Paper • 2508.16279 • Published 16 days ago • 30
CineScale: Free Lunch in High-Resolution Cinematic Visual Generation

Paper • 2508.15774 • Published 17 days ago • 19
Self-Rewarding Vision-Language Model via Reasoning Decomposition

Paper • 2508.19652 • Published 11 days ago • 79
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Paper • 2508.20072 • Published 11 days ago • 28
AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Paper • 2508.20088 • Published 11 days ago • 20
MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment

Paper • 2508.19527 • Published 12 days ago • 9
Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference

Paper • 2508.19559 • Published 12 days ago • 5
Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published 10 days ago • 30
rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published 10 days ago • 97
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published 10 days ago • 85
AWorld: Orchestrating the Training Recipe for Agentic AI

Paper • 2508.20404 • Published 11 days ago • 37
Dress&Dance: Dress up and Dance as You Like It - Technical Preview

Paper • 2508.21070 • Published 10 days ago • 5
ROSE: Remove Objects with Side Effects in Videos

Paper • 2508.18633 • Published 13 days ago • 7
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published 10 days ago • 72
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published 13 days ago • 197
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published 10 days ago • 104
AHELM: A Holistic Evaluation of Audio-Language Models

Paper • 2508.21376 • Published 9 days ago • 9
Morae: Proactively Pausing UI Agents for User Choices

Paper • 2508.21456 • Published 9 days ago • 5
UItron: Foundational GUI Agent with Advanced Perception and Planning

Paper • 2508.21767 • Published 9 days ago • 12
Efficient Code Embeddings from Code Generation Models

Paper • 2508.21290 • Published 10 days ago • 18
TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

Paper • 2508.17677 • Published 14 days ago • 14
CLIPSym: Delving into Symmetry Detection with CLIP

Paper • 2508.14197 • Published 19 days ago • 7
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Paper • 2508.21148 • Published 10 days ago • 132
Continual Learning for Large Language Models: A Survey

Paper • 2402.01364 • Published Feb 2, 2024 • 1
Continual Learning with Pre-Trained Models: A Survey

Paper • 2401.16386 • Published Jan 29, 2024 • 1
Continual Learning: Applications and the Road Forward

Paper • 2311.11908 • Published Nov 20, 2023 • 1
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published 5 days ago • 156
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 5 days ago • 76
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

Paper • 2508.21496 • Published 9 days ago • 53
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published 7 days ago • 60
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion

Paper • 2509.01215 • Published 6 days ago • 42
GenCompositor: Generative Video Compositing with Diffusion Transformer

Paper • 2509.02460 • Published 5 days ago • 22
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning

Paper • 2509.01644 • Published 6 days ago • 26
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

Paper • 2509.00428 • Published 8 days ago • 12
From Editor to Dense Geometry Estimator

Paper • 2509.04338 • Published 3 days ago • 74
Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings

Paper • 2508.18733 • Published 12 days ago • 4
Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published 3 days ago • 54
