Ankush Singal
Andyrasika
AI & ML interests
None yet
Recent Activity
upvoted a collection about 7 hours ago
Memory
upvoted a collection about 8 hours ago
Knowledge Graph
updated a collection about 8 hours ago
Knowledge Graph
Organizations
Computer-vision
Evaluations
Agents
Prompt-collection
Fine-Tuning
Fine-Tuning
- Direct Judgement Preference Optimization
  Paper • 2409.14664 • Published
- Adaptive Caching for Faster Video Generation with Diffusion Transformers
  Paper • 2411.02397 • Published • 23
- RoRA-VLM: Robust Retrieval-Augmented Vision Language Models
  Paper • 2410.08876 • Published
- Efficient Streaming Language Models with Attention Sinks
  Paper • 2309.17453 • Published • 14
RAG articles
This collection is meant for RAG articles. 1. Let your LLM generate a few tokens: https://www.arxiv.org/abs/2412.11536
- GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
  Paper • 2406.14550 • Published • 4
- Mixture-of-Agents Enhances Large Language Model Capabilities
  Paper • 2406.04692 • Published • 60
- Meta Prompting for AGI Systems
  Paper • 2311.11482 • Published • 4
- Symbolic Learning Enables Self-Evolving Agents
  Paper • 2406.18532 • Published • 12
Time series
This collection is for time series articles
Reinforcement Learning
This collection is for papers in Reinforcement Learning
- Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning
  Paper • 2407.15815 • Published • 14
- TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
  Paper • 2411.15124 • Published • 67
- Meta-RL Induces Exploration in Language Agents
  Paper • 2512.16848 • Published • 8
Stable Diffusion
Papers related to stable diffusion
- An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
  Paper • 2408.03178 • Published • 40
- VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
  Paper • 2408.17131 • Published • 11
- LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
  Paper • 2412.09262 • Published • 1
- SegDT: A Diffusion Transformer-Based Segmentation Model for Medical Imaging
  Paper • 2507.15595 • Published • 5
Synthetic Datasets
Robotics
Contextual Engineering
Reasoning-Model
Embedding
computation
This is for Mixture of XXX
Ankush Collection
Transformer Articles
- DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
  Paper • 2309.14327 • Published • 22
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone
  Paper • 2407.08083 • Published • 32
- Memory^3: Language Modeling with Explicit Memory
  Paper • 2407.01178 • Published • 4
- Teaching Transformers Causal Reasoning through Axiomatic Training
  Paper • 2407.07612 • Published • 2
multimodal
This collection is for multimodal papers
- Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
  Paper • 2407.10387 • Published • 8
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
  Paper • 2411.04996 • Published • 50
- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
  Paper • 2501.04001 • Published • 47
- Scaling RL to Long Videos
  Paper • 2507.07966 • Published • 159
Audio
This collection is dedicated to Audio Transformers
Transformers
This collection is for Transformer Articles
- INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
  Paper • 2307.03712 • Published • 1
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
  Paper • 2408.04093 • Published • 4
- Arcee's MergeKit: A Toolkit for Merging Large Language Models
  Paper • 2403.13257 • Published • 21
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 52
cool models
List of cool models
- alibaba-damo/mgp-str-base
  Image-to-Text • 0.1B • Updated • 10.8k • 65
- DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
  Paper • 2408.08152 • Published • 60
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
  Paper • 2409.02889 • Published • 54
Knowledge Graph