- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 82
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
  Paper • 2401.04658 • Published • 28
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 114
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 159
Collections including paper arxiv:2405.12981
- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
  Paper • 2402.13720 • Published • 7
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 34
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 159
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 99
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 73
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 89
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 47
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 49
- Qwen Technical Report
  Paper • 2309.16609 • Published • 37
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 7
- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 47