- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 82
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
  Paper • 2401.04658 • Published • 28
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 114
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 159
Collections including paper arxiv:2405.12981
- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
  Paper • 2402.13720 • Published • 7
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 34
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 159
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 99
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 73
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 89
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 47
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 49
- Qwen Technical Report
  Paper • 2309.16609 • Published • 37
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 7
- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 47