bird-of-paradise/deepseek-mla
Tags: Text Generation, Transformers, PyTorch, English, deepseek-mla, attention-mechanism, mla, efficient-attention
arXiv: 2405.04434
License: mit
Repository: deepseek-mla (revision f628f42), 759 kB, 2 contributors
History: 2 commits
Latest commit: f628f42 by bird-of-paradise, 8 months ago: "Update README.md: clarify this is an attention implementation, not a trained model"
Name | Size | Scan | Last commit message | Updated
assets | | | Initial commit: DeepSeek Multi-Latent Attention implementation | 8 months ago
insights | | | Initial commit: DeepSeek Multi-Latent Attention implementation | 8 months ago
src | | | Initial commit: DeepSeek Multi-Latent Attention implementation | 8 months ago
.DS_Store | 6.15 kB | Safe | Initial commit: DeepSeek Multi-Latent Attention implementation | 8 months ago
CONTRIBUTING.md | 0 Bytes | Safe | Initial commit: DeepSeek Multi-Latent Attention implementation | 8 months ago
README.md | 3.56 kB | | Update README.md: clarify this is an attention implementation, not a trained model | 8 months ago