---
license: bsd-3-clause
tags:
- kernel
---
# Flash Attention 3
Flash Attention is a fast and memory-efficient implementation of the attention mechanism, designed to work with large models and long sequences.

This is a Hugging Face kernels-compliant build of Flash Attention 3.

Original code: [https://github.com/Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention)
Kernel source: https://github.com/huggingface/kernels-community/tree/main/flash-attn3
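
As a rough sketch of how a kernels-community build like this one is typically used, the snippet below loads the kernel from the Hub with the `kernels` library's `get_kernel` function. The attention entry point and its signature (`flash_attn_func(q, k, v, causal=...)`) are assumed here by analogy with the upstream flash-attention package and should be checked against the kernel's actual interface.

```python
# A minimal usage sketch, assuming the kernel exposes a flash_attn_func
# entry point similar to the upstream flash-attention API.
import torch
from kernels import get_kernel  # pip install kernels

# Download and load the compiled kernel from the Hugging Face Hub.
flash_attn3 = get_kernel("kernels-community/flash-attn3")

# Dummy inputs with shape (batch, seqlen, num_heads, head_dim),
# in bf16 on a CUDA device, as Flash Attention expects half precision.
q = torch.randn(2, 1024, 8, 128, dtype=torch.bfloat16, device="cuda")
k = torch.randn(2, 1024, 8, 128, dtype=torch.bfloat16, device="cuda")
v = torch.randn(2, 1024, 8, 128, dtype=torch.bfloat16, device="cuda")

# Causal self-attention. Depending on the build, the call may return the
# output tensor alone or an (output, softmax_lse) tuple; we assume the
# attention output comes first.
result = flash_attn3.flash_attn_func(q, k, v, causal=True)
out = result[0] if isinstance(result, tuple) else result
print(out.shape)  # expected: (2, 1024, 8, 128)
```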