|
--- |
|
tags: |
|
- kernel |
|
license: apache-2.0 |
|
--- |
|
|
|
# Activation |
|
|
|
Activation is a Python package that provides custom CUDA-based activation kernels, primarily targeting AMD GPUs.
|
|
|
Currently implemented:

- [PolyNorm](https://arxiv.org/html/2411.03884v1)
- [RMSNorm](https://docs.pytorch.org/docs/stable/generated/torch.nn.RMSNorm.html)
- **FusedAddRMSNorm** (residual addition fused with RMSNorm, described below)
- **FusedMulPolyNorm** (PolyNorm fused with an element-wise multiplication, described below)

**FusedAddRMSNorm**

A fused operator that combines **residual addition** (`x + residual`) with **RMSNorm** in a single kernel.

Instead of:
|
|
|
```python |
|
y = x + residual |
|
hidden_state = rms_norm(y, weight, eps) |
|
out = y + some_op(hidden_state) |
|
``` |
|
|
|
Fused as:
|
|
|
```python |
|
hidden_state, y = fused_add_rms_norm(x, residual, weight, eps) |
|
out = y + some_op(hidden_state) |
|
``` |
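
To make the semantics concrete, here is a minimal pure-PyTorch sketch of what `fused_add_rms_norm` computes. This is a reference for the math only, not the kernel's implementation; the signature follows the pseudo-code above.

```python
import torch


def fused_add_rms_norm_ref(x: torch.Tensor,
                           residual: torch.Tensor,
                           weight: torch.Tensor,
                           eps: float = 1e-6):
    """Reference semantics: residual addition followed by RMSNorm over the last dim."""
    y = x + residual                                # residual addition
    variance = y.pow(2).mean(dim=-1, keepdim=True)  # mean of squares
    hidden_state = y * torch.rsqrt(variance + eps) * weight
    return hidden_state, y                          # normalized output + updated residual
```

Returning `y` alongside the normalized output lets the caller reuse the updated residual without recomputing the addition, and doing both steps in one kernel avoids an extra round trip of `y` through global memory.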
|
|
|
**FusedMulPolyNorm**

A fused operator that combines **PolyNorm** with an **element-wise multiplication** by a tensor.

Instead of:
|
|
|
```python |
|
y = poly_norm(x, weight, bias, eps) |
|
out = y * a |
|
``` |
|
|
|
Fused as:
|
|
|
```python |
|
out = fused_mul_poly_norm(x, a, weight, bias, eps) |
|
``` |
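
Similarly, here is a pure-PyTorch sketch of the assumed semantics of `poly_norm` and `fused_mul_poly_norm`. Based on the PolyNorm paper linked above and the `(x, weight, bias, eps)` signature, `weight` is assumed to hold three coefficients (one per power of `x`) and `bias` a scalar offset; the term ordering and normalization details are assumptions, so check the kernel source for the exact definition.

```python
import torch


def _rms(u: torch.Tensor, eps: float) -> torch.Tensor:
    # RMS-style normalization over the last dimension
    return u * torch.rsqrt(u.pow(2).mean(dim=-1, keepdim=True) + eps)


def poly_norm_ref(x: torch.Tensor,
                  weight: torch.Tensor,  # assumed shape (3,): one coefficient per power
                  bias: torch.Tensor,    # assumed scalar offset
                  eps: float = 1e-6) -> torch.Tensor:
    """Assumed reference semantics: weighted sum of normalized powers of x plus a bias."""
    return (weight[0] * _rms(x, eps)
            + weight[1] * _rms(x.pow(2), eps)
            + weight[2] * _rms(x.pow(3), eps)
            + bias)


def fused_mul_poly_norm_ref(x, a, weight, bias, eps=1e-6):
    """Reference semantics of the fused op: PolyNorm followed by element-wise multiply by a."""
    return poly_norm_ref(x, weight, bias, eps) * a
```

Fusing the multiplication into the PolyNorm kernel avoids materializing the intermediate `y = poly_norm(x, ...)` tensor.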
|
|
|
## Usage |
|
|
|
```python |
|
import torch |
|
from kernels import get_kernel |
|
|
|
activation = get_kernel("motif-technologies/activation") |
|
|
|
torch.set_default_device("cuda") |
|
poly_norm = activation.layers.PolyNorm(eps=1e-6) |
|
x = torch.randn(10, 10) |
|
|
|
print(poly_norm(x)) |
|
``` |
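
The layer in the snippet above is expected to behave like an ordinary PyTorch module, including in the backward pass; this is an assumption consistent with the backward benchmarks below, so check the package if in doubt. A small self-contained sketch:

```python
import torch
from kernels import get_kernel

activation = get_kernel("motif-technologies/activation")
torch.set_default_device("cuda")

poly_norm = activation.layers.PolyNorm(eps=1e-6)

# Quick check that gradients flow through the custom op
# (the backward kernels are what the backward benchmarks below measure).
x = torch.randn(4, 4096, requires_grad=True)
out = poly_norm(x)
out.sum().backward()
print(x.grad.shape)  # same shape as x
```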
|
|
|
## Performance |
|
- Test cases are from the Motif LLM |
|
- The results can be reproduced with the provided benchmarking tools; see the [benchmarks README](./benchmarks/README.md) for usage details.
|
- The benchmark results may show fluctuations, especially in the backward pass and when the dimension size is small. |
|
|
|
### RMSNorm |
|
|
|
#### H100 Results |
|
|
|
<details> |
|
<summary>Forward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
<details> |
|
<summary>Backward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
#### MI250 Results |
|
|
|
<details> |
|
<summary>Forward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
<details> |
|
<summary>Backward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
--- |
|
|
|
### FusedAddRMSNorm |
|
|
|
> [!NOTE] |
|
> In the fused-operator benchmarks, the **non-fused baseline** is itself implemented with our **custom kernels**.
|
|
|
#### H100 Results |
|
|
|
<details> |
|
<summary>Forward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
<details> |
|
<summary>Backward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
#### MI250 Results |
|
|
|
<details> |
|
<summary>Forward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
<details> |
|
<summary>Backward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
--- |
|
|
|
### PolyNorm |
|
|
|
#### H100 Results |
|
|
|
<details> |
|
<summary>Forward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
<details> |
|
<summary>Backward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
#### MI250 Results |
|
|
|
<details> |
|
<summary>Forward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
<details> |
|
<summary>Backward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
--- |
|
|
|
### FusedMulPolyNorm |
|
|
|
> [!NOTE] |
|
> In the fused-operator benchmarks, the **non-fused baseline** is itself implemented with our **custom kernels**.
|
|
|
#### H100 Results |
|
|
|
<details> |
|
<summary>Forward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
<details> |
|
<summary>Backward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
#### MI250 Results |
|
|
|
<details> |
|
<summary>Forward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
<details> |
|
<summary>Backward Performance</summary> |
|
|
|
 |
|
|
|
</details> |
|
|
|
## Pre-commit Hooks |
|
|
|
This project uses [pre-commit](https://pre-commit.com/) to automatically check and format code before commits. |
|
|
|
### Setup |
|
|
|
1. Install pre-commit: |
|
|
|
```bash |
|
pip install pre-commit |
|
``` |
|
|
|
2. Install the git hooks: |
|
|
|
```bash |
|
pre-commit install |
|
``` |
|
|
|
Once installed, the configured hooks will run automatically on each commit. |
|
|
|
### Included Hooks |
|
|
|
The following tools are run via pre-commit: |
|
|
|
- **[yapf](https://github.com/google/yapf)** – Python code formatter |
|
- **[typos](https://github.com/crate-ci/typos)** – Spell checker for common typos |
|
- **[isort](https://github.com/PyCQA/isort)** – Organizes and sorts Python imports |
|
- **[clang-format](https://clang.llvm.org/docs/ClangFormat.html)** – Formats C++/CUDA code (`--style=file`) |
|
- **[pymarkdown](https://github.com/jackdewinter/pymarkdown)** – Lints and auto-fixes Markdown files |
|
- **[actionlint](https://github.com/rhysd/actionlint)** – Validates GitHub Actions workflows |
|
|
|
### Usage |
|
|
|
- Run all checks on the entire codebase: |
|
|
|
```bash |
|
pre-commit run --all-files |
|
``` |
|
|
|
- Run a specific hook (example: isort): |
|
|
|
```bash |
|
pre-commit run isort --all-files |
|
``` |
|
|