Activation

Activation is a Python package that provides custom CUDA-based activation kernels, primarily targeting AMD GPUs.

  • Currently implemented
    • PolyNorm

    • RMSNorm

    • FusedAddRMSNorm

      A fused operator that combines residual addition (x + residual) with RMSNorm in a single kernel; see the reference sketch after this list.

      • Instead of:

        y = x + residual
        hidden_state = rms_norm(y, weight, eps)
        out = y + some_op(hidden_state) 
        
      • Fused as:

        hidden_state, y = fused_add_rms_norm(x, residual, weight, eps)
        out = y + some_op(hidden_state)
        
    • FusedMulPolyNorm

      A fused operator that combines PolyNorm with an element-wise multiplication by another tensor; see the reference sketch after this list.

      • Instead of:

        y = poly_norm(x, weight, bias, eps)
        out = y * a
        
      • Fused as:

        out = fused_mul_poly_norm(x, a, weight, bias, eps)
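
The plain-PyTorch sketch below illustrates the semantics that the two fused operators implement (it is not the kernel code). It assumes the usual RMSNorm formula, x / sqrt(mean(x^2) + eps) * weight, treats PolyNorm as an opaque callable, and uses illustrative reference_* names that are not part of this package.

import torch

def reference_rms_norm(x, weight, eps):
    # RMSNorm over the last dimension (assumed formula, for illustration only).
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * weight

def reference_fused_add_rms_norm(x, residual, weight, eps):
    # What FusedAddRMSNorm computes in one kernel:
    # the residual sum y and its RMS-normalized value.
    y = x + residual
    return reference_rms_norm(y, weight, eps), y

def reference_fused_mul_poly_norm(poly_norm, x, a, weight, bias, eps):
    # What FusedMulPolyNorm computes in one kernel: PolyNorm followed by
    # an element-wise multiplication with a. Here poly_norm can be any
    # PolyNorm implementation, e.g. the one provided by this package.
    return poly_norm(x, weight, bias, eps) * a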
        

Usage

import torch
from kernels import get_kernel

activation = get_kernel("motif-technologies/activation")

torch.set_default_device("cuda")
poly_norm = activation.layers.PolyNorm(eps=1e-6)
x = torch.randn(10, 10)

print(poly_norm(x))
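
The fused operators are presumably exposed through the same layers namespace. The sketch below shows what that usage might look like; the layer name FusedAddRMSNorm, its constructor arguments, and its call signature are assumptions inferred from the operator description above, not a documented API.

# Assumed layer name and signature -- verify against the package before use.
fused_add_rms_norm = activation.layers.FusedAddRMSNorm(eps=1e-6)

x = torch.randn(10, 10)
residual = torch.randn(10, 10)

# Expected to mirror: y = x + residual; hidden_state = rms_norm(y, weight, eps)
hidden_state, y = fused_add_rms_norm(x, residual)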

Performance

  • Test cases are from the Motif LLM.
  • The results can be reproduced using the provided benchmarking tools (a minimal timing sketch for quick spot checks follows this list).
  • For details on how to use the benchmarking tools, please refer to the benchmarks README.
  • The benchmark results may show fluctuations, especially in the backward pass and when the dimension size is small.
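
For quick spot checks outside the provided benchmarking tools, a minimal CUDA-event timing loop along the following lines can be used. This is illustrative only; the provided benchmarks remain the authoritative way to reproduce the numbers, and the helper name time_op is not part of this package.

import torch

def time_op(fn, *args, warmup=10, iters=100):
    # Average latency of fn(*args) in milliseconds, measured with CUDA events.
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters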

RMSNorm

H100 Results

Forward Performance

[Figure: RMSNorm forward performance on H100]

Backward Performance

[Figure: RMSNorm backward performance on H100]

MI250 Results

Forward Performance

[Figure: RMSNorm forward performance on MI250]

Backward Performance

[Figure: RMSNorm backward performance on MI250]


FusedAddRMSNorm

For the fused-operator benchmarks, the non-fused baseline was implemented with our own custom kernels.

H100 Results

Forward Performance

[Figure: FusedAddRMSNorm forward performance on H100]

Backward Performance

[Figure: FusedAddRMSNorm backward performance on H100]

MI250 Results

Forward Performance

[Figure: FusedAddRMSNorm forward performance on MI250]

Backward Performance

[Figure: FusedAddRMSNorm backward performance on MI250]


PolyNorm

H100 Results

Forward Performance

[Figure: PolyNorm forward performance on H100]

Backward Performance

[Figure: PolyNorm backward performance on H100]

MI250 Results

Forward Performance

[Figure: PolyNorm forward performance on MI250]

Backward Performance

[Figure: PolyNorm backward performance on MI250]


FusedMulPolyNorm

For the fused-operator benchmarks, the non-fused baseline was implemented with our own custom kernels.

H100 Results

Forward Performance

[Figure: FusedMulPolyNorm forward performance on H100]

Backward Performance

[Figure: FusedMulPolyNorm backward performance on H100]

MI250 Results

Forward Performance

[Figure: FusedMulPolyNorm forward performance on MI250]

Backward Performance

[Figure: FusedMulPolyNorm backward performance on MI250]

Pre-commit Hooks

This project uses pre-commit to automatically check and format code before commits.

Setup

  1. Install pre-commit:

    pip install pre-commit
    
  2. Install the git hooks:

    pre-commit install

Once installed, the configured hooks will run automatically on each commit.

Included Hooks

The following tools are run via pre-commit:

  • yapf – Python code formatter
  • typos – Spell checker for common typos
  • isort – Organizes and sorts Python imports
  • clang-format – Formats C++/CUDA code (--style=file)
  • pymarkdown – Lints and auto-fixes Markdown files
  • actionlint – Validates GitHub Actions workflows

Usage

  • Run all checks on the entire codebase:

    pre-commit run --all-files
    
  • Run a specific hook (example: isort):

    pre-commit run isort --all-files