Activation

Activation is a Python package that provides custom CUDA-based activation kernels, primarily targeting AMD GPUs.

  • Currently implemented
    • PolyNorm

    • RMSNorm

    • FusedAddRMSNorm

      A fused operator that combines residual addition (x + residual) with RMSNorm in a single kernel; see the reference sketch after this list.

      • Instead of:

        y = x + residual
        hidden_state = rms_norm(y, weight, eps)
        out = y + some_op(hidden_state) 
        
      • Fused as:

        hidden_state, y = fused_add_rms_norm(x, residual, weight, eps)
        out = y + some_op(hidden_state)
        
    • FusedMulPolyNorm

      A fused operator that combines PolyNorm with an element-wise multiplication by another tensor; see the reference sketch after this list.

      • Instead of:

        y = poly_norm(x, weight, bias, eps)
        out = y * a
        
      • Fused as:

        out = fused_mul_poly_norm(x, a, weight, bias, eps)
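
The plain-PyTorch sketch below illustrates the semantics that the two fused operators implement (it is not the kernel code). It assumes the usual RMSNorm formula, x / sqrt(mean(x^2) + eps) * weight, treats PolyNorm as an opaque callable, and uses illustrative reference_* names that are not part of this package.

import torch

def reference_rms_norm(x, weight, eps):
    # RMSNorm over the last dimension (assumed formula, for illustration only).
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * weight

def reference_fused_add_rms_norm(x, residual, weight, eps):
    # What FusedAddRMSNorm computes in one kernel:
    # the residual sum y and its RMS-normalized value.
    y = x + residual
    return reference_rms_norm(y, weight, eps), y

def reference_fused_mul_poly_norm(poly_norm, x, a, weight, bias, eps):
    # What FusedMulPolyNorm computes in one kernel: PolyNorm followed by
    # an element-wise multiplication with a. Here poly_norm can be any
    # PolyNorm implementation, e.g. the one provided by this package.
    return poly_norm(x, weight, bias, eps) * a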
        

Usage

import torch
from kernels import get_kernel

activation = get_kernel("motif-technologies/activation")

torch.set_default_device("cuda")
poly_norm = activation.layers.PolyNorm(eps=1e-6)
x = torch.randn(10, 10)

print(poly_norm(x))
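
The fused operators are presumably exposed through the same layers namespace. The sketch below shows what that usage might look like; the layer name FusedAddRMSNorm, its constructor arguments, and its call signature are assumptions inferred from the operator description above, not a documented API.

# Assumed layer name and signature -- verify against the package before use.
fused_add_rms_norm = activation.layers.FusedAddRMSNorm(eps=1e-6)

x = torch.randn(10, 10)
residual = torch.randn(10, 10)

# Expected to mirror: y = x + residual; hidden_state = rms_norm(y, weight, eps)
hidden_state, y = fused_add_rms_norm(x, residual)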

Performance

  • Test cases are from the Motif LLM.
  • The results can be reproduced using the provided benchmarking tools (a minimal timing sketch for quick spot checks follows this list).
  • For details on how to use the benchmarking tools, please refer to the benchmarks README.
  • The benchmark results may show fluctuations, especially in the backward pass and when the dimension size is small.
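
For quick spot checks outside the provided benchmarking tools, a minimal CUDA-event timing loop along the following lines can be used. This is illustrative only; the provided benchmarks remain the authoritative way to reproduce the numbers, and the helper name time_op is not part of this package.

import torch

def time_op(fn, *args, warmup=10, iters=100):
    # Average latency of fn(*args) in milliseconds, measured with CUDA events.
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters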

RMSNorm

H100 Results

Forward Performance

[Figure: RMSNorm forward performance on H100]

Backward Performance

[Figure: RMSNorm backward performance on H100]

MI250 Results

Forward Performance

[Figure: RMSNorm forward performance on MI250]

Backward Performance

[Figure: RMSNorm backward performance on MI250]


FusedAddRMSNorm

For the fused-operator benchmarks, the non-fused baseline was implemented with our own custom kernels.

H100 Results

Forward Performance

[Figure: FusedAddRMSNorm forward performance on H100]

Backward Performance

[Figure: FusedAddRMSNorm backward performance on H100]

MI250 Results

Forward Performance

[Figure: FusedAddRMSNorm forward performance on MI250]

Backward Performance

[Figure: FusedAddRMSNorm backward performance on MI250]


PolyNorm

H100 Results

Forward Performance

[Figure: PolyNorm forward performance on H100]

Backward Performance

[Figure: PolyNorm backward performance on H100]

MI250 Results

Forward Performance

[Figure: PolyNorm forward performance on MI250]

Backward Performance

[Figure: PolyNorm backward performance on MI250]


FusedMulPolyNorm

For the fused-operator benchmarks, the non-fused baseline was implemented with our own custom kernels.

H100 Results

Forward Performance

[Figure: FusedMulPolyNorm forward performance on H100]

Backward Performance

[Figure: FusedMulPolyNorm backward performance on H100]

MI250 Results

Forward Performance

[Figure: FusedMulPolyNorm forward performance on MI250]

Backward Performance

[Figure: FusedMulPolyNorm backward performance on MI250]

Pre-commit Hooks

This project uses pre-commit to automatically check and format code before commits.

Setup

  1. Install pre-commit:

    pip install pre-commit
    
  2. Install the git hooks:

    pre-commit install

Once installed, the configured hooks will run automatically on each commit.

Included Hooks

The following tools are run via pre-commit:

  • yapf – Python code formatter
  • typos – Spell checker for common typos
  • isort – Organizes and sorts Python imports
  • clang-format – Formats C++/CUDA code (--style=file)
  • pymarkdown – Lints and auto-fixes Markdown files
  • actionlint – Validates GitHub Actions workflows

Usage

  • Run all checks on the entire codebase:

    pre-commit run --all-files
    
  • Run a specific hook (example: isort):

    pre-commit run isort --all-files