Benchmark Runner

This script benchmarks the forward and backward passes of several operations (rms, add_rms, poly, mul_poly). Results are saved as CSV files or, with --plot, as plots.

Note

Before running the benchmarks, add the build variant that matches your installed Torch version and CUDA/ROCm build to PYTHONPATH. The variants are subdirectories of the build directory.

Example:
export PYTHONPATH=$PYTHONPATH:<YOUR_PATH>/activation/build/torch27-cxx11-cu128-x86_64-linux
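
To check which variants are available before choosing one, a helper along these lines could list them. This is only a sketch: the name list_build_variants is hypothetical, and it assumes variant directories start with "torch", as in the example above.

```python
from pathlib import Path


def list_build_variants(build_dir):
    """Return the names of build variant directories under build_dir.

    Assumption: each variant is a subdirectory whose name starts with
    "torch" (e.g. torch27-cxx11-cu128-x86_64-linux).
    """
    root = Path(build_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name.startswith("torch")
    )
```

Calling list_build_variants("<YOUR_PATH>/activation/build") returns the candidate directory names; pick the one that matches your environment and export it as shown above.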

Usage

python main.py --case <CASE> [--plot] [--save-path <DIR>]

--case (required): one of rms, add_rms, poly, mul_poly
--plot: save plots instead of CSVs
--save-path: output directory (default: ./configs/)
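
The options above map naturally onto argparse. The following is a minimal sketch of such a parser, not the actual contents of main.py; the names CASES and build_parser are hypothetical.

```python
import argparse

# The four cases listed above.
CASES = ("rms", "add_rms", "poly", "mul_poly")


def build_parser():
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(description="Benchmark runner (sketch)")
    # --case is required and restricted to the known operations.
    parser.add_argument("--case", required=True, choices=CASES)
    # --plot is a boolean switch: plots instead of CSVs.
    parser.add_argument("--plot", action="store_true")
    # --save-path falls back to the documented default directory.
    parser.add_argument("--save-path", default="./configs/")
    return parser
```

With this sketch, an unknown value for --case is rejected by argparse before any benchmark runs.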

Examples

python main.py --case add_rms --save-path ./results/
python main.py --case poly --plot --save-path ./plots/

Output

CSV: <case>-fwd-perf.csv, <case>-bwd-perf.csv
Plots: plot_<case>-fwd-perf.png, plot_<case>-bwd-perf.png
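
The naming scheme above can be expressed as a small helper, for example when collecting results from a script. This is an illustrative sketch; output_files is a hypothetical name and not part of the benchmark code.

```python
from pathlib import Path


def output_files(case, save_path="./configs/", plot=False):
    """Return the expected forward and backward output paths for a case."""
    # Plots get a "plot_" prefix and a .png extension; CSVs get neither prefix.
    prefix, ext = ("plot_", "png") if plot else ("", "csv")
    return [
        Path(save_path) / f"{prefix}{case}-{direction}-perf.{ext}"
        for direction in ("fwd", "bwd")
    ]
```

For instance, output_files("add_rms", "./results/") yields the paths for add_rms-fwd-perf.csv and add_rms-bwd-perf.csv under ./results/.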