---
license: apache-2.0
metrics:
- accuracy
base_model:
- mistralai/Mixtral-8x7B-Instruct-v0.1
---

# Quark Team FP8 Mixtral-8x7B Model Overview

## Model Information For MLPerf
- **Model Name**: Mixtral-8x7B
- **Version**: MLPerf v5.1
- **Commit**: Closed Division Commit
- **Supported Hardware Microarchitecture**: AMD MI300/MI325
- **Transformers**: 4.46.3
- **Quark:** [0.9](https://quark.docs.amd.com/latest/install.html)

## Calibration Dataset
The calibration dataset consists of **1024 mixed samples** provided by MLPerf:
- **325 GSM8K samples**
- **325 MBXP samples**
- **374 OpenOrca samples**

## Quantized Tensors
The following tensors are quantized in each decoder layer:
- **Expert MLP Inputs and Weights** (excluding the router)
- **Linear QKV Inputs and Weights**
- **KV Cache Entries**

## Ignored Layers
The following layers are excluded from quantization:
- `*.gate`
- `*.o_proj`
- `lm_head`

## Algorithms
The AutoSmoothQuant algorithm is applied during weight-activation quantization to improve the accuracy of the quantized model.
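AutoSmoothQuant is a variant of SmoothQuant. For reference, the original SmoothQuant formulation rescales each input channel $j$ of a linear layer $Y = XW$ (with $X \in \mathbb{R}^{T \times C_{in}}$ and $W \in \mathbb{R}^{C_{in} \times C_{out}}$) by a factor $s_j$ controlled by a migration strength $\alpha \in [0, 1]$. This moves outlier magnitude from the activations into the weights before both are quantized:

$$
s_j = \frac{\max\left(|X_j|\right)^{\alpha}}{\max\left(|W_j|\right)^{1-\alpha}}, \qquad
Y = \left(X \, \mathrm{diag}(s)^{-1}\right)\left(\mathrm{diag}(s) \, W\right)
$$

Here $\max(|X_j|)$ is taken over the calibration activations of channel $j$, and $\max(|W_j|)$ over row $j$ of $W$. Because $\mathrm{diag}(s)^{-1}\,\mathrm{diag}(s) = I$, the layer output is unchanged in exact arithmetic; only the value ranges seen by the FP8 quantizers change. The SmoothQuant paper folds $\mathrm{diag}(s)^{-1}$ offline into the preceding layer's parameters. How Quark's AutoSmoothQuant chooses $\alpha$ and applies the smoothing to the Mixtral expert and QKV projections is not specified in this card, so treat the formula above as the underlying idea rather than the exact procedure used.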
## Quantization Scripts
```
cd examples/torch/language_modeling/llm_ptq/

MODEL_DIR="mistralai/Mixtral-8x7B-Instruct-v0.1"
DATASET="./mlperf_data/mixtral_8x7b%2F2024.06.06_mixtral_15k_calibration_v4.pkl"
OUTPUT_DIR="amd/Mixtral-8x7B-Instruct-v0.1_FP8_MLPerf_V3"

python3 quantize_quark.py --model_dir "${MODEL_DIR}" \
    --output_dir "${OUTPUT_DIR}" \
    --dataset "${DATASET}" \
    --data_type float16 \
    --multi_gpu \
    --quant_scheme w_fp8_a_fp8 \
    --kv_cache_dtype fp8 \
    --num_calib_data 1024 \
    --seq_len 1024 \
    --min_kv_scale 1.0 \
    --model_export hf_format \
    --custom_mode fp8 \
    --quant_algo autosmoothquant \
    --exclude_layers "lm_head" "*.gate"
```

# Model Performance Comparison

| Metric | Baseline Accuracy Target (%) | FP8 Quant Accuracy (%) (ratio to baseline) |
|-----------------------|--------------------|-----------------------|
| **GSM8K (Math)** | 73.66 | 73.18 (99.34%) |
| **Open Orca (Chat)** | | |
| - Rouge1 | 45.5989 | 45.4362 (99.64%) |
| - Rouge2 | 23.3526 | 23.168 (99.21%) |
| - RougeL | 30.4608 | 30.2922 (99.45%) |
| **MBXP (Code)** | 60.16 | 60.08 (99.87%) |

# License

Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.