Quark Team FP8 Mixtral-8x7B Model Overview
Model Information For MLPerf
- Model Name: Mixtral-8x7B
- Version: MLPerf v5.1
- Commit: Closed Division Commit
- Supported Hardware Microarchitecture: AMD MI300/MI325
- ROCm: 6.4.1
- Operating System(s): Linux
- Transformers: 4.46.3
- Quark: 0.9
Calibration Dataset
This model was built from the mistralai Mixtral-8x7B-Instruct-v0.1 model by applying AMD Quark for FP8 quantization. The calibration dataset consists of 1024 samples drawn from the mixed calibration set provided by mlcommons/inference, which includes:
- 325 GSM8k samples
- 325 MBXP samples
- 374 OpenOrca samples
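As a quick sanity check, the calibration pickle referenced in the script below can be inspected before quantization. This is a minimal sketch assuming the file deserializes to a pandas DataFrame with a dataset column identifying the source benchmark; the column name is an assumption, not stated in this card.
# Minimal sketch: inspect the MLPerf calibration pickle before quantization.
# Assumption: the pickle holds a pandas DataFrame with a "dataset" column
# naming the source benchmark (GSM8K / MBXP / OpenOrca); the quantization
# script then samples 1024 of these rows via --num_calib_data.
import pandas as pd

CALIB_PATH = "./mlperf_data/mixtral_8x7b/2024.06.06_mixtral_15k_calibration_v4.pkl"

df = pd.read_pickle(CALIB_PATH)
print(f"total calibration samples: {len(df)}")
print(df["dataset"].value_counts())  # breakdown by source benchmark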
Quantized Tensors
The following tensors are quantized in each decoder:
- Expert MLP Inputs and Weights (excluding the router)
- Linear QKV Inputs and Weights
- KV Cache Entries
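For intuition, FP8 (e4m3) per-tensor quantization of a weight amounts to scaling values into the representable range and casting. The sketch below uses plain PyTorch and is illustrative only, not AMD Quark's implementation.
# Illustrative FP8 (e4m3) per-tensor quantization of a weight tensor.
# Conceptual sketch in plain PyTorch; not how AMD Quark implements it.
import torch

def quantize_fp8_per_tensor(w: torch.Tensor):
    fp8_max = torch.finfo(torch.float8_e4m3fn).max   # 448.0 for e4m3
    scale = w.abs().max().clamp(min=1e-12) / fp8_max
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)      # quantized storage
    return w_fp8, scale

def dequantize(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_fp8.to(torch.float16) * scale

w = torch.randn(4096, 4096, dtype=torch.float16)
w_fp8, scale = quantize_fp8_per_tensor(w)
print(scale.item(), (dequantize(w_fp8, scale) - w).abs().max().item())  # max quantization error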
Ignored Layers
The following layers are ignored during quantization:
- *.gate
- *.o_proj
- lm_head
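The entries above are shell-style wildcard patterns matched against module names (they are also passed to the script below via --exclude_layers). Here is a small fnmatch-based sketch of how such matching could behave, for illustration only; the card does not describe Quark's internal matching.
# Illustration of wildcard-based layer exclusion against Mixtral module names.
# The matching logic here is an assumption for illustration, not Quark's internals.
from fnmatch import fnmatch

EXCLUDE = ["lm_head", "*.gate", "*.o_proj"]

def is_excluded(module_name: str) -> bool:
    return any(fnmatch(module_name, pattern) for pattern in EXCLUDE)

for name in [
    "lm_head",
    "model.layers.0.self_attn.o_proj",                # excluded: attention output projection
    "model.layers.0.block_sparse_moe.gate",           # excluded: MoE router
    "model.layers.0.block_sparse_moe.experts.0.w1",   # quantized: expert MLP weight
    "model.layers.0.self_attn.q_proj",                # quantized: QKV projection
]:
    print(name, "->", "excluded" if is_excluded(name) else "quantized")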
Algorithms
The AutoSmoothQuant algorithm is applied during weight-activation quantization for better accuracy.
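Conceptually, SmoothQuant-style algorithms migrate activation outliers into the weights with a per-channel scale before both are quantized, and AutoSmoothQuant searches those scales automatically. The snippet below shows the basic scale-migration step with a hand-picked alpha, as a sketch of the underlying idea rather than Quark's AutoSmoothQuant itself.
# Sketch of the SmoothQuant-style scale migration that AutoSmoothQuant automates:
# fold a per-input-channel factor s into the weights and divide activations by s,
# leaving the layer output mathematically unchanged: (x / s) @ (W * s).T == x @ W.T
import torch

def smooth_scales(act_amax: torch.Tensor, w_amax: torch.Tensor, alpha: float = 0.5):
    # act_amax / w_amax: per-input-channel absolute maxima from calibration data
    return (act_amax.pow(alpha) / w_amax.pow(1.0 - alpha)).clamp(min=1e-5)

in_features, out_features = 4096, 14336
act_amax = torch.rand(in_features) * 10            # stand-in calibration statistics
weight = torch.randn(out_features, in_features)
w_amax = weight.abs().amax(dim=0)                  # per-input-channel weight maxima

s = smooth_scales(act_amax, w_amax)                # alpha fixed here; AutoSmoothQuant searches it
smoothed_weight = weight * s                       # folded into the linear weight
# At runtime the activation scaling (x / s) is typically fused into the preceding op.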
Quantization Scripts
cd examples/torch/language_modeling/llm_ptq/
MODEL_DIR="mistralai/Mixtral-8x7B-Instruct-v0.1"
DATASET="./mlperf_data/mixtral_8x7b/2024.06.06_mixtral_15k_calibration_v4.pkl"
OUTPUT_DIR="amd/Mixtral-8x7B-Instruct-v0.1_FP8_MLPerf_V3"
python3 quantize_quark.py --model_dir "${MODEL_DIR}" \
--output_dir "${OUTPUT_DIR}" \
--dataset "${DATASET}" \
--data_type float16 \
--multi_gpu \
--quant_scheme w_fp8_a_fp8 \
--kv_cache_dtype fp8 \
--num_calib_data 1024 \
--seq_len 1024 \
--min_kv_scale 1.0 \
--model_export hf_format \
--custom_mode fp8 \
--quant_algo autosmoothquant \
--exclude_layers "lm_head" "*.gate" "*.o_proj"
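After export in hf_format, the quantization settings are recorded alongside the checkpoint. As a hedged sketch, they can be inspected from config.json; the presence and exact layout of a quantization_config entry may vary between Quark versions.
# Sketch: inspect the exported checkpoint's quantization settings.
# Assumption: the hf_format export writes a "quantization_config" entry into
# config.json; exact keys may differ by Quark version.
import json, os

OUTPUT_DIR = "amd/Mixtral-8x7B-Instruct-v0.1_FP8_MLPerf_V3"

with open(os.path.join(OUTPUT_DIR, "config.json")) as f:
    cfg = json.load(f)

print(json.dumps(cfg.get("quantization_config", {}), indent=2))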
Model Performance Comparison
| Metric | Baseline Accuracy Target (%) | FP8 Quant Accuracy (%) |
| --- | --- | --- |
| GSM8K (Math) | 73.66 | 73.18 (99.34%) |
| Open Orca (Chat) | | |
| - Rouge1 | 45.5989 | 45.4362 (99.64%) |
| - Rouge2 | 23.3526 | 23.168 (99.21%) |
| - RougeL | 30.4608 | 30.2922 (99.45%) |
| MBXP (Code) | 60.16 | 60.08 (99.87%) |
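The bracketed values are the quantized scores relative to the baseline targets (e.g., 73.18 / 73.66 ≈ 99.3% for GSM8K). Below is a small check of those ratios; the 99% pass threshold mirrors the usual MLPerf closed-division accuracy criterion and is stated here as an assumption rather than taken from this card.
# Relative accuracy of the FP8 model versus the MLPerf baseline targets.
results = {
    "GSM8K":  (73.66, 73.18),
    "Rouge1": (45.5989, 45.4362),
    "Rouge2": (23.3526, 23.168),
    "RougeL": (30.4608, 30.2922),
    "MBXP":   (60.16, 60.08),
}

for metric, (baseline, fp8) in results.items():
    rel = 100.0 * fp8 / baseline
    # 99% threshold: assumed MLPerf closed-division accuracy criterion
    print(f"{metric}: {rel:.2f}% of baseline -> {'PASS' if rel >= 99.0 else 'FAIL'}")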
License
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.