---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3
---

# Model Overview

- **Model Architecture:** DeepSeek V3
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating Systems:** Linux
- **Inference Engine:** vLLM
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
- **Quantization:**
  - **Weight:**
    - Type: OCP MXFP4
    - Mode: Static
  - **Activation:**
    - Type: OCP MXFP4
    - Mode: Dynamic
  - **KV Cache:**
    - Type: OCP FP8
    - Mode: Static
- **Calibration Dataset:** Pile

This model was built with DeepSeek-V3 by applying AMD-Quark for MXFP4 quantization.

# Model Quantization

The model was quantized from unsloth/DeepSeek-V3-0324-BF16 using AMD-Quark. Weights and activations were quantized to MXFP4, and the KV cache was quantized to FP8. The AutoSmoothQuant algorithm was applied to improve accuracy during quantization.

**Quantization Scripts**

```console
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 quantize_quark.py \
    --model_dir "/deepseek-ai/DeepSeek-V3-0324-BF16/" \
    --quant_scheme "w_mxfp4_a_mxfp4" \
    --quant_algo_config_file "llm_ptq/models/deepseekv2v3/autosmoothquant_config.json" \
    --num_calib_data 128 \
    --exclude_layers "$exclude_layers" \
    --multi_gpu true \
    --quant_algo "autosmoothquant" \
    --model_export "hf_format" \
    --output_dir "$output_dir"
```

# Deployment

- Backend: vLLM
- Description: This model can be deployed efficiently using the vLLM backend.

# Evaluation

- Tasks:
  - Wikitext
  - GSM8K
- Framework: lm-evaluation-harness
- Engine: vLLM

# Accuracy

- Wikitext perplexity:
3.33074593544006

**Reproduction Command**

```console
# Wikitext:
lm_eval \
    --model vllm \
    --model_args pretrained="amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ",gpu_memory_utilization=0.85,tensor_parallel_size=8,kv_cache_dtype='fp8' \
    --tasks wikitext \
    --fewshot_as_multiturn \
    --apply_chat_template \
    --num_fewshot 5 \
    --batch_size auto

# GSM8K:
lm_eval \
    --model vllm \
    --model_args pretrained="amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ",gpu_memory_utilization=0.85,tensor_parallel_size=8,kv_cache_dtype='fp8' \
    --tasks gsm8k_llama \
    --fewshot_as_multiturn \
    --apply_chat_template \
    --num_fewshot 8 \
    --batch_size auto
```

# License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
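The Deployment section above notes that the model runs on the vLLM backend. As a minimal sketch of how that might look, the commands below launch vLLM's OpenAI-compatible server and send it one request; the flags mirror the `tensor_parallel_size`, `kv_cache_dtype`, and `gpu_memory_utilization` values used in the evaluation commands and are assumptions, not tuned recommendations:

```shell
# Serve the quantized model via vLLM's OpenAI-compatible API
# (sketch; assumes a multi-GPU host matching the evaluation setup above)
vllm serve amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ \
    --tensor-parallel-size 8 \
    --kv-cache-dtype fp8 \
    --gpu-memory-utilization 0.85

# Once the server is up (default port 8000), query it:
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ",
         "messages": [{"role": "user", "content": "Hello"}]}'
```

Any OpenAI-compatible client can then target `http://localhost:8000/v1` in place of the `curl` call.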