# Model Overview
- Model_Architecture: DeepSeek V3
- Input: Text
- Output: Text
- Supported_Hardware_Microarchitecture: AMD MI350/MI355
- ROCm: "7.0"
- Operating Systems: Linux
- Inference Engine: vLLM
- Model Optimizer: AMD-Quark
- Quantization:
  - Weight:
    - Type: OCP MXFP4
    - Mode: Static
  - Activation:
    - Type: OCP MXFP4
    - Mode: Dynamic
  - KV_Cache:
    - Type: OCP FP8
    - Mode: Static
- Calibration_Dataset: Pile
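OCP MXFP4 stores each block of elements (nominally 32) with one shared power-of-two (E8M0) scale plus a 4-bit E2M1 value per element. A minimal NumPy sketch of fake-quantization in this format — the scale-selection rule here is a simplification for illustration, not Quark's actual implementation:

```python
import numpy as np

# FP4 (E2M1) representable magnitudes per the OCP Microscaling (MX) format
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block):
    """Fake-quantize one block: shared power-of-two (E8M0) scale plus
    per-element FP4 values, returned in dequantized form."""
    amax = np.abs(block).max()
    if amax == 0.0:
        return np.zeros_like(block)
    # power-of-two scale that maps the block max near FP4's max magnitude (6)
    scale = 2.0 ** np.floor(np.log2(amax / FP4_GRID[-1]))
    scaled = block / scale
    # snap each magnitude to the nearest FP4 grid point, keep the sign
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx] * scale
```

Values already representable at the chosen scale pass through exactly; everything else rounds to the nearest FP4 grid point.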
This model was built from DeepSeek-V3 by applying AMD-Quark for MXFP4 quantization.
# Model Quantization
The model was quantized from unsloth/DeepSeek-V3-0324-BF16 using AMD-Quark.
Weights and activations were quantized to MXFP4, and KV caches were quantized to FP8.
The AutoSmoothQuant algorithm was applied to enhance accuracy during quantization.
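SmoothQuant-style algorithms migrate quantization difficulty from activation outlier channels into the weights using a per-channel scale s and the identity XW = (X·diag(s)⁻¹)(diag(s)·W). A toy sketch of that idea — the alpha=0.5 heuristic and shapes are illustrative, not AMD-Quark's actual AutoSmoothQuant code:

```python
import numpy as np

def smooth_scales(act_amax, weight_amax, alpha=0.5):
    """Per-channel smoothing factors s_j = amax(X_j)^alpha / amax(W_j)^(1-alpha)."""
    return act_amax**alpha / weight_amax**(1 - alpha)

# toy data: activations X (tokens x channels), weights W (channels x out)
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))
X[:, 0] *= 50.0                      # one outlier channel, hard to quantize
W = rng.normal(size=(8, 4))

s = smooth_scales(np.abs(X).max(axis=0), np.abs(W).max(axis=1))
X_s, W_s = X / s, W * s[:, None]     # fold s into activations and weights
# the matmul result is unchanged; only the per-channel ranges move
assert np.allclose(X @ W, X_s @ W_s)
```

After smoothing, the outlier channel's activation range shrinks (at the cost of a larger weight range), which is a better trade for low-bit activation formats like MXFP4.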
## Quantization Scripts

```shell
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 quantize_quark.py \
  --model_dir "/deepseek-ai/DeepSeek-V3-0324-BF16/" \
  --quant_scheme "w_mxfp4_a_mxfp4" \
  --quant_algo_config_file "llm_ptq/models/deepseekv2v3/autosmoothquant_config.json" \
  --num_calib_data 128 \
  --exclude_layers "$exclude_layers" \
  --multi_gpu true \
  --quant_algo "autosmoothquant" \
  --model_export "hf_format" \
  --output_dir "$output_dir"
```
# Deployment
- Backend: vLLM
- Description: This model can be deployed efficiently using the vLLM backend.
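For example, a serving command consistent with the settings used in the evaluation commands below (these are standard vLLM flags; adjust parallelism and memory utilization to your hardware):

```shell
# Serve the quantized model with vLLM's OpenAI-compatible server
vllm serve amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.85 \
  --kv-cache-dtype fp8
```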
# Evaluation
- Tasks:
- Wikitext
- GSM8K
- Framework: lm-evaluation-harness
- Engine: vLLM
## Accuracy
- Wikitext perplexity: 3.33074593544006
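For reference, the perplexity reported by lm-evaluation-harness is the exponential of the average negative log-likelihood; lower is better. A toy illustration (the NLL values are made up):

```python
import math

def perplexity(neg_log_likelihoods):
    """exp of the mean negative log-likelihood (in nats)."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# illustrative per-word NLLs; an average of 1.5 nats gives ppl = e^1.5 ≈ 4.48
nlls = [2.1, 1.7, 0.9, 1.3]
ppl = perplexity(nlls)
```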
## Reproduction Command

Wikitext:

```shell
lm_eval \
  --model vllm \
  --model_args pretrained="amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ",gpu_memory_utilization=0.85,tensor_parallel_size=8,kv_cache_dtype='fp8' \
  --tasks wikitext \
  --fewshot_as_multiturn \
  --apply_chat_template \
  --num_fewshot 5 \
  --batch_size auto
```

GSM8K:

```shell
lm_eval \
  --model vllm \
  --model_args pretrained="amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ",gpu_memory_utilization=0.85,tensor_parallel_size=8,kv_cache_dtype='fp8' \
  --tasks gsm8k_llama \
  --fewshot_as_multiturn \
  --apply_chat_template \
  --num_fewshot 8 \
  --batch_size auto
```
# License
Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
# Model Tree
- Base model: deepseek-ai/DeepSeek-V3