mcronomus committed
Commit ab07af6 · verified · 1 parent: cc881d0

Update README.md


As correctly pointed out in the [community section](https://huggingface.co/neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8/discussions/2), the recipe shown in the README contradicts the quantized model's parameters recorded in `recipe.yaml`. The correct version of the recipe appears to be the one demonstrated in the [llm-compressor GitHub README](https://github.com/vllm-project/llm-compressor?tab=readme-ov-file#apply-quantization).

The proposed change is made accordingly.
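For reference, here is a minimal sketch of the corrected flow end to end, mirroring the llm-compressor README example. The base model id, calibration sample count, sequence length, and `oneshot` arguments are illustrative assumptions, not values taken verbatim from this model card.

```python
# A minimal sketch (assumed setup) of the corrected quantization flow,
# mirroring the llm-compressor README example. Hyperparameters below are
# illustrative, not copied from this model card.
from datasets import load_dataset
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"  # assumed base model
num_samples = 512                                    # assumed calibration size

model = SparseAutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# Calibration data, as referenced in the model card's example.
ds = load_dataset("neuralmagic/LLM_compression_calibration", split="train")
ds = ds.shuffle(seed=42).select(range(num_samples))

# SmoothQuant first migrates activation outliers into the weights so the
# activations quantize cleanly; GPTQ then quantizes the Linear layers to
# INT8 weights and activations (W8A8), keeping lm_head in full precision.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.7),
    GPTQModifier(scheme="W8A8", targets="Linear", ignore=["lm_head"]),
]

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=num_samples,
)
```

The two-stage recipe (a SmoothQuant stage ahead of GPTQ) is what the llm-compressor example uses for W8A8, whereas the old README applied GPTQ alone, which is what contradicted the shipped `recipe.yaml`.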

Files changed (1)
  1. README.md +4 -6
README.md CHANGED
@@ -107,12 +107,10 @@ ds = load_dataset("neuralmagic/LLM_compression_calibration", split="train")
 ds = ds.shuffle().select(range(num_samples))
 ds = ds.map(preprocess_fn)
 
-recipe = GPTQModifier(
-    targets="Linear",
-    scheme="W8A8",
-    ignore=["lm_head"],
-    dampening_frac=0.1,
-)
+recipe = [
+    SmoothQuantModifier(smoothing_strength=0.7),
+    GPTQModifier(scheme="W8A8", targets="Linear", ignore=["lm_head"]),
+]
 
 model = SparseAutoModelForCausalLM.from_pretrained(
     model_id,