Create README.md

Browse files

Files changed (1) hide show

README.md +180 -0

README.md ADDED Viewed

	@@ -0,0 +1,180 @@

+---
+library_name: vllm
+language:
+- ar
+- de
+- en
+- es
+- fr
+- hi
+- id
+- it
+- pt
+- th
+- tl
+- vi
+base_model:
+- meta-llama/Llama-4-Scout-17B-16E-Instruct
+pipeline_tag: image-text-to-text
+tags:
+- facebook
+- meta
+- pytorch
+- llama
+- llama4
+- neuralmagic
+- redhat
+- llmcompressor
+- quantized
+- W4A16
+- INT4
+license: other
+license_name: llama4
+---
+# Llama-4-Scout-17B-16E-Instruct-quantized.w4a16
+## Model Overview
+- **Model Architecture:** Llama4ForConditionalGeneration
+  - **Input:** Text / Image
+  - **Output:** Text
+- **Model Optimizations:**
+  - **Activation quantization:** None
+  - **Weight quantization:** INT4
+- **Release Date:** 04/25/2025
+- **Version:** 1.0
+- **Model Developers:** Red Hat (Neural Magic)
+### Model Optimizations
+This model was obtained by quantizing weights of [Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) to INT4 data type.
+This optimization reduces the number of bits used to represent weights from 16 to 4, reducing GPU memory requirements by approximately 75%.
+Weight quantization also reduces disk size requirements by approximately 75%. The [llm-compressor](https://github.com/vllm-project/llm-compressor) library is used for quantization.
+## Deployment
+This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.
+```python
+from vllm import LLM, SamplingParams
+from transformers import AutoTokenizer
+model_id = "RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16"
+number_gpus = 4
+sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+prompt = "Give me a short introduction to large language model."
+llm = LLM(model=model_id, tensor_parallel_size=number_gpus)
+outputs = llm.generate(prompt, sampling_params)
+generated_text = outputs[0].outputs[0].text
+print(generated_text)
+```
+vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
+## Evaluation
+The model was evaluated on the OpenLLM leaderboard tasks (v1 and v2), long context RULER, multimodal MMMU, and multimodal ChartQA.
+All evaluations are obtained through [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
+<details>
+  <summary>Evaluation details</summary>
+  **OpenLLM v1**
+  ```
+  lm_eval \
+    --model vllm \
+    --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
+    --tasks openllm \
+    --batch_size auto
+  ```
+  **OpenLLM v2**
+  ```
+  lm_eval \
+    --model vllm \
+    --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16",dtype=auto,add_bos_token=False,max_model_len=16384,tensor_parallel_size=8,gpu_memory_utilization=0.5,enable_chunked_prefill=True,trust_remote_code=True \
+    --tasks leaderboard \
+    --apply_chat_template \
+    --fewshot_as_multiturn \
+    --batch_size auto
+  ```
+  **Long Context RULER**
+  ```
+  lm_eval \
+    --model vllm \
+    --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16",dtype=auto,add_bos_token=False,max_model_len=524288,tensor_parallel_size=8,gpu_memory_utilization=0.9,enable_chunked_prefill=True,trust_remote_code=True \
+    --tasks ruler \
+    --metadata='{"max_seq_lengths":[131072]}' \
+    --batch_size auto
+  ```
+  **Multimodal MMMU**
+  ```
+  lm_eval \
+    --model vllm-vlm \
+    --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16",dtype=auto,add_bos_token=False,max_model_len=1000000,tensor_parallel_size=8,gpu_memory_utilization=0.9,enable_chunked_prefill=True,trust_remote_code=True,max_images=10 \
+    --tasks mmmu_val \
+    --apply_chat_template \
+    --batch_size auto
+  ```
+  **Multimodal ChartQA**
+  ```
+  export VLLM_MM_INPUT_CACHE_GIB=8
+  lm_eval \
+    --model vllm-vlm \
+    --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16",dtype=auto,add_bos_token=False,max_model_len=1000000,tensor_parallel_size=8,gpu_memory_utilization=0.9,enable_chunked_prefill=True,trust_remote_code=True,max_images=10 \
+    --tasks chartqa \
+    --apply_chat_template \
+    --batch_size auto
+  ```
+</details>
+### Accuracy
+|                                                | Recovery (%) | meta-llama/Llama-4-Scout-17B-16E-Instruct | RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16<br>(this model) |
+| ---------------------------------------------- | :-----------: | :---------------------------------------: | :-----------------------------------------------------------------: |
+| ARC-Challenge<br>25-shot                       | ?       | 69.37                                     | ?                                                               |
+| GSM8k<br>5-shot                                | ?        | 90.45                                     | ?                                                               |
+| HellaSwag<br>10-shot                           | ?        | 85.23                                     | ?                                                               |
+| MMLU<br>5-shot                                 | ?        | 80.54                                     | ?                                                               |
+| TruthfulQA<br>0-shot                           | ?        | 61.41                                     | ?                                                               |
+| WinoGrande<br>5-shot                           | ?        | 77.90                                     | ?                                                               |
+| **OpenLLM v1<br>Average Score**                    | **?**        | **77.48**                                     | **?**                                                               |
+| IFEval<br>0-shot<br>avg of inst and prompt acc | ?       | 86.90                                     | ?                                                               |
+| Big Bench Hard<br>3-shot                       | ?        | 65.13                                     | ?                                                               |
+| Math Lvl 5<br>4-shot                           | ?        | 57.78                                     | ?                                                               |
+| GPQA<br>0-shot                                 | ?       | 31.88                                     | ?                                                               |
+| MuSR<br>0-shot                                 | ?       | 42.20                                     | ?                                                               |
+| MMLU-Pro<br>5-shot                             | ?        | 55.70                                     | ?                                                               |
+| **OpenLLM v2<br>Average Score**                    | **?**       | **56.60**                                     | **?**                                                               |
+| RULER<br>seqlen = 131072<br>niah_multikey_1    | ?       | 88.20                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>niah_multikey_2    | ?       | 83.60                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>niah_multikey_3    | ?        | 78.80                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>niah_multiquery    | ?       | 95.40                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>niah_multivalue    | ?        | 73.75                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>niah_single_1      | ?       | 100.00                                    | ?                                                              |
+| RULER<br>seqlen = 131072<br>niah_single_2      | ?       | 99.80                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>niah_single_3      | ?       | 99.80                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>ruler_cwe          | ?        | 39.42                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>ruler_fwe          | ?        | 92.93                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>ruler_qa_hotpot    | ?       | 48.20                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>ruler_qa_squad     | ?        | 53.57                                     | ?                                                               |
+| RULER<br>seqlen = 131072<br>ruler_qa_vt        | ?       | 92.28                                     | ?                                                               |
+| **RULER<br>seqlen = 131072<br>Average Score**      | **?**        | **80.44**                                     | **?**                                                               |
+| MMMU<br>0-shot                                 | ?        | 53.44                                     | ?                                                               |
+| ChartQA<br>0-shot<br>exact_match               | ?       | 65.88                                     | ?                                                               |
+| ChartQA<br>0-shot<br>relaxed_accuracy          | ?        | 88.92                                     | ?                                                               |
+| **Multimodal Average Score**                       | **?**        | **69.41**                                     | **?**                                                               |