cmh committed · verified · Commit 8307923 · Parent(s): f873409

Update README.md
Files changed (1): README.md (+3 −1)
README.md CHANGED
@@ -12,7 +12,7 @@ pipeline_tag: text-generation
 
 [ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.](https://github.com/turboderp-org/exllamav2)
 
-| Filename | Quant type | File Size | Vram at 16k context|
+| Filename | Quant type | File Size | Vram*|
 | -------- | ---------- | --------- | -------- |
 | [phi-4_hb8_3bpw](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_3bpw) | 3.00 bits per weight | 6.66 GB | **10,3 GB** |
 | [phi-4_hb8_4bpw](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_4bpw) | 4.00 bits per weight | 8.36 GB | **11,9 GB** |
@@ -21,6 +21,8 @@ pipeline_tag: text-generation
 | [phi-4_hb8_7bpw](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_7bpw) | 7.00 bits per weight | 13.5 GB | **16,7 GB** |
 | [phi-4_hb8_8bpw](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_8bpw) | 8.00 bits per weight | 15.2 GB | **18,2 GB** |
 
+* at 16k context
+
 # Phi-4 Model Card
 
 [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)
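
For reference, loading one of the quant branches from the table with ExLlamaV2 looks roughly like the sketch below. It follows the upstream exllamav2 examples; the local directory, chosen branch (hb8_4bpw), context length, and prompt are assumptions for illustration, not part of this commit.

```python
# Assumed: one branch of the repo has been downloaded first, e.g.
#   huggingface-cli download cmh/phi-4_exl2 --revision hb8_4bpw --local-dir ./phi-4_hb8_4bpw
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "./phi-4_hb8_4bpw"  # hypothetical local path to the downloaded branch

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
# 16k context, matching the VRAM column in the table above
cache = ExLlamaV2Cache(model, max_seq_len=16384, lazy=True)
model.load_autosplit(cache)  # split layers across available GPU memory
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
output = generator.generate(prompt="Explain quantization in one sentence.", max_new_tokens=100)
print(output)
```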