Upload README.md with huggingface_hub
README.md
CHANGED
@@ -118,28 +118,28 @@ Here's the speed and quality evaluation on two nano benchmarks. The higher the b
 
 #### Table 1: Tokens per Second on NanoHotpotQA `Documents`
 
-| Quantization | File Size
+| Quantization | BPW | File Size (GB) | Peak VRAM (GB) | Token/s w FA | Token/s w/o FA |
 |------------------|-----------|-----|-----------|--------------|----------------|
-| IQ1_S |
-| IQ1_M |
-| IQ2_XXS |
-| IQ2_M |
-| Q2_K |
-| IQ3_XXS |
-| IQ3_XS |
-| IQ3_S |
-| IQ3_M |
-| Q3_K_M | 1.48
-| IQ4_NL |
-| IQ4_XS | 1.61
-| Q4_K_M |
-| Q5_K_S | 2.02
-| Q5_K_M | 2.07
-| Q6_K |
-| Q8_0 |
-| F16 |
-| v3 (Transformers) |
-| v4 (Transformers) |
+| IQ1_S | 2.04 | 0.73 | 4.04 | 3625 | 2050 |
+| IQ1_M | 2.19 | 0.79 | 4.09 | 3349 | 1997 |
+| IQ2_XXS | 2.44 | 0.88 | 4.19 | 3701 | 2071 |
+| IQ2_M | 2.94 | 1.06 | 4.37 | 3407 | 1989 |
+| Q2_K | 3.29 | 1.18 | 4.49 | 3173 | 1905 |
+| IQ3_XXS | 3.31 | 1.19 | 4.50 | 3668 | 2067 |
+| IQ3_XS | 3.59 | 1.29 | 4.60 | 3604 | 2053 |
+| IQ3_S | 3.76 | 1.35 | 4.66 | 3599 | 2049 |
+| IQ3_M | 3.84 | 1.38 | 4.69 | 3603 | 2053 |
+| Q3_K_M | 4.11 | 1.48 | 4.78 | 3450 | 2008 |
+| IQ4_NL | 4.72 | 1.69 | 5.00 | 3571 | 2039 |
+| IQ4_XS | 4.49 | 1.61 | 4.92 | 3585 | 2046 |
+| Q4_K_M | 4.99 | 1.79 | 5.10 | 3558 | 2045 |
+| Q5_K_S | 5.61 | 2.02 | 5.32 | 3567 | 2044 |
+| Q5_K_M | 5.75 | 2.07 | 5.38 | 3528 | 2034 |
+| Q6_K | 6.56 | 2.36 | 5.66 | 3334 | 1981 |
+| Q8_0 | 8.50 | 3.05 | 6.36 | 3767 | 2101 |
+| F16 | 16.00 | 5.75 | 9.70 | 3399 | 2023 |
+| v3 (Transformers) | 16.00 | 1.10 | 2.82 | | 16505 |
+| v4 (Transformers) | 16.00 | 7.40 | 14.45 | | 1865 |
 
 
 System info: