Commit f44eaff by hanxiao · verified · 1 Parent(s): d42894d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +21 -21
README.md CHANGED
@@ -118,28 +118,28 @@ Here's the speed and quality evaluation on two nano benchmarks. The higher the b
 
  #### Table 1: Tokens per Second on NanoHotpotQA `Documents`
 
- | Quantization | File Size | BPW | Peak VRAM | Token/s w/ FA | Token/s w/o FA |
+ | Quantization | BPW | File Size (GB) | Peak VRAM (GB) | Token/s w FA | Token/s w/o FA |
  |------------------|-----------|-----|-----------|--------------|----------------|
- | IQ1_S | 748.77 MiB | 2.04 | 4137MB | 3625 | 2050 |
- | IQ1_M | 804.97 MiB | 2.19 | 4193MB | 3349 | 1997 |
- | IQ2_XXS | 898.64 MiB | 2.44 | 4287MB | 3701 | 2071 |
- | IQ2_M | 1.06 GiB | 2.94 | 4471MB | 3407 | 1989 |
- | Q2_K | 1.18 GiB | 3.29 | 4599MB | 3173 | 1905 |
- | IQ3_XXS | 1.19 GiB | 3.31 | 4605MB | 3668 | 2067 |
- | IQ3_XS | 1.29 GiB | 3.59 | 4709MB | 3604 | 2053 |
- | IQ3_S | 1.35 GiB | 3.76 | 4771MB | 3599 | 2049 |
- | IQ3_M | 1.38 GiB | 3.84 | 4803MB | 3603 | 2053 |
- | Q3_K_M | 1.48 GiB | 4.11 | 4899MB | 3450 | 2008 |
- | IQ4_NL | 1.69 GiB | 4.72 | 5123MB | 3571 | 2039 |
- | IQ4_XS | 1.61 GiB | 4.49 | 5041MB | 3585 | 2046 |
- | Q4_K_M | 1.79 GiB | 4.99 | 5223MB | 3558 | 2045 |
- | Q5_K_S | 2.02 GiB | 5.61 | 5451MB | 3567 | 2044 |
- | Q5_K_M | 2.07 GiB | 5.75 | 5505MB | 3528 | 2034 |
- | Q6_K | 2.36 GiB | 6.56 | 5801MB | 3334 | 1981 |
- | Q8_0 | 3.05 GiB | 8.50 | 6513MB | 3767 | 2101 |
- | F16 | 5.75 GiB | 16.00 | 9929MB | 3399 | 2023 |
- | v3 (Transformers) | 1.10 GiB | 16.00 | 2887MB | | 16505 |
- | v4 (Transformers) | 7.40 GiB | 16.00 | 14795MB | | 1865 |
+ | IQ1_S | 2.04 | 0.73 | 4.04 | 3625 | 2050 |
+ | IQ1_M | 2.19 | 0.79 | 4.09 | 3349 | 1997 |
+ | IQ2_XXS | 2.44 | 0.88 | 4.19 | 3701 | 2071 |
+ | IQ2_M | 2.94 | 1.06 | 4.37 | 3407 | 1989 |
+ | Q2_K | 3.29 | 1.18 | 4.49 | 3173 | 1905 |
+ | IQ3_XXS | 3.31 | 1.19 | 4.50 | 3668 | 2067 |
+ | IQ3_XS | 3.59 | 1.29 | 4.60 | 3604 | 2053 |
+ | IQ3_S | 3.76 | 1.35 | 4.66 | 3599 | 2049 |
+ | IQ3_M | 3.84 | 1.38 | 4.69 | 3603 | 2053 |
+ | Q3_K_M | 4.11 | 1.48 | 4.78 | 3450 | 2008 |
+ | IQ4_NL | 4.72 | 1.69 | 5.00 | 3571 | 2039 |
+ | IQ4_XS | 4.49 | 1.61 | 4.92 | 3585 | 2046 |
+ | Q4_K_M | 4.99 | 1.79 | 5.10 | 3558 | 2045 |
+ | Q5_K_S | 5.61 | 2.02 | 5.32 | 3567 | 2044 |
+ | Q5_K_M | 5.75 | 2.07 | 5.38 | 3528 | 2034 |
+ | Q6_K | 6.56 | 2.36 | 5.66 | 3334 | 1981 |
+ | Q8_0 | 8.50 | 3.05 | 6.36 | 3767 | 2101 |
+ | F16 | 16.00 | 5.75 | 9.70 | 3399 | 2023 |
+ | v3 (Transformers) | 16.00 | 1.10 | 2.82 | | 16505 |
+ | v4 (Transformers) | 16.00 | 7.40 | 14.45 | | 1865 |
 
 
  System info: