Update README.md
README.md CHANGED
@@ -9,10 +9,15 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-

## Available Quantizations

1. Q4_0_4_8 (CPU FMA-Optimized): ~246 GB
-2.
-3.
-4.

## Use Aria2 for parallelized downloads, links will download 9x faster

@@ -22,8 +27,7 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-
>>
>>Feel free to paste these all in at once or one at a time

-### Q4_0_48 (CPU Optimized
-


```bash
@@ -36,7 +40,7 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00006-of-00006.gg
```


-### IQ4_XS Version - Fastest for CPU/GPU (Size: ~212 GB)
```bash
aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00001-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00001-of-00005.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00002-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00002-of-00005.gguf
@@ -52,7 +56,7 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00002-of-00003.gguf https://huggingfa
aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00003-of-00003.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-1bit-00003-of-00003.gguf
```

-
### Q2K-Q8 Mixed 2bit 8bit I wrote myself. This is the smallest coherent one I could make WITHOUT imatrix

```verilog
@@ -70,6 +74,11 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00003-of-00004.gguf https:/
aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00004-of-00004.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-imatrix-2k-00004-of-00004.gguf
```

### BF16 Version

```bash
@@ -9,10 +9,15 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-

## Available Quantizations

+Available Quantizations
+
1. Q4_0_4_8 (CPU FMA-Optimized): ~246 GB
+2. IQ4_XS (Fastest for CPU/GPU): ~212 GB
+3. Q2K-Q8 Mixed quant with iMatrix: ~154 GB
+4. Q2K-Q8 Mixed without iMat for testing: ~165 GB
+5. 1-bit Custom per weight COHERENT quant: ~103 GB
+6. BF16: ~811 GB (original model)
+7. Q8_0: ~406 GB (original model)

## Use Aria2 for parallelized downloads, links will download 9x faster

@@ -22,8 +27,7 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-
>>
>>Feel free to paste these all in at once or one at a time

+### Q4_0_48 (CPU FMA Optimized Specifically for ARM server chips, NOT TESTED on X86)


```bash
@@ -36,7 +40,7 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00006-of-00006.gg
```


+### IQ4_XS Version - Fastest for CPU/GPU should work everywhere (Size: ~212 GB)
```bash
aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00001-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00001-of-00005.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00002-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00002-of-00005.gguf
@@ -52,7 +56,7 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00002-of-00003.gguf https://huggingfa
aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00003-of-00003.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-1bit-00003-of-00003.gguf
```

+
### Q2K-Q8 Mixed 2bit 8bit I wrote myself. This is the smallest coherent one I could make WITHOUT imatrix

```verilog
@@ -70,6 +74,11 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00003-of-00004.gguf https:/
aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00004-of-00004.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-imatrix-2k-00004-of-00004.gguf
```
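
The download commands in this README differ only in the shard number, since each quantization is split into files named `<prefix>-0000N-of-0000M.gguf`. As a rough sketch (not part of the README itself), the per-shard commands could be generated rather than pasted by hand. `print_shard_commands` and `BASE_URL` are hypothetical names introduced here; the aria2c flags and the repository URL are copied from the commands above. The function only prints the commands, so nothing is downloaded by running it.

```bash
# Hypothetical helper, not from the README: prints one aria2c command per shard.
# Assumes shards are named <prefix>-0000N-of-0000M.gguf, as in the commands above.
BASE_URL="https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main"

print_shard_commands() {
  prefix="$1"   # e.g. meta-405b-cpu-imatrix-2k
  total="$2"    # number of shards, e.g. 4
  i=1
  while [ "$i" -le "$total" ]; do
    # Zero-pad both the shard index and the shard count to five digits.
    name=$(printf '%s-%05d-of-%05d.gguf' "$prefix" "$i" "$total")
    printf 'aria2c -x 16 -s 16 -k 1M -o %s %s/%s\n' "$name" "$BASE_URL" "$name"
    i=$((i + 1))
  done
}

# Dry run: print the four commands for the imatrix Q2K shards.
print_shard_commands meta-405b-cpu-imatrix-2k 4
```

Piping the printed lines to `sh` would run the downloads one after another. The shard count passed as the second argument has to match the `-of-0000M` suffix of the quantization you want.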

+<figure>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/DD71wAB7DlQBmTG8wVaWS.png" alt="Q4_0_48 CPU Optimized example response">
+<figcaption><strong>Q4_0_48 (CPU Optimized) (246GB):</strong> Example response of 20000 token prompt</figcaption>
+</figure>
+
### BF16 Version

```bash