### Quantizations

We use [`llama-quantize`](./quantize.sh) with an importance matrix (`imatrix`) to quantize the models from float16. The `imatrix` is generated with `llama-imatrix -m jina-embeddings-v4-text-retrieval-F16.gguf -f calibration_data_v5_rc.txt -ngl 99 --no-ppl -o imatrix-retrieval-512.dat`. The calibration set `calibration_data_v5_rc.txt` is available [here](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/) and is the one recommended by the Unsloth docs.
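
For reference, the two steps look like this. The `llama-imatrix` invocation is quoted from above; the `IQ4_XS` output type is only an illustrative example, so check [`quantize.sh`](./quantize.sh) for the exact types we produce:

```bash
# Step 1: build the importance matrix from the float16 GGUF.
# -ngl 99 offloads all layers to the GPU; --no-ppl skips the perplexity pass.
llama-imatrix -m jina-embeddings-v4-text-retrieval-F16.gguf \
    -f calibration_data_v5_rc.txt -ngl 99 --no-ppl \
    -o imatrix-retrieval-512.dat

# Step 2: quantize guided by the imatrix.
# IQ4_XS is an example target type, not necessarily the one we ship.
llama-quantize --imatrix imatrix-retrieval-512.dat \
    jina-embeddings-v4-text-retrieval-F16.gguf \
    jina-embeddings-v4-text-retrieval-IQ4_XS.gguf IQ4_XS
```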

Here are the speed and quality evaluations on two nano benchmarks; the higher, the better.

*(benchmark plots)*