hanxiao committed · verified · Commit b94c8e4 · 1 Parent(s): 14d2f50

Upload README.md with huggingface_hub

Files changed (1): README.md (+8 −1)
README.md CHANGED
@@ -111,4 +111,11 @@ Note, v4 is trained with Matryoshka embeddings, and converting to GGUF doesn't b
 
 ### Quantizations
 
-We use [`llama-quantize`](./quantize.sh) with `imatrix` to quantize models from float16. `imatrix` is generated by `llama-imatrix -m jina-embeddings-v4-text-retrieval-F16.gguf -f calibration_data_v5_rc.txt -ngl 99 --no-ppl -o imatrix-retrieval-512.dat`. `calibration_data_v5_rc.txt` can be found [here](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/) and is recommended by Unsloth docs.
+We use [`llama-quantize`](./quantize.sh) with `imatrix` to quantize models from float16. The `imatrix` is generated by `llama-imatrix -m jina-embeddings-v4-text-retrieval-F16.gguf -f calibration_data_v5_rc.txt -ngl 99 --no-ppl -o imatrix-retrieval-512.dat`. `calibration_data_v5_rc.txt` can be found [here](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/) and is recommended by the Unsloth docs.
+
+
+Here are the speed (on an L4 GPU) and quality evaluations on two nano benchmarks. Higher is better.
+
+![](jina-embeddings-v4-text-retrieval-GGUF%20on%20L4.svg)
+![](NanoFiQA2018.svg)
+![](NanoHotpotQA.svg)
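For reference, the two-step pipeline the added README text describes can be sketched as the shell commands below. The `llama-imatrix` invocation is taken verbatim from the diff; the `llama-quantize` call and the `Q4_K_M` quantization type are assumptions for illustration — the repo's `quantize.sh` is the authoritative source for the exact types produced.

```shell
# Step 1 (from the README): generate the importance matrix from the
# float16 GGUF using the calibration text. -ngl 99 offloads all layers
# to the GPU; --no-ppl skips the perplexity computation.
llama-imatrix -m jina-embeddings-v4-text-retrieval-F16.gguf \
  -f calibration_data_v5_rc.txt -ngl 99 --no-ppl \
  -o imatrix-retrieval-512.dat

# Step 2 (assumed invocation): quantize the float16 model guided by the
# imatrix. Q4_K_M is an illustrative choice of quantization type.
llama-quantize --imatrix imatrix-retrieval-512.dat \
  jina-embeddings-v4-text-retrieval-F16.gguf \
  jina-embeddings-v4-text-retrieval-Q4_K_M.gguf Q4_K_M
```

Both binaries are built from llama.cpp; the imatrix only needs to be generated once and can be reused across all quantization types of the same float16 model.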