hanxiao committed · 649c74e · verified · Parent(s): 8864c0c

Upload README.md with huggingface_hub

Files changed (1): README.md (+5 -1)
README.md CHANGED
@@ -15,7 +15,7 @@ A collection of GGUF and quantizations for [`jina-embeddings-v4`](https://huggin
 
 ## Text-Only Task-Specific Models
 
-Here, we removed the visual components of qwen2.5-vl and merged all LoRA adapters back into the base language model. This results in three task-specific v4 models with 3.09B parameters, downsized from the original jina-embeddings-v4 3.75B parameters:
+We removed the visual components of `qwen2.5-vl` and merged all LoRA adapters back into the base language model. This results in three task-specific v4 models with 3.09B parameters, down from the original jina-embeddings-v4's 3.75B:
 
 | HuggingFace Repo | Task |
 |---|---|
@@ -108,3 +108,7 @@ To some users, ⚠️ indicates a somewhat surprising behavior where `prompt_nam
 ### Matryoshka embeddings
 
 Note that v4 is trained with Matryoshka embeddings, and converting to GGUF doesn't break this feature. If you get embeddings of shape `NxD`, you can simply take `embeddings[:, :truncate_dim]` to get smaller truncated embeddings. Note that not every dimension is trained, though: for v4, `truncate_dim` can be any of `[128, 256, 512, 1024, 2048]`.
+
+### Quantizations
+
+We use [`llama-quantize`](./quantize.sh) with an importance matrix (`imatrix`) to quantize the models from float16. The `imatrix` is generated by `llama-imatrix -m jina-embeddings-v4-text-retrieval-F16.gguf -f calibration_data_v5_rc.txt -ngl 99 --no-ppl -o imatrix-retrieval-512.dat`; `calibration_data_v5_rc.txt` can be found [here](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/) and is the calibration set recommended by the Unsloth docs.
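
For readers unfamiliar with the LoRA-merging step referenced in the first hunk, below is a minimal sketch of folding an adapter back into its base model with `peft`'s `merge_and_unload()`. This is not the authors' actual pipeline: the repo ids are hypothetical placeholders, and stripping the Qwen2.5-VL vision tower is a separate surgery not shown here.

```python
# Minimal sketch of merging a LoRA adapter into its base model (not the
# authors' actual pipeline; repo ids below are hypothetical placeholders).
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "some-org/base-language-model",  # placeholder for the extracted LM tower
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, "some-org/task-lora")  # placeholder adapter
model = model.merge_and_unload()  # fold the LoRA deltas into the base weights
model.save_pretrained("merged-task-model")  # ready for GGUF conversion
```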
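To make the Matryoshka note concrete, here is a small Python sketch of the truncation. The random array stands in for real v4 embeddings, and the re-normalization step is an assumption reflecting common practice when scoring with cosine similarity, not something the README prescribes.

```python
import numpy as np

# Suppose `embeddings` came from any v4 GGUF runner as an N x 2048 float array;
# here we fake one for demonstration.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 2048)).astype(np.float32)

truncate_dim = 512  # must be one of the trained dims: 128, 256, 512, 1024, 2048

# Matryoshka truncation: keep only the leading dimensions.
truncated = embeddings[:, :truncate_dim]

# Re-normalize so cosine similarity / dot product remain comparable
# (typical usage, assumed here rather than mandated by the README).
truncated /= np.linalg.norm(truncated, axis=1, keepdims=True)

print(truncated.shape)  # (4, 512)
```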
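And a sketch of how the two quantization commands fit together. The first command is the `llama-imatrix` invocation quoted above; the `llama-quantize` call uses stock llama.cpp flags, but the `Q4_K_M` type and the output filename are illustrative guesses, not the exact contents of `quantize.sh`.

```bash
# Step 1: build the importance matrix from the calibration set (as in the README).
llama-imatrix -m jina-embeddings-v4-text-retrieval-F16.gguf \
  -f calibration_data_v5_rc.txt -ngl 99 --no-ppl \
  -o imatrix-retrieval-512.dat

# Step 2: quantize float16 -> Q4_K_M, guided by the imatrix.
# (Quant type and output name are illustrative; quantize.sh defines the real set.)
llama-quantize --imatrix imatrix-retrieval-512.dat \
  jina-embeddings-v4-text-retrieval-F16.gguf \
  jina-embeddings-v4-text-retrieval-Q4_K_M.gguf \
  Q4_K_M
```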