---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- mteb
- arctic
- embedding
- snowflake2_m_uint8
- snowflake
- transformers.js
license: apache-2.0
language:
- af
- ar
- az
- be
- bg
- bn
- ca
- ceb
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- gl
- gu
- he
- hi
- hr
- ht
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ky
- lo
- lt
- lv
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- pa
- pl
- pt
- qu
- ro
- ru
- si
- sk
- sl
- so
- sq
- sr
- sv
- sw
- ta
- te
- th
- tl
- tr
- uk
- ur
- vi
- yo
- zh
---
# Update
I've updated this model to be compatible with Fastembed.
I removed the `sentence_embedding` output and quantized the main model output instead. The model now outputs a 768-dimensional uint8 multivector (one vector per token).
To get a sentence embedding from the output, use CLS pooling with normalization disabled.
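
As a rough illustration of what CLS pooling means here, the sketch below takes the first token's vector from each sequence and leaves it untouched (no L2 normalization, so the uint8 values are preserved). The random array is only a stand-in for the model output, and the `(batch, seq_len, 768)` shape is an assumption about how the `token_embeddings` output is laid out.

```python
import numpy as np


def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Return the first-token (CLS) vector of each sequence, unchanged."""
    if token_embeddings.ndim != 3:
        raise ValueError("expected shape (batch, seq_len, dim)")
    # No normalization: the uint8 values are kept exactly as the model emitted them.
    return token_embeddings[:, 0, :]


# Stand-in for the model output (assumed layout): 2 sequences, 5 tokens, 768 dims.
rng = np.random.default_rng(0)
fake_output = rng.integers(0, 256, size=(2, 5, 768), dtype=np.uint8)

sentence_vectors = cls_pool(fake_output)
print(sentence_vectors.shape, sentence_vectors.dtype)  # (2, 768) uint8
```

Fastembed does this step for you when the model is registered with `PoolingType.CLS` and `normalization=False`.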
# snowflake2_m_uint8
This is a slightly modified version of the uint8 quantized ONNX model from https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0
I added a linear quantization node before the `token_embeddings` output so that it directly outputs a 768-dimensional uint8 multivector.
The output is compatible with the [qdrant](https://github.com/qdrant/qdrant) uint8 datatype for collections.
I took the liberty of removing the `sentence_embedding` output, since I would have had to re-create it. I can add it back if anybody wants it.
# Quantization method
Linear quantization over the range -7 to 7: float values in that range are mapped linearly onto the uint8 range 0 to 255.
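
A minimal NumPy sketch of this kind of linear quantization, for intuition only. The exact scale, zero point, and rounding used by the quantization node in the ONNX graph are not stated in this card, so the helper names and the clip-then-round behaviour below are assumptions.

```python
import numpy as np

LOW, HIGH = -7.0, 7.0              # range stated in this card
SCALE = (HIGH - LOW) / 255.0       # assumed step size: 14 / 255 per uint8 level


def quantize_uint8(x, low=LOW, high=HIGH):
    """Clip to [low, high], then map linearly onto 0..255 (hypothetical helper)."""
    x = np.clip(np.asarray(x, dtype=np.float64), low, high)
    scale = (high - low) / 255.0
    return np.round((x - low) / scale).astype(np.uint8)


def dequantize_uint8(q, low=LOW, high=HIGH):
    """Approximate inverse of quantize_uint8 (hypothetical helper)."""
    scale = (high - low) / 255.0
    return np.asarray(q, dtype=np.float64) * scale + low


values = np.array([-10.0, -7.0, -1.25, 3.3, 7.0, 10.0])
q = quantize_uint8(values)
restored = dequantize_uint8(q)
print(q)
print(restored)
```

Values outside the range saturate at 0 or 255, and values inside it come back within half a quantization step (about 0.027) of the original.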
Here's what the graph of the original output looks like:

Here's what the new graph in this model looks like:

# Benchmark
I used beir-qdrant with the scifact dataset.
quantized output (this model):
```
ndcg: {'NDCG@1': 0.59333, 'NDCG@3': 0.64619, 'NDCG@5': 0.6687, 'NDCG@10': 0.69228, 'NDCG@100': 0.72204, 'NDCG@1000': 0.72747}
recall: {'Recall@1': 0.56094, 'Recall@3': 0.68394, 'Recall@5': 0.73983, 'Recall@10': 0.80689, 'Recall@100': 0.94833, 'Recall@1000': 0.99333}
precision: {'P@1': 0.59333, 'P@3': 0.25, 'P@5': 0.16467, 'P@10': 0.09167, 'P@100': 0.01077, 'P@1000': 0.00112}
```
unquantized output (model_uint8.onnx):
```
ndcg: {'NDCG@1': 0.59333, 'NDCG@3': 0.65417, 'NDCG@5': 0.6741, 'NDCG@10': 0.69675, 'NDCG@100': 0.7242, 'NDCG@1000': 0.7305}
recall: {'Recall@1': 0.56094, 'Recall@3': 0.69728, 'Recall@5': 0.74817, 'Recall@10': 0.81356, 'Recall@100': 0.945, 'Recall@1000': 0.99667}
precision: {'P@1': 0.59333, 'P@3': 0.25444, 'P@5': 0.16667, 'P@10': 0.09233, 'P@100': 0.01073, 'P@1000': 0.00113}
```
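
To put the two runs side by side, the short script below computes the relative change of a few metrics, using only numbers copied from the results above. The choice of metrics and the `relative_change` helper are just for illustration.

```python
# Values copied from the two benchmark runs above.
quantized = {"NDCG@10": 0.69228, "Recall@10": 0.80689, "Recall@100": 0.94833}
unquantized = {"NDCG@10": 0.69675, "Recall@10": 0.81356, "Recall@100": 0.945}


def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


changes = {
    name: relative_change(quantized[name], unquantized[name])
    for name in quantized
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
# NDCG@10: -0.64%
# Recall@10: -0.82%
# Recall@100: +0.35%
```

On scifact, then, quantizing the output costs under 1% on these metrics.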
# Example inference/benchmark code and how to use the model with Fastembed
After installing beir-qdrant, make sure to upgrade fastembed.
```python
# pip install qdrant_client beir-qdrant
# pip install -U fastembed
from fastembed import TextEmbedding
from fastembed.common.model_description import PoolingType, ModelSource
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from qdrant_client import QdrantClient
from qdrant_client.models import Datatype
from beir_qdrant.retrieval.models.fastembed import DenseFastEmbedModelAdapter
from beir_qdrant.retrieval.search.dense import DenseQdrantSearch

# Register this model with fastembed: CLS pooling, no normalization.
TextEmbedding.add_custom_model(
    model="electroglyph/snowflake2_m_uint8",
    pooling=PoolingType.CLS,
    normalization=False,
    sources=ModelSource(hf="electroglyph/snowflake2_m_uint8"),
    dim=768,
    model_file="snowflake2_m_uint8.onnx",
)

dataset = "scifact"
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(dataset)
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
qdrant_client = QdrantClient("http://localhost:6333")
# Store the vectors in a uint8 collection so they match the model output.
model = DenseQdrantSearch(
    qdrant_client,
    model=DenseFastEmbedModelAdapter(
        model_name="electroglyph/snowflake2_m_uint8"
    ),
    collection_name="scifact-uint8",
    initialize=True,
    datatype=Datatype.UINT8,
)
retriever = EvaluateRetrieval(model)
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(f"ndcg: {ndcg}\nrecall: {recall}\nprecision: {precision}")
```