jina-code-embeddings Collection High-quality code embeddings trained from code generation models • 5 items • Updated 6 days ago • 11
Zaraah | Static Arabic Embedding Models Collection This blog post introduces the Zaraah family of static embedding models, designed for Arabic language tasks and built using the model2vec distillation • 10 items • Updated Jun 16 • 2
Audio Codecs Embeddings 🎙️ Collection A collection of codec and embedding models supported in 🤗 Transformers. • 5 items • Updated Jul 17, 2024 • 4
Chameleon Collection Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. • 2 items • Updated Jul 9, 2024 • 32
Gemma 3 QAT Collection Quantization-aware trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated Jul 10 • 209
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 7 days ago • 56
Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled and 1 other • Oct 14, 2024 • 97
Multimodal DSE Retrievers Collection A collection of DSE models for multimodal retrieval • 5 items • Updated Apr 15 • 15
PubMedBERT Embeddings M2V Collection Models distilled with Model2Vec - 100K / 500K / 1M / 2M / 8M parameter variants. • 5 items • Updated Jan 26 • 4
Orpheus Multilingual Research Release Collection Beta Release of multilingual models. • 12 items • Updated Apr 10 • 100
Article Training and Finetuning Reranker Models with Sentence Transformers v4 By tomaarsen • Mar 26 • 161
Article ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval By manu and 2 others • Mar 18 • 11
📚 LLM pretraining datasets Collection A collection of datasets for LLM pretraining • 9 items • Updated May 5 • 11
Dar Datasets Collection Datasets uploaded by https://github.com/ARBML/dar • 200 items • Updated Aug 22, 2024 • 14