Commit 28e6b7d by vdmbrsv (verified) · 1 Parent(s): b697865

Update README.md

  1. README.md +61 -100
README.md CHANGED
@@ -7,135 +7,96 @@ base_model: sentence-transformers/all-MiniLM-L6-v2
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
  ---

- # SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

- ### Model Description
  - **Model Type:** Sentence Transformer
- - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
- - **Maximum Sequence Length:** 512 tokens
- - **Output Dimensionality:** 384 dimensions
  - **Similarity Function:** Cosine Similarity
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->
-
- ### Model Sources
-
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
-
- ### Full Model Architecture
-
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
-   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
- )
- ```

  ## Usage

- ### Direct Usage (Sentence Transformers)
-
- First install the Sentence Transformers library:

- ```bash
- pip install -U sentence-transformers
- ```

- Then you can load this model and run inference.
  ```python
- from sentence_transformers import SentenceTransformer

- # Download from the 🤗 Hub
  model = SentenceTransformer("tabularisai/all-MiniLM-L2-v2")
- # Run inference
- sentences = [
-     'The weather is lovely today.',
-     "It's so sunny outside!",
-     'He drove to the stadium.',
- ]
- embeddings = model.encode(sentences)
- print(embeddings.shape)
- # [3, 384]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
- ```
-
- <!--
- ### Direct Usage (Transformers)

- <details><summary>Click to see the direct usage in Transformers</summary>
-
- </details>
- -->
-
- <!--
- ### Downstream Usage (Sentence Transformers)
-
- You can finetune this model on your own dataset.
-
- <details><summary>Click to expand</summary>
-
- </details>
- -->

- <!--
- ### Out-of-Scope Use

- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->

- <!--
- ## Bias, Risks and Limitations

- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->

- <!--
- ### Recommendations

- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->

- ## Training Details

- ### Framework Versions
- - Python: 3.10.10
- - Sentence Transformers: 4.1.0
- - Transformers: 4.51.3
- - PyTorch: 2.7.0+cu128
- - Accelerate:
- - Datasets: 3.5.1
- - Tokenizers: 0.21.1

- ## Citation

- ### BibTeX

- <!--
- ## Glossary

- *Clearly define terms in order to be accessible across audiences.*
- -->

- <!--
- ## Model Card Authors

- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->

- <!--
- ## Model Card Contact

- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
  ---
+ # SentenceTransformer: tabularisai/all-MiniLM-L2-v2

+ This model was distilled from [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2), achieving **nearly 2× faster inference** while retaining high accuracy.

  ## Model Details

+ ### Description
  - **Model Type:** Sentence Transformer
+ - **Base Model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
+ - **Maximum Sequence Length:** 256 tokens
+ - **Output Dimensionality:** 384
  - **Similarity Function:** Cosine Similarity

  ## Usage

+ ### Retrieval-Augmented Generation (RAG) Example

+ Use this model as a retriever in a RAG pipeline:

  ```python
+ from sentence_transformers import SentenceTransformer, util
+ import faiss
+ import numpy as np

+ # Load embedding model
  model = SentenceTransformer("tabularisai/all-MiniLM-L2-v2")

+ # Your 5 simple documents
+ documents = [
+     "Renewable energy comes from natural sources.",
+     "Solar panels convert sunlight into electricity.",
+     "Wind turbines harness wind power.",
+     "Fossil fuels are non-renewable sources of energy.",
+     "Hydropower uses water to generate electricity."
+ ]

+ # Embed documents
+ doc_embeddings = model.encode(documents, convert_to_numpy=True)

+ # Create FAISS index (exact search over L2 distances)
+ dim = doc_embeddings.shape[1]
+ index = faiss.IndexFlatL2(dim)
+ index.add(doc_embeddings)

+ # Query
+ query = "What are the benefits of renewable energy?"
+ query_embedding = model.encode([query], convert_to_numpy=True)

+ # Search top 3 similar docs
+ # D holds squared L2 distances (lower is closer), I holds document indices
+ D, I = index.search(query_embedding, k=3)

+ # Print results
+ print("Query:", query)
+ print("\nTop 3 similar documents:")
+ for rank, idx in enumerate(I[0]):
+     print(f"{rank+1}. {documents[idx]} (squared L2 distance: {D[0][rank]:.4f})")

+ ```
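`faiss.IndexFlatL2` reports squared Euclidean distances, so smaller values mean closer matches, while the Model Details list cosine similarity. The two rankings are only guaranteed to agree when the embeddings have unit length (for example by passing `normalize_embeddings=True` to `encode`), because then the squared distance equals `2 - 2 * cos`. A minimal stdlib sketch of that relationship, using made-up 3-dimensional vectors; the helper names `normalize`, `dot`, and `squared_l2` are illustrative and not part of any library:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Inner product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical toy vectors standing in for 384-dimensional embeddings.
query = normalize([1.0, 0.5, 0.0])
docs = [normalize(d) for d in ([0.9, 0.6, 0.1], [0.0, 1.0, 1.0], [-1.0, 0.2, 0.3])]

for d in docs:
    cos = dot(query, d)
    dist = squared_l2(query, d)
    print(f"cos={cos:.4f}  squared_l2={dist:.4f}  2-2cos={2 - 2 * cos:.4f}")

# Ascending distance and descending cosine give the same order for unit vectors.
by_distance = sorted(range(len(docs)), key=lambda i: squared_l2(query, docs[i]))
by_cosine = sorted(range(len(docs)), key=lambda i: -dot(query, docs[i]))
print(by_distance == by_cosine)
```

If the embeddings are not normalized, the distance-based and cosine-based orderings can differ, so it is worth checking which one the retriever should use.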
 
+ ### Sentence Embedding Example

+ Install the library:

+ ```bash
+ pip install -U sentence-transformers
+ ```

+ Load the model and encode sentences:

+ ```python
+ from sentence_transformers import SentenceTransformer

+ model = SentenceTransformer("tabularisai/all-MiniLM-L2-v2")

+ sentences = [
+     "The weather is lovely today.",
+     "It's so sunny outside!",
+     "He drove to the stadium.",
+ ]

+ embeddings = model.encode(sentences)
+ print(embeddings.shape)  # (3, 384)

+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)  # torch.Size([3, 3])
+ ```
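The `model.similarity` call above compares every embedding with every other one, using the similarity function that the Model Details list as cosine similarity. As a rough illustration of what one entry of that matrix represents, here is a plain-Python sketch that does not load the model. The function name `cosine_similarity` and the toy vectors are assumptions made for this example; real embeddings from this model have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings (hypothetical values).
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u, so similarity is close to 1
w = [-3.0, 0.0, 1.0]  # orthogonal to u, so similarity is close to 0

print(cosine_similarity(u, v))
print(cosine_similarity(u, w))
```

Because the measure depends only on direction, scaling a vector (as with `u` and `v`) does not change the result.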

+ ## Key Takeaways
+ - Based on MiniLM-L6-v2, distilled from a 12-layer model
+ - 2× faster inference
+ - 384-dimensional output
+ - Compatible with SentenceTransformers and Hugging Face RAG