This is an INT4 quantized version of [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B).

### Quantization Process
```python
# Quantized using OpenVINO NNCF
# INT4 symmetric quantization
# Calibration dataset: [specify if used]
```
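
For reference, here is a minimal sketch of how an equivalent INT4 symmetric weight-only quantization can be produced with `optimum-intel`. The exact parameters used for this export are not recorded above, so treat the settings below as illustrative rather than as the recipe for this checkpoint:

```python
# Illustrative sketch, not the exact export settings for this model.
# OVWeightQuantizationConfig drives NNCF weight compression internally.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

quant_config = OVWeightQuantizationConfig(bits=4, sym=True)  # INT4, symmetric
model = OVModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",
    export=True,                       # convert the PyTorch model to OpenVINO IR
    quantization_config=quant_config,
)
model.save_pretrained("smollm3-int4-ov")
```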

### Model Architecture
- Same architecture as SmolLM3-3B
- GQA and NoPE preserved
- 64k context support (128k with YARN)
- Multilingual capabilities maintained
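
Since the quantization only touches the weights, these properties can be checked from the shipped config. A quick sanity check, assuming the export keeps the standard `transformers` config fields:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("dev-bjoern/smollm3-int4-ov")
print(cfg.num_attention_heads, cfg.num_key_value_heads)  # GQA: fewer KV heads than attention heads
print(cfg.max_position_embeddings)                       # native context window
```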

## 📊 Performance (Experimental)

> ⚠️ **Note:** This is an experimental quantization. Formal benchmarks pending.

Expected characteristics (a quick way to sanity-check them locally is sketched below):
- **Model Size:** ~1 GB (vs. ~6 GB in fp16)
- **Inference Speed:** 2-4x faster on CPU
- **Quality Trade-off:** Minor degradation expected
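
A rough throughput check you can run on your own machine; the prompt and token counts are arbitrary, and tokens/s will vary heavily with hardware:

```python
# Rough speed check; numbers depend strongly on your CPU.
import time

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "dev-bjoern/smollm3-int4-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Explain quantum computing in simple terms", return_tensors="pt")
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

generated = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{generated / elapsed:.1f} tokens/s")
```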

## 🛠️ How to Use

### Installation
```bash
pip install optimum[openvino] transformers
```

### Basic Usage
```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "dev-bjoern/smollm3-int4-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Generate text
prompt = "Explain quantum computing in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With Extended Thinking
```python
# The "/think" system flag enables SmolLM3's extended reasoning mode.
messages = [
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "Solve this step by step: 25 * 16"},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Run the templated prompt through the model.
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
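
Per the upstream SmolLM3 model card, passing `/no_think` as the system message instead disables the reasoning trace.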

## 🎯 Intended Use

- **Edge AI applications**
- **Local LLM deployment**
- **Resource-constrained environments**
- **Privacy-focused applications**
- **Offline AI assistants**

## ⚡ Optimization Tips

1. **CPU Inference:** Use the OpenVINO runtime for best performance (see the sketch below)
2. **Batch Processing:** Leverage dynamic batching when possible
3. **Memory:** Expect roughly 2 GB of RAM for comfortable operation
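
For tip 1, OpenVINO runtime behavior can be steered via `ov_config` at load time. The property names below are standard OpenVINO options, but the best values depend on your hardware, so treat this as a starting point rather than tuned settings:

```python
from optimum.intel import OVModelForCausalLM

model_id = "dev-bjoern/smollm3-int4-ov"
model = OVModelForCausalLM.from_pretrained(
    model_id,
    ov_config={
        "PERFORMANCE_HINT": "LATENCY",  # or "THROUGHPUT" for batch workloads
        "CACHE_DIR": "ov_cache",        # reuse compiled blobs across runs
    },
)
```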

## 🧪 Experimental Status

This is my first experiment with OpenVINO INT4 quantization. Feedback and contributions are welcome!

### Known Limitations
- No formal benchmarks yet
- Quantization settings not fully optimized
- Some quality degradation vs. full precision

### Future Improvements
- [ ] Comprehensive benchmarking
- [ ] Mixed-precision experiments
- [ ] Model compression analysis
- [ ] Calibration dataset optimization

## 🤝 Contributing

Found issues or have suggestions? Please open a discussion or issue!

## 📚 Resources

- [Original SmolLM3 Model](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
- [OpenVINO Documentation](https://docs.openvino.ai/)
- [Optimum Intel](https://huggingface.co/docs/optimum/intel/index)

## 🙏 Acknowledgments

- HuggingFace team for SmolLM3
- Intel OpenVINO team for quantization tools
- Community for feedback and support

## 📝 Citation

If you use this model, please cite the original SmolLM3 release as well as this work:

```bibtex
@misc{smollm3-int4-ov,
  author       = {Bjoern Bethge},
  title        = {SmolLM3 INT4 OpenVINO},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dev-bjoern/smollm3-int4-ov}}
}
```

---

**Status:** 🧪 Experimental | **Feedback:** Welcome | **License:** Apache 2.0