dev-bjoern committed (verified)
Commit 29f8c2d · 1 Parent(s): 00bdc52

Update README.md

Files changed (1)
  1. README.md +11 -8
README.md CHANGED
@@ -30,9 +30,9 @@ This is an INT4 quantized version of [SmolLM3-3B](https://huggingface.co/Hugging
 
  ## Model Overview
 
- - **Base Model:** SmolLM3-3B (3B parameters)
+ - **Base Model:** SmolLM3-3B
  - **Quantization:** INT4 via OpenVINO
- - **Size Reduction:** ~75% smaller than original
+ - **Size Reduction:** Significant compression achieved
  - **Target Hardware:** CPUs, Intel GPUs, NPUs
  - **Use Cases:** Local inference, edge deployment, resource-constrained environments
 
@@ -55,10 +55,13 @@ This is an INT4 quantized version of [SmolLM3-3B](https://huggingface.co/Hugging
 
  > ⚠️ **Note:** This is an experimental quantization. Formal benchmarks pending.
 
- Expected characteristics:
- - **Model Size:** ~1GB (vs ~6GB fp16)
- - **Inference Speed:** 2-4x faster on CPU
- - **Quality Trade-off:** Minor degradation expected
+ Expected benefits of INT4 quantization:
+ - Reduced model size
+ - Faster CPU inference
+ - Lower memory requirements
+ - Some quality trade-off
+
+ Actual metrics will be added after proper benchmarking.
 
  ## 🛠️ How to Use
 
@@ -108,8 +111,8 @@ text = tokenizer.apply_chat_template(
  ## ⚡ Optimization Tips
 
  1. **CPU Inference:** Use OpenVINO runtime for best performance
- 2. **Batch Processing:** Leverage dynamic batching when possible
- 3. **Memory:** Requires ~2GB RAM for comfortable operation
+ 2. **Batch Processing:** Consider batching requests when possible
+ 3. **Memory:** INT4 significantly reduces memory requirements
 
  ## 🧪 Experimental Status
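For reference, the usage pattern the last hunk points at (`text = tokenizer.apply_chat_template(`) is the standard optimum-intel flow for OpenVINO models. Below is a minimal sketch of that flow, assuming `optimum[openvino]` is installed; the repo id and generation settings are placeholders, not values taken from this commit.

```python
# Minimal sketch: load an INT4 OpenVINO model via optimum-intel.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "dev-bjoern/SmolLM3-3B-int4-ov"  # hypothetical repo id; use the actual model path

# OVModelForCausalLM runs the model on the OpenVINO runtime (CPU by default).
model = OVModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain INT4 quantization in one sentence."}]
# Format the conversation with the model's chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=64)
# Strip the prompt tokens before decoding the reply.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```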