Commit 0fdf9c8 (verified) by maxious · Parent: e587c3d

Update README.md

Files changed (1): README.md (+61 −57)

README.md:
---
base_model:
- google/embeddinggemma-300m-qat-q4_0-unquantized
---
# Gemma3 Embedding Model: ONNX Conversion Demonstration

This repository demonstrates converting the Gemma3 embedding model from Hugging Face to ONNX format using optimum-onnx and comparing the result against the original. It includes scripts for both ONNX and PyTorch inference pipelines, as well as a comparison of their outputs.

## Files

- `onnx_gemma3_pipeline.py`: Runs the Gemma3 embedding model using ONNXRuntime, including post-processing steps (Pooling, Dense, Normalize) with ONNX-exported layers.
- `pytorch_gemma3_pipeline.py`: Runs the original Gemma3 embedding model using PyTorch and SentenceTransformer, for reference.
- `compare_gemma3_onnx_vs_pytorch.py`: Compares the output embeddings and cosine similarities between the ONNX and PyTorch pipelines.
- `download_missing_hf_files.py`: Downloads required files from Hugging Face and exports the Dense layers to ONNX.
- `gemma3_mean_pooling_basic.py`: The most basic implementation, running Gemma3 ONNX inference with only mean pooling (no Dense or Normalize stages); see the sketch after this list.

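For orientation, mean pooling reduces the transformer's `(batch, seq_len, hidden)` token embeddings to one vector per input, using the attention mask to ignore padding. A minimal sketch of the idea (the function name and shapes are illustrative, not copied from the repository's scripts):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, skipping padding positions.

    token_embeddings: (batch, seq_len, hidden) transformer output.
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid divide-by-zero on empty rows
    return summed / counts
```
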
## Pipeline Differences

Both pipelines use ONNXRuntime for transformer inference via `ORTModelForFeatureExtraction`. The key difference is in post-processing:

- **ONNX pipeline** (`onnx_gemma3_pipeline.py`): Uses ONNXRuntime for both the transformer and the Dense layers (exported to ONNX), making most of the pipeline ONNX-based except for normalization (sketched after this list).
- **PyTorch pipeline** (`pytorch_gemma3_pipeline.py`): Uses ONNXRuntime for the transformer, but performs all post-processing (Pooling, Dense, Normalize) with PyTorch modules from SentenceTransformer.

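A rough sketch of what the ONNX-based post-processing can look like: the pooled embedding passes through the exported Dense layers via `onnxruntime`, and only the final L2 normalization happens outside ONNX. The file names and the single-input assumption here are hypothetical; the actual exported graphs come from `download_missing_hf_files.py` and may differ:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical file names for the ONNX-exported Dense layers.
dense1 = ort.InferenceSession("dense1.onnx")
dense2 = ort.InferenceSession("dense2.onnx")

def post_process(pooled: np.ndarray) -> np.ndarray:
    """Run pooled float32 embeddings through the Dense ONNX graphs, then normalize."""
    x = dense1.run(None, {dense1.get_inputs()[0].name: pooled})[0]
    x = dense2.run(None, {dense2.get_inputs()[0].name: x})[0]
    # Normalization is the one step kept outside ONNX, as noted above.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)
```
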
This shows how ONNX conversion can move more of the computation into the ONNX graph for faster, hardware-agnostic inference, while the PyTorch pipeline serves as the reference implementation.

## Setup

1. Install dependencies:
```sh
pip install git+https://github.com/simondanielsson/optimum-onnx.git@feature/add-gemma3-export
pip install git+https://github.com/huggingface/[email protected]
pip install sentence-transformers onnxruntime safetensors huggingface_hub
```
2. Export the ONNX model:
```sh
optimum-cli export onnx --model google/embeddinggemma-300m-qat-q4_0-unquantized embeddinggemma-300m-onnx
python download_missing_hf_files.py
```
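
As a quick smoke test that the export succeeded, the output directory from step 2 should load through optimum's `ORTModelForFeatureExtraction` (the input string below is only an example):

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

model_dir = "embeddinggemma-300m-onnx"  # output directory from step 2
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForFeatureExtraction.from_pretrained(model_dir)

inputs = tokenizer(["apple"], return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden)
```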

## Usage

- Run the ONNX pipeline:
```sh
python onnx_gemma3_pipeline.py
```
- Run the PyTorch pipeline:
```sh
python pytorch_gemma3_pipeline.py
```
- Compare outputs:
```sh
python compare_gemma3_onnx_vs_pytorch.py
```

## Results

The comparison script prints cosine similarities between sample word embeddings (e.g., "apple", "banana", "car") for both the ONNX and PyTorch pipelines, demonstrating the fidelity of the ONNX conversion; a minimal version of the similarity check is sketched below.

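For reference, cosine similarity is just the dot product of L2-normalized vectors. A minimal, self-contained version of that check (the vectors here are placeholders, not real pipeline outputs):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for the ONNX and PyTorch embeddings
# of the same word; real values come from the two pipelines.
onnx_vec = np.array([0.10, 0.30, 0.50])
torch_vec = np.array([0.10, 0.29, 0.52])
print(cosine_similarity(onnx_vec, torch_vec))  # close to 1.0 if the outputs match
```
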
## References
- [Optimum-ONNX Gemma3 PR](https://github.com/huggingface/optimum-onnx/pull/50)
- [Gemma3 Model](https://huggingface.co/google/embeddinggemma-300m-qat-q4_0-unquantized)