istupakov committed on
Commit f4ba066 · 1 Parent(s): ac85339

Update README.md

Files changed (1)
  1. README.md +41 -3
README.md CHANGED
@@ -1,3 +1,41 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - openai/whisper-base
+ ---
+
+ # Whisper base
+
+ The Whisper base [model](https://huggingface.co/openai/whisper-base) converted to ONNX format for use with [onnx-asr](https://github.com/istupakov/onnx-asr).
+
+ ## Install onnx-asr
+ ```shell
+ pip install "onnx-asr[cpu,hub]"
+ ```
+
+ ## Load the whisper-base model and recognize a WAV file
+ ```py
+ import onnx_asr
+ model = onnx_asr.load_model("whisper-base")
+ print(model.recognize("test.wav"))
+ ```
+
+ ## Code for model export
+
+ Export Whisper to ONNX with `onnxruntime` ([whisper.convert_to_onnx](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/whisper/README.md)).
+
+ Download the model and export it with beam search and forced decoder input IDs:
+ ```shell
+ python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-base --output whisper-onnx --use_external_data_format --use_forced_decoder_ids --optimize_onnx --precision fp32
+ ```
+
+ Save the tokenizer vocabulary:
+ ```py
+ from transformers import WhisperTokenizer
+
+ tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base")
+
+ # Write one "<token> <id>" pair per line; UTF-8 is needed for non-ASCII tokens.
+ with open("whisper-onnx/vocab.txt", "w", encoding="utf-8") as f:
+     for token, token_id in tokenizer.get_vocab().items():
+         f.write(f"{token} {token_id}\n")
+ ```
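
The snippet above writes one `<token> <id>` pair per line to `vocab.txt`. As a rough sketch of how a file in that format could be read back into an id-to-token table, something like the following should work. The helper names `load_vocab` and `demo` and the three demo entries are hypothetical, the code assumes no token contains a newline, and it is not necessarily how onnx-asr itself parses the file.

```python
import os
import tempfile


def load_vocab(path):
    """Read lines of the form '<token> <id>' into an {id: token} dict.

    Hypothetical helper for illustration only. It splits on the last space,
    so only the trailing integer is treated as the id.
    """
    id_to_token = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            token, token_id = line.rsplit(" ", 1)
            id_to_token[int(token_id)] = token
    return id_to_token


def demo():
    """Round-trip a tiny made-up vocabulary through a temporary file."""
    entries = {"Ġhello": 0, "Ġworld": 1, "<|endoftext|>": 2}  # demo data only
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.txt")
        with open(path, "w", encoding="utf-8") as f:
            for token, token_id in entries.items():
                f.write(f"{token} {token_id}\n")
        return load_vocab(path)
```

Splitting on the last space rather than the first keeps any token that itself contains a space intact, since the id is always the final field on the line.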