eustlb HF Staff committed on
Commit
1d5178a
·
1 Parent(s): 31083d9

readme update

Files changed (1)
  1. README.md +72 -0
README.md CHANGED
@@ -18,6 +18,7 @@ datasets:
18
  - MLCommons/peoples_speech
19
  thumbnail: null
20
  tags:
21
  - automatic-speech-recognition
22
  - speech
23
  - audio
@@ -182,6 +183,77 @@ img {
182
  It is an XL version of the FastConformer CTC [1] model (around 600M parameters).
183
  See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.
184
 
185
  ## NVIDIA NeMo: Training
186
 
187
  To train, fine-tune, or experiment with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
 
18
  - MLCommons/peoples_speech
19
  thumbnail: null
20
  tags:
21
+ - transformers
22
  - automatic-speech-recognition
23
  - speech
24
  - audio
 
183
  It is an XL version of the FastConformer CTC [1] model (around 600M parameters).
184
  See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.
185
 
186
+ ## Transformers
187
+
188
+ You can now run Parakeet CTC natively with [Transformers](https://github.com/huggingface/transformers) 🤗
189
+
190
+ ```bash
191
+ pip install git+https://github.com/huggingface/transformers
192
+ ```
193
+
194
+ <details>
195
+ <summary>➡️ Pipeline usage</summary>
196
+
197
+ ```python
198
+ from transformers import pipeline
199
+
200
+ pipe = pipeline("automatic-speech-recognition", model="nvidia/parakeet-ctc-0.6b")
201
+ out = pipe("https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/bcn_weather.mp3")
202
+ print(out)
203
+ ```
204
+ </details>
205
+
206
+ <details>
207
+ <summary>➡️ AutoModel</summary>
208
+
209
+ ```python
210
+ from transformers import AutoModelForCTC, AutoProcessor
211
+ from datasets import load_dataset, Audio
212
+ import torch
213
+
214
+ device = "cuda" if torch.cuda.is_available() else "cpu"
215
+
216
+ processor = AutoProcessor.from_pretrained("nvidia/parakeet-ctc-0.6b")
217
+ model = AutoModelForCTC.from_pretrained("nvidia/parakeet-ctc-0.6b", dtype="auto", device_map=device)
218
+
219
+ ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
220
+ ds = ds.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))
221
+ speech_samples = [el['array'] for el in ds["audio"][:5]]
222
+
223
+ inputs = processor(speech_samples, sampling_rate=processor.feature_extractor.sampling_rate)
224
+ inputs = inputs.to(model.device, dtype=model.dtype)
225
+ outputs = model.generate(**inputs)
226
+ print(processor.batch_decode(outputs))
227
+ ```
228
+ </details>
229
+
230
+ <details>
231
+ <summary>➡️ Training</summary>
232
+
233
+ ```python
234
+ from transformers import AutoModelForCTC, AutoProcessor
235
+ from datasets import load_dataset, Audio
236
+ import torch
237
+
238
+ device = "cuda" if torch.cuda.is_available() else "cpu"
239
+
240
+ processor = AutoProcessor.from_pretrained("nvidia/parakeet-ctc-0.6b")
241
+ model = AutoModelForCTC.from_pretrained("nvidia/parakeet-ctc-0.6b", dtype="auto", device_map=device)
242
+
243
+ ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
244
+ ds = ds.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))
245
+ speech_samples = [el['array'] for el in ds["audio"][:5]]
246
+ text_samples = [el for el in ds["text"][:5]]
247
+
248
+ # passing `text` to the processor will prepare inputs' `labels` key
249
+ inputs = processor(audio=speech_samples, text=text_samples, sampling_rate=processor.feature_extractor.sampling_rate)
250
+ inputs = inputs.to(device, dtype=model.dtype)
251
+
252
+ outputs = model(**inputs)
253
+ outputs.loss.backward()
254
+ ```
255
+ </details>
256
+
257
  ## NVIDIA NeMo: Training
258
 
259
  To train, fine-tune, or experiment with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
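
The AutoModel example added in this commit turns model outputs into text with `processor.batch_decode`. For readers unfamiliar with CTC, the sketch below illustrates the standard greedy CTC collapse rule on toy data: merge consecutive repeated frame predictions, then drop the blank symbol. It is an illustration only, not the Transformers or NeMo implementation; `ctc_greedy_decode`, the toy `vocab`, and `blank_id=0` are hypothetical names and values chosen for this example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank_id, id_to_token):
    """Greedy CTC decoding over per-frame argmax ids (toy illustration).

    Step 1: merge runs of identical consecutive ids.
    Step 2: remove the blank id.
    """
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_token[i] for i in collapsed if i != blank_id)


# Hypothetical character vocabulary; id 0 is assumed to be the CTC blank.
vocab = {0: "<blank>", 1: "h", 2: "e", 3: "l", 4: "o"}

# Per-frame predictions. The blank between the two runs of 3 keeps both "l"s.
frames = [1, 1, 0, 2, 2, 3, 0, 3, 3, 4, 0, 0]
print(ctc_greedy_decode(frames, blank_id=0, id_to_token=vocab))  # hello
```

The blank symbol is what allows a genuinely doubled character to survive the merge step: without the blank frame between the two runs of `3`, they would collapse into a single "l".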