diarray committed on
Commit 667e36a · verified · 1 Parent(s): ad8e76f

Push model using huggingface_hub.

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +132 -0
  3. soloba-ctc-0.6b-v0.nemo +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ soloba-ctc-0.6b-v0.nemo filter=lfs diff=lfs merge=lfs -text
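The hunk above registers the `.nemo` checkpoint with Git LFS. As a rough illustration of what these attribute lines mean, the sketch below collects the patterns that carry `filter=lfs` and checks a file name against them. It uses Python's `fnmatch`, which only approximates Git's own pattern matching, and the helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical.

```python
from fnmatch import fnmatch

# The attribute lines visible in the hunk above (context lines plus the added one).
GITATTRIBUTES_LINES = [
    "*.zip filter=lfs diff=lfs merge=lfs -text",
    "*.zst filter=lfs diff=lfs merge=lfs -text",
    "*tfevents* filter=lfs diff=lfs merge=lfs -text",
    "soloba-ctc-0.6b-v0.nemo filter=lfs diff=lfs merge=lfs -text",
]


def lfs_patterns(lines):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in lines:
        parts = line.split()
        # First token is the pattern, the rest are attributes.
        if len(parts) >= 2 and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(filename, lines=GITATTRIBUTES_LINES):
    """Approximate check: does any LFS pattern match this file name?"""
    return any(fnmatch(filename, pattern) for pattern in lfs_patterns(lines))


print(is_lfs_tracked("soloba-ctc-0.6b-v0.nemo"))
print(is_lfs_tracked("README.md"))
```

For an authoritative answer inside a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.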
README.md ADDED
@@ -0,0 +1,132 @@
+ ---
+ language:
+ - bm
+ library_name: nemo
+ datasets:
+ - RobotsMali/kunkado
+ - RobotsMali/bam-asr-early
+
+ thumbnail: null
+ tags:
+ - automatic-speech-recognition
+ - speech
+ - audio
+ - Transducer
+ - TDT
+ - FastConformer
+ - Conformer
+ - pytorch
+ - Bambara
+ - NeMo
+ license: cc-by-4.0
+ base_model: nvidia/parakeet-ctc-0.6b
+ model-index:
+ - name: soloba-ctc-0.6b-v0
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: bam-asr-early
+       type: RobotsMali/bam-asr-early
+       split: test
+       args:
+         language: bm
+     metrics:
+     - name: Test WER
+       type: wer
+       value: 35.15760898590088
+
+ metrics:
+ - wer
+ pipeline_tag: automatic-speech-recognition
+ ---
+
+ # Soloba CTC 0.6B Bambara
+
+ <style>
+ img {
+   display: inline;
+ }
+ </style>
+
+ [![Model architecture](https://img.shields.io/badge/Model_Arch-FastConformer--CTC-blue#model-badge)](#model-architecture)
+ | [![Model size](https://img.shields.io/badge/Params-0.6B-green#model-badge)](#model-architecture)
+ | [![Language](https://img.shields.io/badge/Language-bm-orange#model-badge)](#datasets)
+
+ `soloba-ctc-0.6b-v0` is a fine-tuned version of [`nvidia/parakeet-ctc-0.6b`](https://huggingface.co/nvidia/parakeet-ctc-0.6b) on [RobotsMali/kunkado](https://huggingface.co/datasets/RobotsMali/kunkado). The model produces capitalization but not punctuation. It was fine-tuned using **NVIDIA NeMo**.
+
+ The model does not tag code-switched expressions in its transcriptions: for training, we treated them as a modern variant of the Bambara language and removed all tags and markings.
+
+ ## **🚨 Important Note**
+ This model, along with its associated resources, is part of an **ongoing research effort**; improvements and refinements are expected in future versions. A human evaluation report of the model is coming soon. Users should be aware that:
+
+ - **The model may not generalize very well across all speaking conditions and dialects.**
+ - **Community feedback is welcome, and contributions are encouraged to refine the model further.**
+
+ ## NVIDIA NeMo: Training
+
+ To fine-tune or experiment with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest version of PyTorch.
+
+ ```bash
+ pip install nemo_toolkit['asr']
+ ```
+
+ ## How to Use This Model
+
+ Note that this model has been released primarily for research purposes.
+
+ ### Load Model with NeMo
+ ```python
+ import nemo.collections.asr as nemo_asr
+ asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="RobotsMali/soloba-ctc-0.6b-v0")
+ ```
+
+ ### Transcribe Audio
+ ```python
+ asr_model.eval()
+ # Assuming you have a test audio file named sample_audio.wav
+ asr_model.transcribe(['sample_audio.wav'])
+ ```
+
+ ### Input
+
+ This model accepts any **mono-channel audio (wav files)** as input and resamples it to a *16 kHz sample rate* before performing the forward pass.
+
+ ### Output
+
+ For a given speech sample, this model provides the transcribed speech as a string and returns a Hypothesis object (under nemo>=2.3).
+
+ ## Model Architecture
+
+ This model uses a FastConformer-CTC architecture. FastConformer is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: [Fast-Conformer Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer). The encoder is followed by a convolutional decoder trained with the ***Connectionist Temporal Classification*** (CTC) loss.
+
+ ## Training
+
+ The NeMo toolkit (version 2.3.0) was used to fine-tune this model for **183,086 steps**, starting from the `nvidia/parakeet-ctc-0.6b` model. This version was trained with this [base config](https://github.com/diarray-hub/bambara-asr/blob/main/kunkado-training/config/soloba/soloba-ctc-v0.0.0.yaml). The full training configurations, scripts, and experimental logs are available here:
+
+ 🔗 [Bambara-ASR Experiments](https://github.com/diarray-hub/bambara-asr)
+
+ The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
+
+ ## Dataset
+ This model was fine-tuned on the [kunkado](https://huggingface.co/datasets/RobotsMali/kunkado) dataset and the [bam-asr-early](https://huggingface.co/datasets/RobotsMali/bam-asr-early) dataset (the semi-labelled subset, which consists of **~120 hours of automatically annotated Bambara speech data**).
+
+ ## Performance
+
+ We report the Word Error Rate (WER, in %) on the test set of bam-asr-early.
+
+ |**Decoder (Version)**|**Tokenizer**|**Vocabulary Size**|**bam-asr-early (test)**|
+ |---------|-----------------------|-----------------|---------|
+ | v0 | BPE | 512 | 35.16 |
+
+
+ ## License
+ This model is released under the **CC-BY-4.0** license. By using this model, you agree to the terms of the license.
+
+ ---
+
+ Feel free to open a discussion on Hugging Face or [file an issue](https://github.com/diarray-hub/bambara-asr/issues) on GitHub if you have any contributions.
+
+ ---
+
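The README's Performance section reports a Word Error Rate. For readers unfamiliar with the metric, here is a minimal sketch of how WER can be computed for a single reference/hypothesis pair: the word-level edit distance divided by the number of reference words. This is not the evaluation code used for the reported score, which is presumably aggregated over the whole test set; the function name `word_error_rate` and the placeholder sentences are assumptions for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length (a fraction, not %)."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j]: edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# Placeholder tokens, not real Bambara: one substitution and one deletion
# against a four-word reference.
print(100 * word_error_rate("a b c d", "a x c"))
```

Because the model emits capitalization but no punctuation, a fair comparison would normalize the reference text the same way before scoring; that step is left out of this sketch.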
soloba-ctc-0.6b-v0.nemo ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a0d09f2d6b62698d34e303a2bd25ebe575ad4de76fde2a49fada558abd6b78a
+ size 2434017280
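The three added lines above are a Git LFS pointer, not the checkpoint itself. As a small sketch of how such a pointer can be read, the snippet below splits each line into a key and a value and converts the size to GiB. The helper name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple `key value` layout shown in this hunk.

```python
# Pointer text copied from the hunk above (without the leading "+ " diff markers).
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8a0d09f2d6b62698d34e303a2bd25ebe575ad4de76fde2a49fada558abd6b78a
size 2434017280
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of an LFS pointer into a small dictionary."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
size_gib = info["size_bytes"] / 2**30
print(f"{info['hash_algorithm']} digest, {size_gib:.2f} GiB")
```

The size field puts the `.nemo` archive at roughly 2.27 GiB, which is useful to know before downloading it.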