Mardiyyah committed Commit 81a411e · verified · 1 Parent(s): 0d38692

Update README.md

Files changed (1): README.md (+28, -16)
datasets:
- asr-nigerian-pidgin/nigerian-pidgin-1.0
pipeline_tag: automatic-speech-recognition
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
 
# pidgin-wav2vec2-xlsr53

This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the [Nigerian Pidgin](https://huggingface.co/datasets/asr-nigerian-pidgin/nigerian-pidgin-1.0) dataset.

It achieves the following results on the evaluation set:
- Loss: 0.6907
- Wer: 0.3161 (val)
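For quick experimentation, the model can be used with the `transformers` ASR pipeline. This is a minimal sketch, not an official usage recipe: the repository id `Mardiyyah/pidgin-wav2vec2-xlsr53` and the audio filename are assumptions for illustration.

```python
# Minimal inference sketch. The repo id and audio path below are assumptions;
# substitute the actual model repository and a real audio file.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Mardiyyah/pidgin-wav2vec2-xlsr53",  # assumed repository id
)

# The pipeline resamples file inputs to the model's expected 16 kHz rate.
result = asr("pidgin_sample.wav")  # hypothetical audio file
print(result["text"])
```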
## Model description

*to be updated*

## Intended uses & limitations

**Intended Uses**:
- Best suited for automatic speech recognition (ASR) tasks on Nigerian Pidgin audio, such as speech-to-text conversion and related downstream tasks.
- Academic research on low-resource and creole language ASR.

**Known Limitations**:
- Performance may degrade with dialectal variation, heavy code-switching, or noisy audio environments.
- The model reflects biases present in the training dataset, which may affect accuracy on underrepresented demographics, phonetic variations, or topics.
- May struggle with rare words, numerals, and domain-specific terminology not well represented in the training set.
- Not recommended for high-stakes domains (e.g., legal, medical) without domain-specific retraining or fine-tuning.

## Training and evaluation data

The model was fine-tuned on the [Nigerian Pidgin ASR v1.0 dataset](https://huggingface.co/datasets/asr-nigerian-pidgin/nigerian-pidgin-1.0), consisting of over 4,200 utterances recorded by 10 native speakers (balanced across gender and age) using the LIG-Aikuma mobile platform. Recordings were collected in controlled environments to ensure high-quality audio.

Performance: WER 7.4% (train), 31.6% (validation), and 29.6% (test), exceeding baseline benchmarks such as QuartzNet and zero-shot XLSR. These results demonstrate the effectiveness of targeted fine-tuning for low-resource ASR; a sketch of how such scores can be computed follows.
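The sketch below transcribes the test split and scores it with the `evaluate` WER metric. The split name and the `audio`/`transcript` column names are assumptions, not confirmed by the dataset card.

```python
# Hedged WER-scoring sketch; split and column names are assumptions.
from datasets import load_dataset
from transformers import pipeline
import evaluate

asr = pipeline(
    "automatic-speech-recognition",
    model="Mardiyyah/pidgin-wav2vec2-xlsr53",  # assumed repository id
)
ds = load_dataset("asr-nigerian-pidgin/nigerian-pidgin-1.0", split="test")
wer = evaluate.load("wer")

# Each `ex["audio"]` is a dict with "array" and "sampling_rate", which the
# ASR pipeline accepts directly.
preds = [asr(ex["audio"])["text"] for ex in ds]
refs = [ex["transcript"] for ex in ds]  # assumed transcript column name
print(f"WER: {wer.compute(predictions=preds, references=refs):.3f}")
```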
 
## Training procedure

We fine-tuned the facebook/wav2vec2-large-xlsr-53 model on the Nigerian Pidgin ASR dataset, following the methodology outlined in the XLSR-53 paper. Training was performed on a single NVIDIA A100 GPU using the Hugging Face transformers library with fp16 mixed precision to accelerate computation and reduce memory usage.

A key modification from the standard setup was unfreezing the feature encoder during fine-tuning. This adjustment yielded improved performance, lowering word error rates (WER) on both the validation and test sets compared to the frozen-encoder approach; a sketch of the two setups appears below.
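Concretely, wav2vec 2.0 fine-tuning recipes in `transformers` usually freeze the CNN feature encoder via `model.freeze_feature_encoder()`; the unfrozen setup simply leaves those parameters trainable. A minimal sketch (the `vocab_size` value is a placeholder, not the actual Pidgin tokenizer size):

```python
# Sketch of the frozen vs. unfrozen feature-encoder choice.
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    vocab_size=40,  # placeholder; use the Pidgin tokenizer's vocab size
)

# Common recipe: freeze the CNN feature encoder during fine-tuning.
# model.freeze_feature_encoder()

# Setup used here: keep the feature encoder trainable.
for p in model.wav2vec2.feature_extractor.parameters():
    p.requires_grad = True
```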
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-4
- train_batch_size: 4
- eval_batch_size: 4
- seed: 3407
- num_epochs: 30
- mixed_precision_training: Native AMP

This configuration balanced training stability, efficiency, and accuracy, allowing the model to adapt effectively to Nigerian Pidgin speech patterns despite the dataset's limited size; the sketch below maps these settings onto `TrainingArguments`.
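Optimizer and scheduler settings are not recorded in this card, so the sketch keeps library defaults, and the `output_dir` is arbitrary.

```python
# Approximate TrainingArguments mirroring the listed hyperparameters.
# Optimizer/scheduler settings are not listed above; library defaults are kept.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pidgin-wav2vec2-xlsr53",  # arbitrary output directory
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=3407,
    num_train_epochs=30,
    fp16=True,  # Native AMP mixed-precision training
)
```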

### Performance comparison: frozen vs. unfrozen encoder

| Encoder state | Val WER | Test WER |
|---------------|---------|----------|
| Frozen        | 0.332   | 0.436    |
| Unfrozen      | 0.3161  | 0.296    |

### Training results (unfrozen model)

| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:-----:|:---------------:|:------:|

### Framework versions

- Transformers 4.48.2
- Pytorch 2.0.1+cu117
- Datasets 2.20.0
- Tokenizers 0.15.2