prithivMLmods committed · verified
Commit 944263e · 1 Parent(s): 357a7a9

Update README.md

Files changed (1): README.md (+139 −1)

README.md
---
license: apache-2.0
datasets:
- stapesai/ssi-speech-emotion-recognition
language:
- en
base_model:
- facebook/wav2vec2-base-960h
pipeline_tag: audio-classification
library_name: transformers
tags:
- emotion
- classification
- audio
- music
- facebook
---

# Speech-Emotion-Classification

> **Speech-Emotion-Classification** is a fine-tuned version of `facebook/wav2vec2-base-960h` for **multi-class audio classification**, trained to detect **emotions** in speech. It uses the `Wav2Vec2ForSequenceClassification` architecture to classify speaker emotions directly from audio signals.

> [!NOTE]
> wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
> [https://arxiv.org/pdf/2006.11477](https://arxiv.org/pdf/2006.11477)

```py
Classification Report:

              precision    recall  f1-score   test_support

       Anger     0.8314    0.9346    0.8800            306
        Calm     0.7949    0.8857    0.8378             35
         ...

weighted avg     0.8392    0.8379    0.8367           1999
```

![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/oW8Qa6MO2koMOhRQgVd6a.png)

![download (1).png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/w_wC5gmrWhNlPYS_ftYSC.png)
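
The per-class metrics above appear to follow the standard `scikit-learn` `classification_report` layout. For reference, a minimal sketch of how such a report is produced; the labels and predictions below are placeholder values, not outputs of this model:

```python
from sklearn.metrics import classification_report

# Placeholder ground-truth labels and predictions, for illustration only.
y_true = ["Anger", "Calm", "Anger", "Happy"]
y_pred = ["Anger", "Calm", "Calm", "Happy"]

# digits=4 matches the four-decimal precision shown in the report above.
print(classification_report(y_true, y_pred, digits=4))
```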

---

## Label Space: 8 Classes

```
Class 0: Anger
Class 1: Calm
Class 2: Disgust
Class 3: Fear
Class 4: Happy
Class 5: Neutral
Class 6: Sad
Class 7: Surprised
```
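
For quick reference, this index-to-name ordering can be written directly in Python; a minimal sketch in which the list simply mirrors the class order above:

```python
# Class order as listed above: index i corresponds to the model's output class i.
EMOTION_LABELS = [
    "Anger", "Calm", "Disgust", "Fear",
    "Happy", "Neutral", "Sad", "Surprised",
]

def index_to_emotion(class_index: int) -> str:
    """Map a predicted class index (0-7) to its emotion name."""
    return EMOTION_LABELS[class_index]

print(index_to_emotion(4))  # Happy
```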

---

## Install Dependencies

```bash
pip install gradio transformers torch librosa hf_xet
```

---

## Inference Code

```python
import gradio as gr
import torch
import librosa
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor

# Load model and feature extractor
model_name = "prithivMLmods/Speech-Emotion-Classification"
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
processor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)

# Label mapping (class index -> emotion name)
id2label = {
    "0": "Anger",
    "1": "Calm",
    "2": "Disgust",
    "3": "Fear",
    "4": "Happy",
    "5": "Neutral",
    "6": "Sad",
    "7": "Surprised"
}

def classify_audio(audio_path):
    # Load and resample audio to 16 kHz
    speech, sample_rate = librosa.load(audio_path, sr=16000)

    # Extract input features
    inputs = processor(
        speech,
        sampling_rate=sample_rate,
        return_tensors="pt",
        padding=True
    )

    # Forward pass without gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class probability to its emotion label
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio interface
iface = gr.Interface(
    fn=classify_audio,
    inputs=gr.Audio(type="filepath", label="Upload Audio (WAV, MP3, etc.)"),
    outputs=gr.Label(num_top_classes=8, label="Emotion Classification"),
    title="Speech Emotion Classification",
    description="Upload an audio clip to classify the speaker's emotion from voice signals."
)

if __name__ == "__main__":
    iface.launch()
```
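
Alternatively, since the card's `pipeline_tag` is `audio-classification`, the high-level `transformers` pipeline should also work. A minimal sketch; `sample.wav` is a placeholder path, and the returned label strings come from the checkpoint config, so they may be the short codes listed under "Original Label" below:

```python
from transformers import pipeline

# Audio-classification pipeline; feature extraction and resampling are handled internally.
classifier = pipeline(
    "audio-classification",
    model="prithivMLmods/Speech-Emotion-Classification",
)

# "sample.wav" is a placeholder path for any speech clip.
results = classifier("sample.wav", top_k=8)
for r in results:
    print(f"{r['label']}: {r['score']:.3f}")
```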

---

## Original Label

```py
"id2label": {
    "0": "ANG",
    "1": "CAL",
    "2": "DIS",
    "3": "FEA",
    "4": "HAP",
    "5": "NEU",
    "6": "SAD",
    "7": "SUR"
},
```
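
Since the checkpoint config stores these abbreviated codes, one option is to translate them into the readable names used elsewhere in this card after loading. A minimal sketch; the short-to-full mapping below is inferred from the two label lists in this card:

```python
from transformers import Wav2Vec2ForSequenceClassification

# Short codes from the checkpoint config mapped to the readable names
# listed under "Label Space" above (mapping inferred from this card).
CODE_TO_NAME = {
    "ANG": "Anger", "CAL": "Calm", "DIS": "Disgust", "FEA": "Fear",
    "HAP": "Happy", "NEU": "Neutral", "SAD": "Sad", "SUR": "Surprised",
}

model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "prithivMLmods/Speech-Emotion-Classification"
)

# config.id2label maps integer class ids to the short codes shown above.
readable = {i: CODE_TO_NAME.get(code, code) for i, code in model.config.id2label.items()}
print(readable)
```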

---

## Intended Use

`Speech-Emotion-Classification` is designed for:

* **Speech Emotion Analytics** – Analyze speaker emotions in call centers, interviews, or therapeutic sessions.
* **Conversational AI Personalization** – Adjust voice assistant responses based on detected emotion.
* **Mental Health Monitoring** – Support emotion recognition in voice-based wellness or teletherapy apps.
* **Voice Dataset Curation** – Tag or filter speech datasets by emotion for research or model training.
* **Media Annotation** – Automatically annotate podcasts, audiobooks, or videos with speaker emotion metadata.