Update README.md

---
library_name: transformers
tags: []
---

# DeepAr

## Model Description

DeepAr is a state-of-the-art Arabic Automatic Speech Recognition (ASR) model built on the whisper-turbo-v3 architecture. It is our latest and most advanced version, trained on the complete [CUAIStudents/Ar-ASR](https://huggingface.co/datasets/CUAIStudents/Ar-ASR) dataset.

**Key Features:**
- **High-fidelity transcription**: Transcribes exactly what is pronounced, preserving authentic speech patterns
- **Speech improvement tool**: Designed to help users identify and correct speech patterns
- **Superior performance**: Outperforms many existing Arabic ASR models based on Whisper and its variants
- **Arabic with Tashkil**: Provides accurate diacritization for fully voweled Arabic text output

## What Makes DeepAr Different

Unlike traditional ASR models that normalize speech to standard text, DeepAr transcribes **exactly what is pronounced**. This unique approach makes it particularly valuable for:

- **Speech therapy and improvement**: Identifies pronunciation patterns and deviations (see the comparison sketch after this list)
- **Language learning**: Helps learners understand their actual pronunciation vs. intended speech
- **Linguistic research**: Captures authentic speech patterns for analysis
- **Pronunciation assessment**: Provides detailed feedback on spoken Arabic
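
As a concrete illustration of the exact-pronunciation use case, a verbatim transcription can be diffed against the text the speaker intended to read. A minimal sketch using Python's standard-library `difflib` (the example strings are hypothetical):

```python
import difflib

def pronunciation_deviations(target_text, actual_text):
    """Report word-level differences between the intended text
    and what was actually pronounced."""
    target_words = target_text.split()
    actual_words = actual_text.split()
    matcher = difflib.SequenceMatcher(a=target_words, b=actual_words)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            print(f"{op}: expected {target_words[i1:i2]} -> heard {actual_words[j1:j2]}")

# Hypothetical target/transcription pair
pronunciation_deviations("ذهب الولد إلى المدرسة", "زهب الولد الى المدرسه")
```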

## Model Details

- **Base Architecture**: whisper-turbo-v3
- **Language**: Arabic (with Tashkil/diacritics)
- **Task**: High-fidelity Automatic Speech Recognition
- **Training Data**: Complete [CUAIStudents/Ar-ASR](https://huggingface.co/datasets/CUAIStudents/Ar-ASR) dataset
- **Model Type**: Production-ready, latest version

## Performance

DeepAr demonstrates superior performance compared to many Arabic ASR models built on Whisper and its variants, particularly excelling in the areas below (see the evaluation sketch after the list):
- Pronunciation accuracy detection
- Diacritic prediction
- Handling of Arabic speech variations
- Authentic speech pattern recognition
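
To quantify these comparisons on your own recordings, word and character error rates can be computed with the `jiwer` package. A minimal sketch (the reference/hypothesis strings are placeholders, not published benchmark data):

```python
# pip install jiwer
from jiwer import wer, cer

# Placeholder reference/hypothesis pair; in practice, use your own test set
reference = "ذَهَبَ الوَلَدُ إِلَى المَدْرَسَةِ"
hypothesis = "ذَهَبَ الوَلَدُ الى المَدْرَسَة"

# CER is the more sensitive metric for diacritics, since each mark is a
# single character and rarely changes word boundaries
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```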

## Intended Use

This model is ideal for:

- Speech therapy and pronunciation correction applications
- Arabic language learning platforms
- Linguistic research and analysis
- Educational tools for speech improvement
- Applications requiring authentic speech transcription
- Quality assessment of spoken Arabic

## Usage

### Installation

```bash
pip install transformers torch torchaudio
```

### Quick Start

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import torchaudio

# Load model and processor
processor = WhisperProcessor.from_pretrained("CUAIStudents/DeepAr")
model = WhisperForConditionalGeneration.from_pretrained("CUAIStudents/DeepAr")

# Load and preprocess audio
audio_path = "path_to_your_arabic_audio.wav"
waveform, sample_rate = torchaudio.load(audio_path)

# Downmix to mono if the file has more than one channel
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# Resample to 16 kHz if necessary
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)

# Process audio
input_features = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt").input_features

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(input_features, language="ar")

# Decode transcription (exactly as pronounced)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(f"Pronounced as: {transcription}")
```
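
For quick experiments, the same checkpoint should also load through the high-level `pipeline` API, which handles audio decoding and resampling internally. A minimal sketch, assuming the repository ships the standard Whisper processor configuration:

```python
from transformers import pipeline

# Build an ASR pipeline directly from the Hub checkpoint
asr = pipeline("automatic-speech-recognition", model="CUAIStudents/DeepAr")

# The pipeline decodes and resamples the file before inference
result = asr("path_to_your_arabic_audio.wav", generate_kwargs={"language": "ar"})
print(result["text"])
```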

### Speech Analysis Example

```python
def analyze_pronunciation(audio_path, target_text=None):
    """
    Analyze pronunciation and compare it with the target text if provided.
    """
    waveform, sample_rate = torchaudio.load(audio_path)

    if sample_rate != 16000:
        resampler = torchaudio.transforms.Resample(sample_rate, 16000)
        waveform = resampler(waveform)

    input_features = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt").input_features

    with torch.no_grad():
        predicted_ids = model.generate(input_features, language="ar")

    actual_pronunciation = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

    print(f"Actual pronunciation: {actual_pronunciation}")

    if target_text:
        print(f"Target text: {target_text}")
        print("Analysis: compare the differences for speech improvement")

    return actual_pronunciation

# Example usage
pronunciation = analyze_pronunciation("student_reading.wav", "النص المطلوب قراءته")
```
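
Since DeepAr outputs Tashkil, it is often useful to separate letter-level deviations from diacritic-level ones when comparing against a target. A small sketch (the Unicode range below covers the common Arabic diacritics and is an assumption, not an exhaustive list):

```python
import re

# Common Arabic diacritics (fathatan through sukun); an assumption, not exhaustive
DIACRITICS = re.compile(r"[\u064B-\u0652]")

def strip_tashkil(text):
    """Remove diacritic marks, leaving only the base letters."""
    return DIACRITICS.sub("", text)

target = "ذَهَبَ الوَلَدُ"   # hypothetical diacritized target
actual = "ذَهَبُ الوَلَدُ"   # hypothetical model output

if strip_tashkil(target) == strip_tashkil(actual):
    print("Base letters match; any differences are in the diacritics only.")
else:
    print("Letter-level differences present.")
```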

### Batch Processing for Speech Assessment

```python
def assess_multiple_recordings(audio_files, target_texts=None):
    """
    Process multiple recordings for a comprehensive speech assessment.
    """
    results = []

    for i, audio_file in enumerate(audio_files):
        waveform, sample_rate = torchaudio.load(audio_file)

        if sample_rate != 16000:
            resampler = torchaudio.transforms.Resample(sample_rate, 16000)
            waveform = resampler(waveform)

        input_features = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt").input_features

        with torch.no_grad():
            predicted_ids = model.generate(input_features, language="ar")

        pronunciation = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

        result = {
            'file': audio_file,
            'pronunciation': pronunciation,
            'target': target_texts[i] if target_texts else None
        }
        results.append(result)

        print(f"File {i+1}: {pronunciation}")

    return results

# Example usage
audio_files = ["recording1.wav", "recording2.wav", "recording3.wav"]
target_texts = ["النص الأول", "النص الثاني", "النص الثالث"]
assessment_results = assess_multiple_recordings(audio_files, target_texts)
```
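
The loop above transcribes one file per forward pass. Because Whisper's feature extractor pads every clip to a fixed 30-second window, several clips can also be batched into a single `generate` call, reusing the `processor` and `model` from the Quick Start; a sketch under that assumption:

```python
def transcribe_batch(audio_files):
    """Transcribe several recordings in one forward pass."""
    clips = []
    for audio_file in audio_files:
        waveform, sample_rate = torchaudio.load(audio_file)
        if sample_rate != 16000:
            waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)
        clips.append(waveform.squeeze().numpy())

    # The feature extractor pads/truncates each clip to the 30 s window
    input_features = processor(clips, sampling_rate=16000, return_tensors="pt").input_features

    with torch.no_grad():
        predicted_ids = model.generate(input_features, language="ar")

    return processor.batch_decode(predicted_ids, skip_special_tokens=True)
```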

## Training Data

This model was trained on the complete [CUAIStudents/Ar-ASR](https://huggingface.co/datasets/CUAIStudents/Ar-ASR) dataset, which pairs Arabic speech with high-quality transcriptions that include diacritics.
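
To inspect the training data yourself, the dataset loads with the 🤗 `datasets` library; a minimal sketch (the split name is an assumption to verify against the dataset card):

```python
# pip install datasets
from datasets import load_dataset

# Split name assumed; check the dataset card for the actual configuration
ds = load_dataset("CUAIStudents/Ar-ASR", split="train")

print(ds)      # features and number of rows
print(ds[0])   # one audio/transcription example
```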

## Model Advantages

- **Authentic transcription**: Captures exactly what is spoken, not what should be spoken
- **High accuracy**: Superior performance compared to similar Whisper-based Arabic models
- **Comprehensive training**: Utilizes the complete dataset for optimal coverage
- **Practical applications**: Specifically designed for speech improvement and assessment
- **Diacritic accuracy**: Excellent performance in Arabic diacritization

## Limitations

- **MSA focus**: Optimized primarily for Modern Standard Arabic (MSA) rather than dialectal variations

## License

This model is released under the MIT License.

```
MIT License

Copyright (c) 2024 CUAIStudents

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
|