F5-TTS Arabic
نموذج تحويل النص إلى كلام باللغة العربية
An Arabic text-to-speech model fine-tuned from F5-TTS on roughly 300 hours of clean Arabic audio. It synthesizes consistent, high-quality speech from fully diacritized Modern Standard Arabic (MSA) text.
Model Details
Base Model: F5-TTS (SWivid/F5-TTS)
Training Data: ~300 hours of clean Arabic audio
Language: Modern Standard Arabic (MSA)
Usage
Quick Start
For inference with text chunking, see the Colab notebook.
from huggingface_hub import hf_hub_download
# Download model files
vocab_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="vocab.txt")
ckpt_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="model_547500_8_18.pt")
config_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="F5TTS_Base_8_18.yaml")
ref_audio = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="reference.wav")
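# Run inference via CLI. This form needs a Jupyter/Colab cell:
# "!" runs a shell command and {name} interpolates the Python variables above.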
!python -m f5_tts.infer.infer_cli \
--model_cfg "{config_file}" \
--output_file "./output.wav" \
--model "F5TTS_Base" \
--ckpt_file "{ckpt_file}" \
--vocab_file "{vocab_file}" \
--ref_audio "{ref_audio}" \
--nfe_step 32 \
--cfg_strength 1.8 \
--ref_text "YOUR_REFERENCE_TEXT_WITH_TASHKEEL" \
--gen_text "YOUR_GENERATION_TEXT_WITH_TASHKEEL" \
--speed 0.9
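Outside a notebook, the `!` shell escape and `{...}` interpolation are not available. The following is a minimal sketch of the same call as a plain Python script, assuming the `f5-tts` and `huggingface_hub` packages are installed in the current environment. The helper names `build_infer_command` and `run_inference` are hypothetical; the flags, file names, and values are copied from the command above.

```python
import subprocess
import sys

REPO_ID = "IbrahimSalah/Arabic-F5-TTS-v2"


def build_infer_command(config_file, ckpt_file, vocab_file, ref_audio,
                        ref_text, gen_text, output_file="./output.wav"):
    """Build the infer_cli argument list from the Quick Start (hypothetical helper)."""
    return [
        sys.executable, "-m", "f5_tts.infer.infer_cli",
        "--model_cfg", config_file,
        "--output_file", output_file,
        "--model", "F5TTS_Base",
        "--ckpt_file", ckpt_file,
        "--vocab_file", vocab_file,
        "--ref_audio", ref_audio,
        "--nfe_step", "32",
        "--cfg_strength", "1.8",
        "--ref_text", ref_text,
        "--gen_text", gen_text,
        "--speed", "0.9",
    ]


def run_inference(ref_text, gen_text):
    """Download the model files and run the CLI once (hypothetical helper)."""
    # Imported lazily so build_infer_command works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    def fetch(filename):
        return hf_hub_download(repo_id=REPO_ID, filename=filename)

    cmd = build_infer_command(
        config_file=fetch("F5TTS_Base_8_18.yaml"),
        ckpt_file=fetch("model_547500_8_18.pt"),
        vocab_file=fetch("vocab.txt"),
        ref_audio=fetch("reference.wav"),
        ref_text=ref_text,
        gen_text=gen_text,
    )
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(cmd, check=True)
```

Calling `run_inference("YOUR_REFERENCE_TEXT_WITH_TASHKEEL", "YOUR_GENERATION_TEXT_WITH_TASHKEEL")` would then mirror the notebook cell. Passing the arguments as a list avoids shell quoting problems with Arabic text.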
Key Features
- High-quality Arabic speech synthesis
- Consistent voice cloning from reference audio
- Works best with moderate text lengths (chunking recommended for long texts)
- Supports speed adjustment
- Fine-tunable for specific use cases
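The chunking recommended for long texts can be sketched as below. This is an illustrative approach, not the method used in the Colab notebook: it splits on Arabic and Latin punctuation and greedily packs the pieces into chunks. The function name `chunk_text` and the 200-character default are assumptions; note that `len` counts diacritics as characters, so diacritized text reaches the limit sooner than its letter count suggests.

```python
import re

# Split at whitespace that follows sentence or clause punctuation
# (Latin . ! ? , and Arabic question mark, semicolon, comma).
_SPLIT_RE = re.compile(r"(?<=[.!?؟؛،,])\s+")


def chunk_text(text, max_chars=200):
    """Greedily pack punctuation-delimited pieces into chunks of at most max_chars.

    A single piece longer than max_chars is kept whole rather than cut mid-phrase.
    """
    pieces = [p for p in _SPLIT_RE.split(text.strip()) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if current and len(candidate) > max_chars:
            # Adding this piece would overflow, so close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as a separate `--gen_text` value, with the same reference audio and reference text for every call.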
Input Requirements
Critical: Text must include full Arabic diacritization (tashkeel). The model is trained exclusively on fully diacritized text and will not perform well on non-diacritized input.
Example of correct input:
إِنَّ الْعِلْمَ نُورٌ يُقْذَفُ فِي الْقَلْبِ
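Because undiacritized input degrades output, it can help to check text before synthesis. The sketch below is a rough heuristic, not part of the model or of F5-TTS: it measures the fraction of Arabic letters that are immediately followed by a tashkeel mark. The function names and the 0.6 threshold are assumptions. Even fully diacritized text scores below 1.0, since long vowels and the alef of the definite article usually carry no mark.

```python
# Arabic letters U+0621..U+064A, excluding tatweel (U+0640).
ARABIC_LETTERS = {chr(c) for c in range(0x0621, 0x064B) if c != 0x0640}
# Tashkeel marks U+064B..U+0652 (fathatan through sukun) plus superscript alef.
TASHKEEL = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def diacritized_fraction(text):
    """Fraction of Arabic letters immediately followed by at least one tashkeel mark."""
    letters = marked = 0
    for i, ch in enumerate(text):
        if ch in ARABIC_LETTERS:
            letters += 1
            if i + 1 < len(text) and text[i + 1] in TASHKEEL:
                marked += 1
    return marked / letters if letters else 0.0


def looks_diacritized(text, threshold=0.6):
    """Heuristic check; the threshold is an arbitrary assumption to tune."""
    return diacritized_fraction(text) >= threshold
```

Text that fails the check would need to be diacritized, manually or with a separate diacritization tool, before being passed to the model.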
Sample Output
Text: إِنَّ الْعِلْمَ لَيْسَ بِكَثْرَةِ الرِّوَايَةِ، وَإِنَّمَا هُوَ نُورٌ يُقْذَفُ فِي الْقَلْبِ، يَفْهَمُ بِهِ الْعَبْدُ حَقَائِقَ الْأُمُورِ. وَالْحِكْمَةُ ضَالَّةُ الْمُؤْمِنِ، فَحَيْثُمَا وَجَدَهَا فَهُوَ أَحَقُّ بِهَا. وَمَنْ طَلَبَ الْعُلَا مِنْ غَيْرِ كَدٍّ، أَضَاعَ الْعُمُرَ فِي طَلَبِ الْمُحَالِ. فَاصْبِرْ عَلَى مُرِّ الْحَقِّ، وَلَا تَسْتَعْجِلْ قَطْفَ الثَّمَرَةِ قَبْلَ نُضْجِهَا، فَإِنَّ لِكُلِّ شَيْءٍ أَوَانًا، وَلِكُلِّ مَقَامٍ مَقَالًا.
Reference
Further Fine-tuning
The model can be further fine-tuned for:
- Non-diacritized text (requires additional training)
- Specific voice characteristics
- Domain-specific vocabulary
- Dialectal variations
License
This model is released under a Non-Commercial License.
- You may use this model for research, educational, and personal non-commercial purposes.
- Commercial use is strictly prohibited without explicit permission.
- If you wish to use this model for commercial purposes, please contact the model author.
Limitations
- Requires fully diacritized Arabic text as input
- Optimized for Modern Standard Arabic (MSA), not dialectal Arabic
- Output quality may vary on very long texts unless they are split into chunks
- Voice cloning quality depends on reference audio quality and length
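When long texts are synthesized chunk by chunk, the per-chunk outputs have to be joined afterwards. The following is a minimal sketch using only the standard library, assuming each chunk was written as a PCM WAV file with identical channel count, sample width, and sample rate. The function name `concat_wavs` is hypothetical, and the sketch adds no silence or cross-fade between chunks.

```python
import wave


def concat_wavs(input_paths, output_path):
    """Concatenate PCM WAV files that share channels, sample width, and rate."""
    if not input_paths:
        raise ValueError("input_paths is empty")
    params = None
    frames = []
    for path in input_paths:
        with wave.open(path, "rb") as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path} has a different audio format: {fmt} != {params}")
            frames.append(wav.readframes(wav.getnframes()))
    with wave.open(output_path, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        for data in frames:
            out.writeframes(data)
```

For example, `concat_wavs(["chunk_0.wav", "chunk_1.wav"], "full.wav")` would write the two chunks back to back; the file names here are placeholders.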