F5-TTS Arabic

Arabic Text-to-Speech Model

Arabic text-to-speech model fine-tuned on approximately 300 hours of clean Arabic audio. It produces consistent, high-quality speech in Modern Standard Arabic from fully diacritized input text.

Model Details

Base Model: F5-TTS (SWivid/F5-TTS)
Training Data: ~300 hours of clean Arabic audio
Language: Modern Standard Arabic (MSA)

Usage

Quick Start

For inference with text chunking, see the Colab notebook.

from huggingface_hub import hf_hub_download

# Download model files
vocab_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="vocab.txt")
ckpt_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="model_547500_8_18.pt")
config_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="F5TTS_Base_8_18.yaml")
ref_audio = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="reference.wav")

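# Run inference via the CLI. This is a Colab/Jupyter cell: "!" runs a shell command
# and {name} substitutes the Python variable. --ref_text must be the diacritized
# transcript of reference.wav; --gen_text is the text to synthesize.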
!python -m f5_tts.infer.infer_cli \
  --model_cfg "{config_file}" \
  --output_file "./output.wav" \
  --model "F5TTS_Base" \
  --ckpt_file "{ckpt_file}" \
  --vocab_file "{vocab_file}" \
  --ref_audio "{ref_audio}" \
  --nfe_step 32 \
  --cfg_strength 1.8 \
  --ref_text "YOUR_REFERENCE_TEXT_WITH_TASHKEEL" \
  --gen_text "YOUR_GENERATION_TEXT_WITH_TASHKEEL" \
  --speed 0.9
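Outside a notebook, the `!python` line and the `{...}` interpolation are not valid Python. The following is a minimal sketch of the same call as a plain script, assuming the `f5_tts` package is installed in the current interpreter. It uses only the files and CLI flags shown above. The helper names `download_model_files`, `build_infer_command`, and `run_inference` are hypothetical. Passing an argument list to `subprocess.run` avoids shell-quoting problems with Arabic text that contains spaces and punctuation.

```python
import subprocess
import sys

REPO_ID = "IbrahimSalah/Arabic-F5-TTS-v2"


def download_model_files(repo_id: str = REPO_ID) -> dict:
    """Fetch the Quick Start files from the Hugging Face Hub (cached locally)."""
    from huggingface_hub import hf_hub_download  # imported lazily

    names = {
        "vocab_file": "vocab.txt",
        "ckpt_file": "model_547500_8_18.pt",
        "config_file": "F5TTS_Base_8_18.yaml",
        "ref_audio": "reference.wav",
    }
    return {key: hf_hub_download(repo_id=repo_id, filename=fname) for key, fname in names.items()}


def build_infer_command(files: dict, ref_text: str, gen_text: str,
                        output_file: str = "./output.wav", nfe_step: int = 32,
                        cfg_strength: float = 1.8, speed: float = 0.9) -> list:
    """Assemble the same infer_cli invocation as the notebook cell, as an argv list."""
    return [
        sys.executable, "-m", "f5_tts.infer.infer_cli",
        "--model_cfg", files["config_file"],
        "--output_file", output_file,
        "--model", "F5TTS_Base",
        "--ckpt_file", files["ckpt_file"],
        "--vocab_file", files["vocab_file"],
        "--ref_audio", files["ref_audio"],
        "--nfe_step", str(nfe_step),
        "--cfg_strength", str(cfg_strength),
        "--ref_text", ref_text,
        "--gen_text", gen_text,
        "--speed", str(speed),
    ]


def run_inference(ref_text: str, gen_text: str, output_file: str = "./output.wav") -> None:
    """Download the files, then run the CLI; raises CalledProcessError on failure."""
    files = download_model_files()
    cmd = build_infer_command(files, ref_text, gen_text, output_file=output_file)
    subprocess.run(cmd, check=True)
```

Call `run_inference(ref_text=..., gen_text=...)` with fully diacritized strings. The defaults mirror the values in the notebook cell (`nfe_step` 32, `cfg_strength` 1.8, `speed` 0.9).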

Key Features

  • High-quality Arabic speech synthesis
  • Consistent voice cloning from reference audio
  • Works best with moderate text lengths (chunking recommended for long texts)
  • Supports speed adjustment
  • Fine-tunable for specific use cases
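Chunking is recommended for long texts. The Colab notebook's own method is not reproduced here. Instead, this is a hypothetical greedy chunker, offered as a sketch. It splits on sentence and clause punctuation (including the Arabic comma ، the Arabic semicolon ؛ and the Arabic question mark ؟) and packs the pieces into chunks of up to `max_chars`. The 200-character default is an assumption, not a tuned value. With full tashkeel, each diacritic is a separate code point and counts toward the limit.

```python
import re

# Split on whitespace that follows Latin ".!?" or Arabic "؟", "،", "؛".
_SPLIT_RE = re.compile(r"(?<=[.!?؟،؛])\s+")


def chunk_text(text: str, max_chars: int = 200) -> list:
    """Greedily pack punctuation-delimited pieces into chunks of at most max_chars.

    A single piece longer than max_chars is kept whole rather than cut mid-word.
    """
    pieces = [p.strip() for p in _SPLIT_RE.split(text.strip()) if p.strip()]
    chunks = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # still fits: extend the current chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = piece  # start a new chunk with this piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as `--gen_text` in a separate CLI run, and the resulting audio files can be concatenated.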

Input Requirements

Critical: Text must include full Arabic diacritization (tashkeel). The model is trained exclusively on fully diacritized text and will not perform well on non-diacritized input.

Example of correct input:

إِنَّ الْعِلْمَ نُورٌ يُقْذَفُ فِي الْقَلْبِ
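Because non-diacritized input degrades output, it can help to screen text before synthesis. The following is a hypothetical heuristic, not part of the model or its tooling. It measures the fraction of Arabic letters that are immediately followed by a tashkeel mark (U+064B to U+0652, plus superscript alef U+0670). Fully diacritized text does not reach 1.0, because long-vowel letters and the alef of the definite article normally stay bare; the example above scores about 0.8. The 0.6 threshold is therefore an illustrative assumption. The check only detects whether diacritics are present. It does not verify that they are correct.

```python
# Arabic base letters hamza..ghain and feh..yeh; excludes tatweel (U+0640)
# and the extension letters U+063B..U+063F.
ARABIC_LETTERS = {chr(c) for c in range(0x0621, 0x063B)} | {chr(c) for c in range(0x0641, 0x064B)}
# Tashkeel: fathatan..sukun (U+064B..U+0652) plus superscript alef (U+0670).
TASHKEEL = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def diacritization_ratio(text: str) -> float:
    """Fraction of Arabic letters directly followed by at least one tashkeel mark."""
    letters = 0
    marked = 0
    for i, ch in enumerate(text):
        if ch in ARABIC_LETTERS:
            letters += 1
            if i + 1 < len(text) and text[i + 1] in TASHKEEL:
                marked += 1
    return marked / letters if letters else 0.0


def looks_fully_diacritized(text: str, threshold: float = 0.6) -> bool:
    """Hypothetical pre-synthesis check; the threshold is an assumption."""
    return diacritization_ratio(text) >= threshold
```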

Sample Output

Text: إِنَّ الْعِلْمَ لَيْسَ بِكَثْرَةِ الرِّوَايَةِ، وَإِنَّمَا هُوَ نُورٌ يُقْذَفُ فِي الْقَلْبِ، يَفْهَمُ بِهِ الْعَبْدُ حَقَائِقَ الْأُمُورِ. وَالْحِكْمَةُ ضَالَّةُ الْمُؤْمِنِ، فَحَيْثُمَا وَجَدَهَا فَهُوَ أَحَقُّ بِهَا. وَمَنْ طَلَبَ الْعُلَا مِنْ غَيْرِ كَدٍّ، أَضَاعَ الْعُمُرَ فِي طَلَبِ الْمُحَالِ. فَاصْبِرْ عَلَى مُرِّ الْحَقِّ، وَلَا تَسْتَعْجِلْ قَطْفَ الثَّمَرَةِ قَبْلَ نُضْجِهَا، فَإِنَّ لِكُلِّ شَيْءٍ أَوَانًا، وَلِكُلِّ مَقَامٍ مَقَالًا.

Reference

Further Fine-tuning

The model can be further fine-tuned for:

  • Non-diacritized text (requires additional training)
  • Specific voice characteristics
  • Domain-specific vocabulary
  • Dialectal variations

License

This model is released under a Non-Commercial License.

  • You may use this model for research, educational, and personal non-commercial purposes.
  • Commercial use is strictly prohibited without explicit permission.
  • If you wish to use this model for commercial purposes, please contact the model author.

Limitations

  • Requires fully diacritized Arabic text as input
  • Optimized for Modern Standard Arabic (MSA), not dialectal Arabic
  • Performance may vary with very long texts without chunking
  • Voice cloning quality depends on reference audio quality and length
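Because cloning quality depends on the reference clip, a quick length check before inference can catch obvious problems. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed integer PCM WAV files. The function names are hypothetical. The `min_s`/`max_s` bounds are illustrative assumptions, not values documented for this model or for F5-TTS.

```python
import wave


def describe_wav(source) -> dict:
    """Read basic properties of a PCM WAV file (a path or a binary file object)."""
    with wave.open(source, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "duration_s": frames / float(rate),
            "channels": wf.getnchannels(),
            "sample_rate": rate,
        }


def reference_audio_warnings(source, min_s: float = 3.0, max_s: float = 12.0) -> list:
    """Return warnings about reference length; min_s/max_s are assumed bounds."""
    duration = describe_wav(source)["duration_s"]
    warnings = []
    if duration < min_s:
        warnings.append(f"reference is short ({duration:.1f}s < {min_s}s)")
    if duration > max_s:
        warnings.append(f"reference is long ({duration:.1f}s > {max_s}s)")
    return warnings
```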