cstr/aihpi_f5_german_mlx_q4


cstr committed on May 27

Commit d81d4d3 (verified)

1 Parent(s): a5a82f4

Create README.md

Files changed (1)

README.md +48 -0


	@@ -0,0 +1,48 @@

+---
+language:
+- de
+license: cc-by-nc-4.0
+tags:
+- speech
+- text-to-speech
+- F5-TTS
+datasets:
+- amphion/Emilia-Dataset
+- fsicoli/common_voice_19_0
+library_name: f5_tts
+base_model:
+- SWivid/F5-TTS
+---
+# German Voice Cloning TTS Model using F5-TTS Architecture
+This is an experimental MLX conversion of the model, intended for use with f5-tts-mlx.
+A German Text-to-Speech system capable of cloning voices from a few seconds of reference audio, built on the F5-TTS architecture.
+## Model Details
+- **Developed by:** Johanna Reiml and team at KI-Servicezentrum, Hasso-Plattner-Institut (HPI)
+- **Base Model:** [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS)
+- **Paper:** [F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching](https://arxiv.org/abs/2410.06885)
+## Key Features & Capabilities
+- Generates natural-sounding German speech from text
+- Clones voices using minimal reference audio (a few seconds)
+- Suitable for audiobooks, voice assistants, and accessibility applications
+## Technical Specifications
+Download checkpoints from the directory F5TTS_Base (Vocos vocoder) or F5TTS_Base_bigvgan (BigVGAN vocoder).
+- **Datasets:** Common Voice (Mozilla) and Emilia_DE
+- **Process:** Fine-tuned from checkpoints of the [base F5-TTS model](https://huggingface.co/SWivid/F5-TTS)
+- **Training Hardware:** 8x NVIDIA H100
+## Contact
+- AI Service Center: [email protected]
+- Johanna Reiml: [email protected]
+- Enes Suermeli: [email protected]
+- Kajo Kratzenstein: [email protected]
+- Carlos Menke: [email protected]
+## Acknowledgements
+The authors acknowledge the financial support by the German Federal Ministry for Education and Research (BMBF) through the project «KI-Servicezentrum Berlin Brandenburg» (01IS22092).
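
The checkpoint download mentioned under Technical Specifications could be scripted roughly as follows. This is a minimal sketch, not part of the committed README. It assumes the repository id shown in the page header, and it assumes the two directory names from the README actually exist in this repository, which the commit does not confirm. `allow_patterns_for` and `download_checkpoints` are hypothetical helper names; `snapshot_download` is the `huggingface_hub` function.

```python
from typing import List

# Assumption: repository id read off the page header, not stated in the README.
REPO_ID = "cstr/aihpi_f5_german_mlx_q4"

# Directory names as given in the README, keyed by vocoder.
CHECKPOINT_DIRS = {
    "vocos": "F5TTS_Base",
    "bigvgan": "F5TTS_Base_bigvgan",
}


def allow_patterns_for(vocoder: str) -> List[str]:
    """Return glob patterns that restrict a download to one checkpoint directory."""
    try:
        directory = CHECKPOINT_DIRS[vocoder.lower()]
    except KeyError:
        raise ValueError(
            f"unknown vocoder {vocoder!r}; expected one of {sorted(CHECKPOINT_DIRS)}"
        )
    return [f"{directory}/*"]


def download_checkpoints(vocoder: str = "vocos", repo_id: str = REPO_ID) -> str:
    """Download one checkpoint directory and return the local snapshot path."""
    # Imported lazily so allow_patterns_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=allow_patterns_for(vocoder),
    )
```

Calling `download_checkpoints("bigvgan")` would fetch only the files under `F5TTS_Base_bigvgan`, provided that directory is present in the repository. Filtering by pattern avoids pulling both checkpoint variants when only one vocoder is needed.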