Commit d81d4d3 (verified) by cstr, parent a5a82f4: Create README.md

Files changed (1): README.md, added (+48 -0)

---
language:
- de
license: cc-by-nc-4.0
tags:
- speech
- text-to-speech
- F5-TTS
datasets:
- amphion/Emilia-Dataset
- fsicoli/common_voice_19_0
library_name: f5_tts
base_model:
- SWivid/F5-TTS
---

# German Voice Cloning TTS Model using F5-TTS Architecture

This is an attempt at an MLX conversion of the model, for use with `f5-tts-mlx`.

A German text-to-speech system that can clone a voice from a few seconds of reference audio, built on the F5-TTS architecture.

## Model Details
- **Developed by:** Johanna Reiml and team at KI-Servicezentrum, Hasso-Plattner-Institut (HPI)
- **Base Model:** [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS)
- **Paper:** [F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching](https://arxiv.org/abs/2410.06885)

## Key Features & Capabilities
- Generates natural-sounding German speech from text
- Clones a voice from only a few seconds of reference audio
- Suitable for audiobooks, voice assistants, and accessibility applications

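Because cloning quality depends on the reference clip, it can help to sanity-check that clip first. The sketch below uses only Python's standard `wave` module to report the channel count, sample rate, and duration of a PCM WAV file. It assumes the 24 kHz mono format that F5-TTS works with internally; whether `f5-tts-mlx` resamples or downmixes on its own is not covered here, so the function only reports mismatches instead of rejecting the file. The 3 to 12 second window and the names `check_reference_wav`, `RefAudioReport`, and `make_silent_wav` are hypothetical choices, not part of this model or of `f5-tts-mlx`.

```python
import io
import wave
from dataclasses import dataclass, field

# Assumption: F5-TTS operates on 24 kHz mono audio. Whether the MLX port
# resamples or downmixes for you is not checked here; we only report mismatches.
TARGET_SAMPLE_RATE = 24_000


@dataclass
class RefAudioReport:
    channels: int
    sample_rate: int
    duration_s: float
    warnings: list[str] = field(default_factory=list)


def check_reference_wav(source, min_s: float = 3.0, max_s: float = 12.0) -> RefAudioReport:
    """Inspect a PCM WAV reference clip (path or binary file object).

    min_s/max_s are a hypothetical reading of "a few seconds". The stdlib
    wave module only reads uncompressed PCM WAV and raises wave.Error otherwise.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        duration_s = wav.getnframes() / sample_rate

    report = RefAudioReport(channels, sample_rate, duration_s)
    if channels != 1:
        report.warnings.append(f"expected mono audio, got {channels} channels")
    if sample_rate != TARGET_SAMPLE_RATE:
        report.warnings.append(
            f"sample rate is {sample_rate} Hz, expected {TARGET_SAMPLE_RATE} Hz"
        )
    if not (min_s <= duration_s <= max_s):
        report.warnings.append(
            f"duration {duration_s:.1f} s is outside {min_s:g}-{max_s:g} s"
        )
    return report


def make_silent_wav(seconds: float, rate: int = TARGET_SAMPLE_RATE, channels: int = 1) -> io.BytesIO:
    """Build an in-memory 16-bit PCM WAV of silence, for trying out the checker."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples, so 2 bytes per channel per frame
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    buf.seek(0)  # wave does not close file objects it did not open
    return buf
```

For example, `check_reference_wav(make_silent_wav(5.0)).warnings` is an empty list, while a 20 second stereo clip at 44.1 kHz produces three warnings. Reporting rather than raising keeps the decision with the caller, since the inference code may already handle resampling.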
## Technical Specifications
Checkpoints are provided in two directories: `F5TTS_Base` (for the Vocos vocoder) and `F5TTS_Base_bigvgan` (for the BigVGAN vocoder). Download the one that matches the vocoder you use.

- **Datasets:** Common Voice (Mozilla) and Emilia_DE
- **Process:** Fine-tuned from the [base F5-TTS model](https://huggingface.co/SWivid/F5-TTS) checkpoints
- **Training Hardware:** 8x NVIDIA H100

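If you fetch the checkpoints with `huggingface_hub`, restricting the download to a single directory avoids pulling both vocoder variants. The sketch below maps a vocoder name to the `F5TTS_Base` or `F5TTS_Base_bigvgan` directory named in this card and passes a glob to `snapshot_download(allow_patterns=...)`. `repo_id` is a placeholder for this repository's Hub id, the helper names are hypothetical, and the file names in the example listing are made up for illustration.

```python
from fnmatch import fnmatchcase

# Directory names as given in this model card; the vocoder keys are our own labels.
CHECKPOINT_DIRS = {
    "vocos": "F5TTS_Base",
    "bigvgan": "F5TTS_Base_bigvgan",
}


def checkpoint_patterns(vocoder: str) -> list[str]:
    """Return glob patterns that select only the checkpoint directory for one vocoder."""
    try:
        directory = CHECKPOINT_DIRS[vocoder]
    except KeyError:
        raise ValueError(
            f"unknown vocoder {vocoder!r}; choose from {sorted(CHECKPOINT_DIRS)}"
        ) from None
    # The trailing "/*" keeps "F5TTS_Base" from also matching "F5TTS_Base_bigvgan/...".
    return [f"{directory}/*"]


def select_files(repo_files: list[str], vocoder: str) -> list[str]:
    """Filter a repository file listing with the same globs, to preview what would download."""
    patterns = checkpoint_patterns(vocoder)
    return [f for f in repo_files if any(fnmatchcase(f, p) for p in patterns)]


def download_checkpoint(repo_id: str, vocoder: str = "vocos") -> str:
    """Fetch one checkpoint directory; repo_id is a placeholder for this model's Hub id."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the local folder containing the matched files.
    return snapshot_download(repo_id=repo_id, allow_patterns=checkpoint_patterns(vocoder))
```

A call such as `download_checkpoint("<owner>/<repo>", "bigvgan")` would then fetch only the BigVGAN variant. The trailing `/*` matters: a bare `F5TTS_Base*` pattern would also match every file under `F5TTS_Base_bigvgan/`.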
## Contact
- AI Service Center: [email protected]
- Johanna Reiml: [email protected]
- Enes Suermeli: [email protected]
- Kajo Kratzenstein: [email protected]
- Carlos Menke: [email protected]

## Acknowledgements
The authors acknowledge financial support from the German Federal Ministry of Education and Research (BMBF) through the project «KI-Servicezentrum Berlin Brandenburg» (01IS22092).