---
license: cc-by-4.0
datasets:
- amphion/Emilia-Dataset
language:
- fr
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- french
- audio
- speech
- tts
- fine-tuning
- chatterbox
- Emilia
- voice-cloning
- zero-shot
---

# Chatterbox TTS French 🥖

**Chatterbox TTS French** is a version of the Chatterbox text-to-speech model fine-tuned for the French language. It was trained on French speech from the Emilia dataset to produce natural and expressive speech synthesis.

<div align="center"><img width="400px" src="https://ih1.redbubble.net/image.5397735048.6235/bg,f8f8f8-flat,750x,075,f-pad,750x1000,f8f8f8.jpg" alt="baguette-france-tour-eiffel-image" /></div>

- 🔊 **Language**: French 🇫🇷  
- 🗣️ **Training dataset**: [Emilia Dataset (FR branch)](https://huggingface.co/datasets/amphion/Emilia-Dataset)  
- ⏱️ **Data quantity**: 1400 hours of audio  

## Usage Example

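The script below loads the base Chatterbox model, replaces its T3 weights with the French fine-tuned checkpoint, and generates speech with Chatterbox TTS French: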

```python
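import torch
import soundfile as sf
from typing import Optional
from chatterbox.tts import ChatterboxTTS
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Configuration
MODEL_REPO = "Thomcles/Chatterbox-TTS-French"
CHECKPOINT_FILENAME = "t3_cfg.safetensors"
OUTPUT_PATH = "output_cloned_voice.wav"
# Optional reference clip for voice cloning; None falls back to the base model's default voice.
AUDIO_PROMPT_PATH: Optional[str] = None
# English: "Jean-Paul Sartre left posterity a considerable body of work, both literary and
# philosophical, which influenced both post-war French political life and the thinkers
# of his time (notably Merleau-Ponty and Alain Badiou)."
TEXT_TO_SYNTHESIZE = "Jean-Paul Sartre laisse à la postérité une œuvre considérable, tant littéraire que philosophique, ayant influencé à la fois la vie politique française d'après-guerre et les penseurs de son temps (Merleau-Ponty et Alain Badiou notamment)."

def get_device() -> str:
    return "cuda" if torch.cuda.is_available() else "cpu"

def download_checkpoint(repo: str, filename: str) -> str:
    # Downloads the file from the Hub (or reuses the locally cached copy) and returns its path.
    return hf_hub_download(repo_id=repo, filename=filename)

def load_tts_model(repo: str, checkpoint_file: str, device: str) -> ChatterboxTTS:
    # Load the original Chatterbox model, then swap in the French fine-tuned T3 weights.
    model = ChatterboxTTS.from_pretrained(device=device)
    checkpoint_path = download_checkpoint(repo, checkpoint_file)
    t3_state = load_file(checkpoint_path, device="cpu")
    # load_state_dict is strict by default and copies the tensors onto the model's device.
    model.t3.load_state_dict(t3_state)
    return model

def synthesize_speech(
    model: ChatterboxTTS,
    text: str,
    audio_prompt_path: Optional[str] = None,
    **kwargs,
) -> torch.Tensor:
    with torch.inference_mode():
        return model.generate(text=text, audio_prompt_path=audio_prompt_path, **kwargs)

def save_audio(waveform: torch.Tensor, path: str, sample_rate: int) -> None:
    # squeeze() drops the batch dimension so soundfile receives a 1-D mono array.
    sf.write(path, waveform.squeeze().cpu().numpy(), sample_rate)

def main():
    print("Loading model...")
    device = get_device()
    model = load_tts_model(MODEL_REPO, CHECKPOINT_FILENAME, device)

    print(f"Generating speech on {device}...")
    wav = synthesize_speech(
        model,
        TEXT_TO_SYNTHESIZE,
        audio_prompt_path=AUDIO_PROMPT_PATH,
        exaggeration=0.5,  # intensity of emotional expressiveness
        temperature=0.6,   # sampling temperature; lower values are more deterministic
        cfg_weight=0.3,    # classifier-free guidance weight
    )

    print(f"Saving output to: {OUTPUT_PATH}")
    save_audio(wav, OUTPUT_PATH, model.sr)
    print("Done.")

if __name__ == "__main__":
    main()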
```
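The `temperature` and `cfg_weight` arguments are sampling controls. As a rough illustration only, the sketch below shows the generic textbook form of classifier-free guidance (pushing conditional logits away from unconditional ones by a weight) followed by temperature-scaled softmax sampling, on toy logits in plain Python. The helper names `apply_cfg`, `softmax_with_temperature` and `sample_token` are hypothetical, and Chatterbox's internal implementation may combine these steps differently.

```python
import math
import random

def apply_cfg(cond_logits, uncond_logits, cfg_weight):
    # Generic classifier-free guidance (illustrative, not Chatterbox's code):
    # move the conditional logits further away from the unconditional ones.
    # cfg_weight = 0 returns the conditional logits unchanged.
    return [c + cfg_weight * (c - u) for c, u in zip(cond_logits, uncond_logits)]

def softmax_with_temperature(logits, temperature):
    # Dividing by a temperature below 1 sharpens the distribution; above 1 flattens it.
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(probs, rng):
    # Draw one token index according to the probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy example with three candidate tokens.
guided = apply_cfg([2.0, 1.0, 0.0], [1.0, 1.0, 1.0], cfg_weight=0.3)
probs = softmax_with_temperature(guided, temperature=0.6)
token = sample_token(probs, random.Random(0))
```

With a lower temperature the most likely token receives more probability mass, which is why lower values tend to give more stable but less varied output.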

Here is the output:

<audio controls src="https://huggingface.co/Thomcles/Chatterbox-TTS-French/resolve/main/example.mp3">Your browser does not support audio.</audio>
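To sanity-check the generated file, or a reference clip you plan to pass as `audio_prompt_path`, the standard-library `wave` module is enough. This is a minimal sketch assuming an uncompressed PCM WAV file; `wav_info` is a hypothetical helper, not part of Chatterbox.

```python
import wave

def wav_info(source):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wf:
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        n_frames = wf.getnframes()
    return sample_rate, channels, n_frames / sample_rate
```

For example, `wav_info("output_cloned_voice.wav")` should report the sample rate passed to `save_audio` (`model.sr`) and a single channel.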

### Base Model License

The base model is licensed under the MIT License.  
Base model: [Chatterbox](https://huggingface.co/ResembleAI/chatterbox)  
License: [MIT](https://choosealicense.com/licenses/mit/)  

### Training Data License

This model was fine-tuned using a dataset licensed under Creative Commons Attribution 4.0 (CC BY 4.0).  
Dataset: [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset)  
License: [Creative Commons Attribution 4.0 International](https://choosealicense.com/licenses/cc-by-4.0/)  


### Contact me

Interested in fine-tuning a TTS model in a specific language or building a multilingual voice solution? Don’t hesitate to reach out.