---
license: cc-by-4.0
datasets:
- amphion/Emilia-Dataset
language:
- fr
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- french
- audio
- speech
- tts
- fine-tuning
- chatterbox
- Emilia
- voice-cloning
- zero-shot
---
# Chatterbox TTS French 🥖
**Chatterbox TTS French** is a text-to-speech model fine-tuned from Chatterbox for the French language. It was trained on high-quality voice data to produce natural, expressive speech.
<div align="center"><img width="400px" src="https://ih1.redbubble.net/image.5397735048.6235/bg,f8f8f8-flat,750x,075,f-pad,750x1000,f8f8f8.jpg" alt="baguette-france-tour-eiffel-image" /></div>
- 🔊 **Language**: French 🇫🇷
- 🗣️ **Training dataset**: [Emilia Dataset (FR branch)](https://huggingface.co/datasets/amphion/Emilia-Dataset)
- ⏱️ **Data quantity**: 1,400 hours of audio
## Usage Example
The script below downloads the fine-tuned T3 checkpoint, loads it into the base Chatterbox model, and synthesizes a French sentence:
```python
from typing import Optional

import torch
import soundfile as sf
from chatterbox.tts import ChatterboxTTS
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Configuration
MODEL_REPO = "Thomcles/Chatterbox-TTS-French"
CHECKPOINT_FILENAME = "t3_cfg.safetensors"
OUTPUT_PATH = "output_cloned_voice.wav"
TEXT_TO_SYNTHESIZE = "Jean-Paul Sartre laisse à la postérité une œuvre considérable, tant littéraire que philosophique, ayant influencée à la fois la vie politique française d'après-guerre et les penseurs de son temps (Merleau-Ponty et Alain Badiou notamment)."


def get_device() -> str:
    # Prefer the GPU when one is available.
    return "cuda" if torch.cuda.is_available() else "cpu"


def download_checkpoint(repo: str, filename: str) -> str:
    # Fetch the fine-tuned weights from the Hub and return the local file path.
    return hf_hub_download(repo_id=repo, filename=filename)


def load_tts_model(repo: str, checkpoint_file: str, device: str) -> ChatterboxTTS:
    # Load the base Chatterbox model, then replace its T3 weights with the French ones.
    model = ChatterboxTTS.from_pretrained(device=device)
    checkpoint_path = download_checkpoint(repo, checkpoint_file)
    t3_state = load_file(checkpoint_path, device="cpu")
    model.t3.load_state_dict(t3_state)
    return model


def synthesize_speech(model: ChatterboxTTS, text: str, audio_prompt_path: Optional[str], **kwargs) -> torch.Tensor:
    # Extra keyword arguments are forwarded unchanged to model.generate.
    with torch.inference_mode():
        return model.generate(
            text=text,
            audio_prompt_path=audio_prompt_path,
            **kwargs
        )


def save_audio(waveform: torch.Tensor, path: str, sample_rate: int) -> None:
    # Drop singleton dimensions and move to CPU before writing the file.
    sf.write(path, waveform.squeeze().cpu().numpy(), sample_rate)


def main():
    print("Loading model...")
    device = get_device()
    model = load_tts_model(MODEL_REPO, CHECKPOINT_FILENAME, device)

    print(f"Generating speech on {device}...")
    wav = synthesize_speech(
        model,
        TEXT_TO_SYNTHESIZE,
        audio_prompt_path=None,
        exaggeration=0.5,
        temperature=0.6,
        cfg_weight=0.3
    )

    print(f"Saving output to: {OUTPUT_PATH}")
    save_audio(wav, OUTPUT_PATH, model.sr)
    print("Done.")


if __name__ == "__main__":
    main()
```
Here is the output:
<audio controls src="https://huggingface.co/Thomcles/Chatterbox-TTS-French/resolve/main/example.mp3">Your browser does not support audio.</audio>
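The script above passes `audio_prompt_path=None`. To clone a voice, the same argument would presumably take the path to a short reference recording. The sketch below is one way to prepare the arguments. `build_generate_kwargs` is a hypothetical helper, not part of Chatterbox, and its defaults simply copy the values used above. It assumes `model.generate` accepts exactly the keyword arguments shown in the script.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def build_generate_kwargs(
    text: str,
    reference_wav: Optional[str] = None,
    exaggeration: float = 0.5,
    temperature: float = 0.6,
    cfg_weight: float = 0.3,
) -> Dict[str, Any]:
    """Assemble keyword arguments for model.generate (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if reference_wav is not None and not Path(reference_wav).is_file():
        # Fail early with a clear message rather than deep inside the model.
        raise FileNotFoundError(f"Reference audio not found: {reference_wav}")
    return {
        "text": text,
        # None reproduces the call in the script above (no reference voice).
        "audio_prompt_path": reference_wav,
        "exaggeration": exaggeration,
        "temperature": temperature,
        "cfg_weight": cfg_weight,
    }
```

With a loaded model, a call could then look like `wav = model.generate(**build_generate_kwargs(TEXT_TO_SYNTHESIZE, reference_wav="speaker.wav"))`, where `speaker.wav` is a placeholder for your own recording.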
### Base model license
The base model is licensed under the MIT License.
- Base model: [Chatterbox](https://huggingface.co/ResembleAI/chatterbox)
- License: [MIT](https://choosealicense.com/licenses/mit/)
### Training data license
This model was fine-tuned using a dataset licensed under Creative Commons Attribution 4.0 (CC BY 4.0).
- Dataset: [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset)
- License: [Creative Commons Attribution 4.0 International](https://choosealicense.com/licenses/cc-by-4.0/)
### Contact me
Interested in fine-tuning a TTS model in a specific language or building a multilingual voice solution? Don’t hesitate to reach out.