|
---
license: mit
base_model:
- openai/whisper-large-v3-turbo
tags:
- whisper
- faster
- int8
- ct2
- turbo
---
|
# Whisper Large v3 Turbo - CTranslate2 |
|
|
|
This is a CTranslate2-optimized version of OpenAI's Whisper Large v3 Turbo model for automatic speech recognition (ASR). |
|
|
|
## Model Description |
|
|
|
This model is a converted version of the original Whisper Large v3 Turbo model, optimized for inference using CTranslate2. CTranslate2 is a C++ and Python library for efficient inference with Transformer models, providing: |
|
|
|
- **Faster inference**: Optimized implementations of attention mechanisms and feed-forward networks |
|
- **Lower memory usage**: Quantization support and memory-efficient attention |
|
- **Better throughput**: Batching and parallel processing optimizations |
|
- **Cross-platform compatibility**: Support for CPU and GPU inference |
|
|
|
## Conversion |
|
|
|
This model has been converted using the following command: |
|
|
|
```bash |
|
ct2-transformers-converter --model openai/whisper-large-v3-turbo --output_dir whisper-large-v3-turbo-ct2-int8 --quantization int8 --copy_files tokenizer.json preprocessor_config.json |
|
``` |
|
|
|
The conversion includes **int8 quantization**, which provides several benefits: |
|
|
|
- **Reduced disk space**: Significantly smaller model size compared to the original float32 version |
|
- **Lower memory consumption**: Requires less RAM during inference |
|
- **Maintained accuracy**: Minimal quality loss while providing substantial efficiency gains |
|
- **Faster loading**: Reduced time to load the model from disk |
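
As a rough back-of-envelope check of the disk-space benefit: int8 stores one byte per weight versus four for float32. The ~809M parameter count below is approximate, and real checkpoints also contain non-quantized tensors and metadata, so treat this as a sketch, not a measurement:

```python
# Rough size estimate for int8 vs. float32 weight storage.
# Parameter count is approximate; real model files also contain
# non-quantized tensors and metadata.
params = 809_000_000          # approximate parameter count of large-v3-turbo
fp32_gb = params * 4 / 1e9    # float32: 4 bytes per weight
int8_gb = params * 1 / 1e9    # int8: 1 byte per weight

print(f"float32: ~{fp32_gb:.1f} GB")
print(f"int8:    ~{int8_gb:.1f} GB")
print(f"ratio:   {fp32_gb / int8_gb:.0f}x smaller")
```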
|
|
|
## Original Model |
|
|
|
This model is based on OpenAI's Whisper Large v3 Turbo, which is a state-of-the-art automatic speech recognition model that: |
|
|
|
- Supports 99 languages |
|
- Provides high-quality transcription and translation |
|
- Runs significantly faster than Whisper Large v3 with only minor accuracy degradation
|
- Handles various audio conditions and accents |
|
|
|
## Usage |
|
|
|
To use this model, install faster-whisper, which provides the Whisper integration for CTranslate2 (and pulls in CTranslate2 as a dependency):
|
|
|
```bash |
|
pip install ctranslate2 faster-whisper |
|
``` |
|
|
|
```python |
|
from faster_whisper import WhisperModel |
|
|
|
model_path = "path/to/whisper-large-v3-turbo-ct2-int8"

model = WhisperModel(model_path, device="cpu", compute_type="int8")
|
|
|
segments, info = model.transcribe("audio.wav", beam_size=5) |
|
|
|
for segment in segments: |
|
print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text)) |
|
``` |
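
On a CUDA GPU, the same model can be loaded with a mixed compute type such as `int8_float16` (int8 weights, float16 activations). A minimal helper sketch; the function name and pairings below are illustrative conventions, not part of the faster-whisper API:

```python
# Illustrative helper (not part of faster-whisper): pick a sensible
# compute_type for WhisperModel based on the target device.
def recommended_compute_type(device: str) -> str:
    if device == "cuda":
        # int8 weights with float16 activations is a common GPU choice
        return "int8_float16"
    # pure int8 matches how this model was quantized and works well on CPU
    return "int8"

print(recommended_compute_type("cpu"))   # int8
print(recommended_compute_type("cuda"))  # int8_float16
```

For example, `WhisperModel(model_path, device="cuda", compute_type=recommended_compute_type("cuda"))`, assuming a CUDA-enabled CTranslate2 build.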
|
|
|
## Performance |
|
|
|
This CTranslate2 version provides significant performance improvements over the original PyTorch implementation: |
|
|
|
- Up to ~4x faster inference than the PyTorch implementation (hardware- and settings-dependent)

- Reduced memory consumption

- Quantization support (this model ships with int8 weights)

- Runs on both CPU and GPU
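
A common way to quantify ASR speed is the real-time factor (RTF): processing time divided by audio duration, lower being faster. The timings below are hypothetical, purely to illustrate the metric; actual numbers depend on hardware and settings:

```python
# Real-time factor (RTF): processing time / audio duration (lower is faster).
# The timings below are hypothetical, for illustration only.
def rtf(processing_s: float, audio_s: float) -> float:
    return processing_s / audio_s

audio_s = 60.0
baseline_s = 24.0   # hypothetical PyTorch float32 run
optimized_s = 6.0   # hypothetical CTranslate2 int8 run

print(f"baseline RTF:  {rtf(baseline_s, audio_s):.2f}")
print(f"optimized RTF: {rtf(optimized_s, audio_s):.2f}")
print(f"speedup:       {baseline_s / optimized_s:.0f}x")
```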
|
|
|
## Supported Languages |
|
|
|
Same as the original Whisper Large v3 Turbo (99 languages), including:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh, among others.
|
|
|
## Model Card |
|
|
|
- **Developed by**: OpenAI (original model), converted to the CTranslate2 format
|
- **Model type**: Automatic Speech Recognition |
|
- **Language(s)**: Multilingual (99 languages) |
|
- **License**: MIT |
|
- **Model size**: 809M parameters (the turbo variant of Whisper Large v3)