|
---
license: mit
base_model:
- openai/whisper-large-v3-turbo
tags:
- whisper
- faster
- int8
- ct2
- turbo
---
|
# Whisper Large v3 Turbo - CTranslate2 |
|
|
|
This is a CTranslate2-optimized version of OpenAI's Whisper Large v3 Turbo model for automatic speech recognition (ASR). |
|
|
|
## Model Description |
|
|
|
This model is a converted version of the original Whisper Large v3 Turbo model, optimized for inference using CTranslate2. CTranslate2 is a C++ and Python library for efficient inference with Transformer models, providing: |
|
|
|
- **Faster inference**: Optimized implementations of attention mechanisms and feed-forward networks |
|
- **Lower memory usage**: Quantization support and memory-efficient attention |
|
- **Better throughput**: Batching and parallel processing optimizations |
|
- **Cross-platform compatibility**: Support for CPU and GPU inference |
|
|
|
## Conversion |
|
|
|
This model has been converted using the following command: |
|
|
|
```bash |
|
ct2-transformers-converter --model openai/whisper-large-v3-turbo --output_dir whisper-large-v3-turbo-ct2-int8 --quantization int8 --copy_files tokenizer.json preprocessor_config.json |
|
``` |
|
|
|
The conversion includes **int8 quantization**, which provides several benefits: |
|
|
|
- **Reduced disk space**: Significantly smaller model size compared to the original float32 version |
|
- **Lower memory consumption**: Requires less RAM during inference |
|
- **Maintained accuracy**: Minimal quality loss while providing substantial efficiency gains |
|
- **Faster loading**: Reduced time to load the model from disk |
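
As a rough back-of-envelope check of the disk-space benefit: int8 stores one byte per weight versus four for float32. The ~809M parameter count below is approximate, and real checkpoints also contain non-quantized tensors and metadata, so treat this as a sketch, not a measurement:

```python
# Rough size estimate for int8 vs. float32 weight storage.
# Parameter count is approximate; real model files also contain
# non-quantized tensors and metadata.
params = 809_000_000          # approximate parameter count of large-v3-turbo
fp32_gb = params * 4 / 1e9    # float32: 4 bytes per weight
int8_gb = params * 1 / 1e9    # int8: 1 byte per weight

print(f"float32: ~{fp32_gb:.1f} GB")
print(f"int8:    ~{int8_gb:.1f} GB")
print(f"ratio:   {fp32_gb / int8_gb:.0f}x smaller")
```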
|
|
|
## Original Model |
|
|
|
This model is based on OpenAI's Whisper Large v3 Turbo, which is a state-of-the-art automatic speech recognition model that: |
|
|
|
- Supports 99 languages |
|
- Provides high-quality transcription and translation |
|
- Runs significantly faster than Whisper Large v3 with only minor accuracy degradation
|
- Handles various audio conditions and accents |
|
|
|
## Usage |
|
|
|
To use this model, install faster-whisper, which provides the Whisper integration for CTranslate2 (and pulls in CTranslate2 as a dependency):
|
|
|
```bash |
|
pip install ctranslate2 faster-whisper |
|
``` |
|
|
|
```python |
|
from faster_whisper import WhisperModel |
|
|
|
model_path = "path/to/whisper-large-v3-turbo-ct2-int8"

model = WhisperModel(model_path, device="cpu", compute_type="int8")
|
|
|
segments, info = model.transcribe("audio.wav", beam_size=5) |
|
|
|
for segment in segments: |
|
print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text)) |
|
``` |
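
On a CUDA GPU, the same model can be loaded with a mixed compute type such as `int8_float16` (int8 weights, float16 activations). A minimal helper sketch; the function name and pairings below are illustrative conventions, not part of the faster-whisper API:

```python
# Illustrative helper (not part of faster-whisper): pick a sensible
# compute_type for WhisperModel based on the target device.
def recommended_compute_type(device: str) -> str:
    if device == "cuda":
        # int8 weights with float16 activations is a common GPU choice
        return "int8_float16"
    # pure int8 matches how this model was quantized and works well on CPU
    return "int8"

print(recommended_compute_type("cpu"))   # int8
print(recommended_compute_type("cuda"))  # int8_float16
```

For example, `WhisperModel(model_path, device="cuda", compute_type=recommended_compute_type("cuda"))`, assuming a CUDA-enabled CTranslate2 build.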
|
|
|
## Performance |
|
|
|
This CTranslate2 version provides significant performance improvements over the original PyTorch implementation: |
|
|
|
- Up to ~4x faster inference than the PyTorch implementation (hardware- and settings-dependent)

- Reduced memory consumption

- Quantization support (this model ships with int8 weights)

- Runs on both CPU and GPU
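
A common way to quantify ASR speed is the real-time factor (RTF): processing time divided by audio duration, lower being faster. The timings below are hypothetical, purely to illustrate the metric; actual numbers depend on hardware and settings:

```python
# Real-time factor (RTF): processing time / audio duration (lower is faster).
# The timings below are hypothetical, for illustration only.
def rtf(processing_s: float, audio_s: float) -> float:
    return processing_s / audio_s

audio_s = 60.0
baseline_s = 24.0   # hypothetical PyTorch float32 run
optimized_s = 6.0   # hypothetical CTranslate2 int8 run

print(f"baseline RTF:  {rtf(baseline_s, audio_s):.2f}")
print(f"optimized RTF: {rtf(optimized_s, audio_s):.2f}")
print(f"speedup:       {baseline_s / optimized_s:.0f}x")
```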
|
|
|
## Supported Languages |
|
|
|
Same as the original Whisper Large v3 Turbo (99 languages), including:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh, among others.
|
|
|
## Model Card |
|
|
|
- **Developed by**: OpenAI (original model), converted to the CTranslate2 format
|
- **Model type**: Automatic Speech Recognition |
|
- **Language(s)**: Multilingual (99 languages) |
|
- **License**: MIT |
|
- **Model size**: 809M parameters (the turbo variant of Whisper Large v3)