TanelAlumae committed on
Commit
5306ef3
·
verified ·
1 Parent(s): 1cdd293

Update README.md

  1. README.md +87 -1
README.md CHANGED
@@ -6,4 +6,90 @@ base_model:
  - openai/whisper-large-v3-turbo
  pipeline_tag: automatic-speech-recognition
  library_name: transformers
- ---
+ ---
+
+
+ ## Introduction
+
+ This model is OpenAI Whisper large-v3-turbo, fine-tuned on 1400 hours of audio with manually created verbatim transcriptions from the TalTech Estonian Speech Dataset 1.0 (https://cs.taltech.ee/staff/tanel.alumae/data/est-pub-asr-data/).
+
+ ## Usage
+
+ Because it is a fine-tuned version of Whisper large-v3-turbo, the model can be used via Hugging Face 🤗 Transformers. To run the model, first install the Transformers
+ library. For this example, we'll also install 🤗 Accelerate to reduce the model loading time:
+
+ ```bash
+ pip install --upgrade pip
+ pip install --upgrade transformers accelerate
+ ```
+
+ The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
+ class to transcribe audio files of arbitrary length:
+
+ ```python
+ import torch
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+
+ # Use the GPU with half precision when available; otherwise fall back to CPU.
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+
+ model_id = "TalTechNLP/whisper-large-v3-turbo-et-verbatim"
+
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
+ )
+ model.to(device)
+
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     torch_dtype=torch_dtype,
+     device=device,
+ )
+
+ # Path to a local audio file to transcribe.
+ audio = "sample.mp3"
+
+ result = pipe(audio, generate_kwargs={"task": "transcribe", "language": "et"})
+ print(result)
+ ```
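
For recordings longer than about 30 seconds, recent Transformers releases generally expect `return_timestamps=True` in the pipeline call, and the result then also carries a `chunks` list with per-segment timestamps. The sketch below is not part of the original model card. It assumes that result shape, and the `chunks_to_lines` helper and the sample data are hypothetical, shown only to illustrate how such a result could be printed.

```python
from typing import Any


def chunks_to_lines(result: dict[str, Any]) -> list[str]:
    """Render pipeline chunks as '[start --> end] text' lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        # The end time of the last chunk can be None if the audio is cut off mid-segment.
        end_text = f"{end:.2f}" if end is not None else "?"
        lines.append(f"[{start:.2f} --> {end_text}] {chunk['text'].strip()}")
    return lines


# Hypothetical result, shaped like the output of
# pipe(audio, return_timestamps=True, generate_kwargs={"task": "transcribe", "language": "et"}).
example = {
    "text": " Tere! Kuidas läheb?",
    "chunks": [
        {"timestamp": (0.0, 1.2), "text": " Tere!"},
        {"timestamp": (1.2, None), "text": " Kuidas läheb?"},
    ],
}

for line in chunks_to_lines(example):
    print(line)
```

With a real pipeline, the `example` dictionary would be replaced by the value returned from `pipe(...)`.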
+
+ There is also a CTranslate2 (ct2) version of the model that can be used with tools based on `faster-whisper`, for example the `whisper-ctranslate2` command-line program:
+
+ ```
+ $ whisper-ctranslate2 --model_directory ct2 --language et --vad_filter True --threads 8 --output_dir demo demo/etteütlus2024.wav
+ Detected language 'Estonian' with probability 1.000000
+ [00:00.620 --> 00:08.820] Kas pole teps mitte kihvt, et Haridus- ja Teadusministeerium paikneb Tartus Munga tänaval?
+ [00:08.820 --> 00:23.420] Seal ülikooli peahoonest mõne kukesammu kaugusel tuleb pedagoogikaalased otsused langetada kevisse raiutud imposantsete kultuuriheeroste märksa pilgu all.
+ [00:23.420 --> 00:32.680] Peeter Põllu esimese haridusministri rühikas selg tuletab meelde koolmeistrite määravat osatähtsust ühiskonnas.
+ [00:32.680 --> 00:45.140] Ning üksi silmi teineteist jälgivad Kreutzwald ja Kalevipoeg kõrvu Oskar Lutsuliku kaine literaadi pilguga ei lase unustada Eesti vaimuilma alusväärtusi.
+ [00:45.140 --> 00:52.640] Vahest peaks valitsusegi Stenbocki majast rahvusülikooli akadeemilisse mõju välja kupattama.
+ [00:52.640 --> 01:05.860] Nii oleks võimukandjatel ehk mahti ilmavaate turgutamiseks linnaraamatukogust kübekene tarkust nõutada või Tartu Kunstimuuseumis kultustaieseid nautida.
+ [01:05.860 --> 01:17.500] Too piisatorni sarnane majamürakas võib tekitada muidugi äraspidise tunde, et Emajõe ja Ateenas on alalõpmata midagi viltu.
+ Transcription results written to 'demo' directory
+
+ ```
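
The same CTranslate2 model can presumably also be loaded from Python through the `faster-whisper` package. The following is a minimal sketch rather than part of the original model card. It assumes the converted model sits in a local directory named `ct2` (as in the command above) and that `faster-whisper` is installed; `format_timestamp` and `transcribe` are hypothetical helper names.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.mmm, similar to the CLI output above."""
    total_ms = int(round(seconds * 1000))
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def transcribe(audio_path: str, model_dir: str = "ct2") -> list[str]:
    """Transcribe Estonian audio with the CTranslate2 model stored in model_dir."""
    # Imported lazily so that format_timestamp works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device="auto", compute_type="default")
    # vad_filter=True skips non-speech regions, mirroring --vad_filter True in the CLI call.
    segments, _info = model.transcribe(audio_path, language="et", vad_filter=True)
    return [
        f"[{format_timestamp(s.start)} --> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]


print(format_timestamp(65.86))
```

Calling `transcribe("demo/etteütlus2024.wav")` would then return one formatted line per recognized segment, provided the `ct2` directory and the audio file exist locally.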
+
+ ## Citation
+
+ ```
+ @inproceedings{alumae-etal-2023-automatic,
+     title = "Automatic Closed Captioning for {E}stonian Live Broadcasts",
+     author = {Alum{\"a}e, Tanel and
+       Kalda, Joonas and
+       Bode, K{\"u}lliki and
+       Kaitsa, Martin},
+     booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
+     month = may,
+     year = "2023",
+     address = "T{\'o}rshavn, Faroe Islands",
+     publisher = "University of Tartu Library",
+     url = "https://aclanthology.org/2023.nodalida-1.49",
+     pages = "492--499"
+ }
+ ```