pyf98 committed
Commit ad37ccd · verified · 1 Parent(s): 471418d

Update README.md

Files changed (1): README.md (+7 −4)

README.md CHANGED
@@ -13,9 +13,12 @@ library_name: espnet
 pipeline_tag: automatic-speech-recognition
 ---
 
-## Open Whisper-style Speech Model (OWSM)
+🏆 **News:** Our [OWSM v4 paper](https://www.isca-archive.org/interspeech_2025/peng25c_interspeech.html) won the [Best Student Paper Award](https://isca-speech.org/ISCA-Awards) at INTERSPEECH 2025!
 
-OWSM aims to develop fully open speech foundation models using publicly available data and open-source toolkits, including [ESPnet](https://github.com/espnet/espnet).
+
+[Open Whisper-style Speech Model (OWSM)](https://www.wavlab.org/activities/2024/owsm/) is the first **fully open** Whisper-style speech foundation model.
+It reproduces and advances OpenAI's Whisper-style training using publicly available data and open-source toolkits.
+The code, pre-trained model weights, and training logs are publicly released to promote open science in speech foundation models.
 
 Inference examples can be found on our [project page](https://www.wavlab.org/activities/2024/owsm/).
 The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).
@@ -24,9 +27,9 @@ The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).
 Additionally, OWSM v4 applies 8 times subsampling (instead of 4 times in OWSM v3.1) to the log Mel features, leading to a final resolution of 80 ms in the encoder.
 When running inference, we recommend setting `maxlenratio=1.0` (default) instead of smaller values.
 
-This repo contains a base-sized model with 102M parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU).
+This repo contains a medium-sized model with 1B parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU).
 It is trained on 320k hours of public speech data.
-The newly curated data will be publicly released. Please stay tuned!
+The newly curated data are publicly released: https://huggingface.co/datasets/espnet/yodas_owsmv4
 
 It supports the following speech-to-text tasks:
 - Language identification
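
The subsampling and `maxlenratio` notes in the updated README can be sanity-checked with a little arithmetic. The sketch below assumes the common 10 ms log-Mel frame shift (not stated in the diff) and the usual ESPnet beam-search rule that the decoded length is capped at `maxlenratio` times the number of encoder frames; all names here are illustrative.

```python
# Sanity-check arithmetic for the README's subsampling / maxlenratio notes.
# Assumption (not stated in the diff): log Mel features use the common
# 10 ms frame shift, so 8x subsampling gives 10 ms * 8 = 80 ms per
# encoder frame, matching the "final resolution of 80 ms" in the text.

HOP_MS = 10.0  # assumed feature frame shift

def encoder_resolution_ms(subsampling: int, hop_ms: float = HOP_MS) -> float:
    """Duration of audio covered by one encoder frame."""
    return hop_ms * subsampling

def max_decoded_tokens(audio_sec: float, subsampling: int,
                       maxlenratio: float, hop_ms: float = HOP_MS) -> int:
    """Upper bound on decoded tokens if beam search caps the hypothesis
    length at maxlenratio * (number of encoder frames), as in ESPnet."""
    n_encoder_frames = int(audio_sec * 1000 / hop_ms) // subsampling
    return int(maxlenratio * n_encoder_frames)

print(encoder_resolution_ms(8))          # OWSM v4   -> 80.0 ms
print(encoder_resolution_ms(4))          # OWSM v3.1 -> 40.0 ms

# For 30 s of audio, a small ratio leaves far fewer tokens under 8x
# subsampling than under 4x, hence the advice to keep maxlenratio=1.0:
print(max_decoded_tokens(30.0, 8, 1.0))  # 375
print(max_decoded_tokens(30.0, 8, 0.5))  # 187
print(max_decoded_tokens(30.0, 4, 0.5))  # 375
```

Under these assumptions, halving `maxlenratio` under 8× subsampling leaves only as many tokens as OWSM v3.1 would get at the same ratio with twice as many encoder frames, which is why the README recommends keeping the default.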