Update README.md
README.md CHANGED
@@ -13,9 +13,12 @@ library_name: espnet
pipeline_tag: automatic-speech-recognition
---

-
+ 🏆 **News:** Our [OWSM v4 paper](https://www.isca-archive.org/interspeech_2025/peng25c_interspeech.html) won the [Best Student Paper Award](https://isca-speech.org/ISCA-Awards) at INTERSPEECH 2025!

-
+
+ [Open Whisper-style Speech Model (OWSM)](https://www.wavlab.org/activities/2024/owsm/) is the first **fully open** Whisper-style speech foundation model.
+ It reproduces and advances OpenAI's Whisper-style training using publicly available data and open-source toolkits.
+ The code, pre-trained model weights, and training logs are publicly released to promote open science in speech foundation models.

Inference examples can be found on our [project page](https://www.wavlab.org/activities/2024/owsm/).
The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).
@@ -24,9 +27,9 @@ The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).
Additionally, OWSM v4 applies 8 times subsampling (instead of 4 times in OWSM v3.1) to the log Mel features, leading to a final resolution of 80 ms in the encoder.
When running inference, we recommend setting `maxlenratio=1.0` (default) instead of smaller values.

- This repo contains a
+ This repo contains a medium-sized model with 1B parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU).
It is trained on 320k hours of public speech data.
- The newly curated data
+ The newly curated data are publicly released: https://huggingface.co/datasets/espnet/yodas_owsmv4

It supports the following speech-to-text tasks:
- Language identification
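
For reference, here is a minimal inference sketch matching the `maxlenratio=1.0` recommendation above. It uses the `Speech2Text` interface from ESPnet's `espnet2.bin.s2t_inference`, following the usage pattern of earlier OWSM model cards; the repo ID, device, and audio path below are illustrative assumptions, not taken from this commit.

```python
# Minimal OWSM v4 inference sketch.
# Assumptions: the repo ID, device, and audio file name are placeholders.
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

model = Speech2Text.from_pretrained(
    "espnet/owsm_v4_medium_1B",  # hypothetical repo ID; substitute this model's actual ID
    device="cuda",               # or "cpu"
    beam_size=5,
    ctc_weight=0.0,
    maxlenratio=1.0,             # recommended default; avoid smaller values
    lang_sym="<eng>",            # language token
    task_sym="<asr>",            # task token (speech recognition)
)

# Decode a 16 kHz mono waveform; "speech.wav" is a placeholder file.
speech, rate = sf.read("speech.wav")
text, *_ = model(speech)[0]  # first tuple element is the best hypothesis text
print(text)
```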