---
license: apache-2.0
language:
- zh
- en
---

# WenetSpeech-Yue: A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

Longhao Li¹, Zhao Guo¹, Hongjie Chen², Yuhang Dai¹, Ziyu Zhang¹, Hongfei Xue¹, Tianlun Zuo¹, Chengyou Wang¹, Shuiyuan Wang¹, Xin Xu³, Hui Bu³, Jie Li², Jian Kang², Binbin Zhang⁴, Ruibin Yuan⁵, Ziya Zhou⁵, Wei Xue⁵, Lei Xie¹

¹ ASLP, Northwest Polytechnical University

² Institute of Artificial Intelligence (TeleAI), China Telecom

³ Beijing AISHELL Technology Co., Ltd.

⁴ WeNet Open Source Community

⁵ Hong Kong University of Science and Technology

📑 Paper    |    🐙 GitHub    |    🤗 HuggingFace
🖥️ HuggingFace Space    |    🎤 Demo Page    |    💬 Contact Us

## ASR Leaderboard
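| Model | #Params (M) | In-House: Dialogue | In-House: Reading | Open-Source: yue | Open-Source: HK | Open-Source: MDCC | Open-Source: Daily_Use | Open-Source: Commands | WSYue-eval: Short | WSYue-eval: Long |
|---|---|---|---|---|---|---|---|---|---|---|
| **w/o LLM** | | | | | | | | | | |
| Conformer-Yue⭐ | 130 | 16.57 | 7.82 | 7.72 | 11.42 | 5.73 | 5.73 | 8.97 | 5.05 | 8.89 |
| Paraformer | 220 | 83.22 | 51.97 | 70.16 | 68.49 | 47.67 | 79.31 | 69.32 | 73.64 | 89.00 |
| SenseVoice-small | 234 | 21.08 | 6.52 | 8.05 | 7.34 | 6.34 | 5.74 | 6.65 | 6.69 | 9.95 |
| SenseVoice-s-Yue⭐ | 234 | 19.19 | 6.71 | 6.87 | 8.68 | 5.43 | 5.24 | 6.93 | 5.23 | 8.63 |
| Dolphin-small | 372 | 59.20 | 7.38 | 39.69 | 51.29 | 26.39 | 7.21 | 9.68 | 32.32 | 58.20 |
| TeleASR | 700 | 37.18 | 7.27 | 7.02 | 7.88 | 6.25 | 8.02 | 5.98 | 6.23 | 11.33 |
| Whisper-medium | 769 | 75.50 | 68.69 | 59.44 | 62.50 | 62.31 | 64.41 | 80.41 | 80.82 | 50.96 |
| Whisper-m-Yue⭐ | 769 | 18.69 | 6.86 | 6.86 | 11.03 | 5.49 | 4.70 | 8.51 | 5.05 | 8.05 |
| FireRedASR-AED-L | 1100 | 73.70 | 18.72 | 43.93 | 43.33 | 34.53 | 48.05 | 49.99 | 55.37 | 50.26 |
| Whisper-large-v3 | 1550 | 45.09 | 15.46 | 12.85 | 16.36 | 14.63 | 17.84 | 20.70 | 12.95 | 26.86 |
| **w/ LLM** | | | | | | | | | | |
| Qwen2.5-Omni-3B | 3000 | 72.01 | 7.49 | 12.59 | 11.75 | 38.91 | 10.59 | 25.78 | 67.95 | 88.46 |
| Kimi-Audio | 7000 | 68.65 | 24.34 | 40.90 | 38.72 | 30.72 | 44.29 | 45.54 | 50.86 | 33.49 |
| FireRedASR-LLM-L | 8300 | 73.70 | 18.72 | 43.93 | 43.33 | 34.53 | 48.05 | 49.99 | 49.87 | 45.92 |
| Conformer-LLM-Yue⭐ | 4200 | 17.22 | 6.21 | 6.23 | 9.52 | 4.35 | 4.57 | 6.98 | 4.73 | 7.91 |
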
## ASR Inference

### U2pp_Conformer_Yue

```
dir=u2pp_conformer_yue
decode_checkpoint=$dir/u2pp_conformer_yue.pt
test_set=path/to/test_set
test_result_dir=path/to/test_result_dir

python wenet/bin/recognize.py \
  --gpu 0 \
  --modes attention_rescoring \
  --config $dir/train.yaml \
  --test_data $test_set/data.list \
  --checkpoint $decode_checkpoint \
  --beam_size 10 \
  --batch_size 32 \
  --ctc_weight 0.5 \
  --result_dir $test_result_dir \
  --decoding_chunk_size -1
```

### Whisper_Medium_Yue

```
dir=whisper_medium_yue
decode_checkpoint=$dir/whisper_medium_yue.pt
test_set=path/to/test_set
test_result_dir=path/to/test_result_dir

python wenet/bin/recognize.py \
  --gpu 0 \
  --modes attention \
  --config $dir/train.yaml \
  --test_data $test_set/data.list \
  --checkpoint $decode_checkpoint \
  --beam_size 10 \
  --batch_size 32 \
  --blank_penalty 0.0 \
  --ctc_weight 0.0 \
  --reverse_weight 0.0 \
  --result_dir $test_result_dir \
  --decoding_chunk_size -1
```
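
Both commands read the test set from `$test_set/data.list`. The following is a minimal sketch of how such a file might be produced, assuming WeNet's raw data format in which each line is a JSON object with `key`, `wav`, and `txt` fields. The helper name `write_data_list`, the utterance IDs, and the paths are hypothetical and only for illustration; check the expected format against the WeNet version you are using.

```python
import json
import tempfile
from pathlib import Path


def write_data_list(entries, out_path):
    """Write (key, wav_path, transcript) triples as one JSON object per line.

    Assumption: the recognizer reads WeNet's raw format, i.e. each line
    looks like {"key": ..., "wav": ..., "txt": ...}.
    """
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        for key, wav, txt in entries:
            record = {"key": key, "wav": str(wav), "txt": txt}
            # ensure_ascii=False keeps Cantonese characters readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path


if __name__ == "__main__":
    # Hypothetical utterances; replace with your own audio paths and transcripts.
    demo_entries = [
        ("utt_0001", "/path/to/audio/utt_0001.wav", "你好"),
        ("utt_0002", "/path/to/audio/utt_0002.wav", "多謝"),
    ]
    # Written to a temporary directory here; point this at your test_set directory instead.
    demo_dir = tempfile.mkdtemp()
    path = write_data_list(demo_entries, Path(demo_dir) / "data.list")
    print(path.read_text(encoding="utf-8"))
```

Setting `test_set` to the directory that contains the resulting `data.list` lets the commands above pick it up through `--test_data $test_set/data.list`.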