---
license: apache-2.0
language:
- zh
- en
---

# WenetSpeech-Yue: A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

Longhao Li¹, Zhao Guo¹, Hongjie Chen², Yuhang Dai¹, Ziyu Zhang¹, Hongfei Xue¹, Tianlun Zuo¹, Chengyou Wang¹, Shuiyuan Wang¹, Xin Xu³, Hui Bu³, Jie Li², Jian Kang², Binbin Zhang⁴, Ruibin Yuan⁵, Ziya Zhou⁵, Wei Xue⁵, Lei Xie¹

¹ ASLP, Northwest Polytechnical University

² Institute of Artificial Intelligence (TeleAI), China Telecom

³ Beijing AISHELL Technology Co., Ltd.

⁴ WeNet Open Source Community

⁵ Hong Kong University of Science and Technology

📑 Paper | 🐙 GitHub | 🤗 HuggingFace
🖥️ HuggingFace Space | 🎤 Demo Page | 💬 Contact Us

## ASR Leaderboard

| Model | #Params (M) | In-House / Dialogue | In-House / Reading | Open-Source / yue | Open-Source / HK | Open-Source / MDCC | Open-Source / Daily_Use | Open-Source / Commands | WSYue-eval / Short | WSYue-eval / Long |
|---|---|---|---|---|---|---|---|---|---|---|
| **w/o LLM** | | | | | | | | | | |
| Conformer-Yue⭐ | 130 | 16.57 | 7.82 | 7.72 | 11.42 | 5.73 | 5.73 | 8.97 | 5.05 | 8.89 |
| Paraformer | 220 | 83.22 | 51.97 | 70.16 | 68.49 | 47.67 | 79.31 | 69.32 | 73.64 | 89.00 |
| SenseVoice-small | 234 | 21.08 | 6.52 | 8.05 | 7.34 | 6.34 | 5.74 | 6.65 | 6.69 | 9.95 |
| SenseVoice-s-Yue⭐ | 234 | 19.19 | 6.71 | 6.87 | 8.68 | 5.43 | 5.24 | 6.93 | 5.23 | 8.63 |
| Dolphin-small | 372 | 59.20 | 7.38 | 39.69 | 51.29 | 26.39 | 7.21 | 9.68 | 32.32 | 58.20 |
| TeleASR | 700 | 37.18 | 7.27 | 7.02 | 7.88 | 6.25 | 8.02 | 5.98 | 6.23 | 11.33 |
| Whisper-medium | 769 | 75.50 | 68.69 | 59.44 | 62.50 | 62.31 | 64.41 | 80.41 | 80.82 | 50.96 |
| Whisper-m-Yue⭐ | 769 | 18.69 | 6.86 | 6.86 | 11.03 | 5.49 | 4.70 | 8.51 | 5.05 | 8.05 |
| FireRedASR-AED-L | 1100 | 73.70 | 18.72 | 43.93 | 43.33 | 34.53 | 48.05 | 49.99 | 55.37 | 50.26 |
| Whisper-large-v3 | 1550 | 45.09 | 15.46 | 12.85 | 16.36 | 14.63 | 17.84 | 20.70 | 12.95 | 26.86 |
| **w/ LLM** | | | | | | | | | | |
| Qwen2.5-Omni-3B | 3000 | 72.01 | 7.49 | 12.59 | 11.75 | 38.91 | 10.59 | 25.78 | 67.95 | 88.46 |
| Kimi-Audio | 7000 | 68.65 | 24.34 | 40.90 | 38.72 | 30.72 | 44.29 | 45.54 | 50.86 | 33.49 |
| FireRedASR-LLM-L | 8300 | 73.70 | 18.72 | 43.93 | 43.33 | 34.53 | 48.05 | 49.99 | 49.87 | 45.92 |
| Conformer-LLM-Yue⭐ | 4200 | 17.22 | 6.21 | 6.23 | 9.52 | 4.35 | 4.57 | 6.98 | 4.73 | 7.91 |

## ASR Inference

### U2pp_Conformer_Yue

```
dir=u2pp_conformer_yue
decode_checkpoint=$dir/u2pp_conformer_yue.pt
test_set=path/to/test_set
test_result_dir=path/to/test_result_dir

python wenet/bin/recognize.py \
  --gpu 0 \
  --modes attention_rescoring \
  --config $dir/train.yaml \
  --test_data $test_set/data.list \
  --checkpoint $decode_checkpoint \
  --beam_size 10 \
  --batch_size 32 \
  --ctc_weight 0.5 \
  --result_dir $test_result_dir \
  --decoding_chunk_size -1
```

### Whisper_Medium_Yue

```
dir=whisper_medium_yue
decode_checkpoint=$dir/whisper_medium_yue.pt
test_set=path/to/test_set
test_result_dir=path/to/test_result_dir

python wenet/bin/recognize.py \
  --gpu 0 \
  --modes attention \
  --config $dir/train.yaml \
  --test_data $test_set/data.list \
  --checkpoint $decode_checkpoint \
  --beam_size 10 \
  --batch_size 32 \
  --blank_penalty 0.0 \
  --ctc_weight 0.0 \
  --reverse_weight 0.0 \
  --result_dir $test_result_dir \
  --decoding_chunk_size -1
```
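### Preparing data.list (sketch)

Both commands read `$test_set/data.list`. The snippet below is a minimal sketch of how such a file could be produced. It assumes WeNet's `raw` data format, with one JSON object per line carrying `key`, `wav`, and `txt` fields. The helper name `make_data_list`, the utterance IDs, the wav paths, and the transcripts are hypothetical placeholders, not part of this release.

```python
import json


def make_data_list(utterances):
    """Build data.list content from (key, wav_path, transcript) tuples.

    Assumption: the test set uses WeNet's "raw" format, i.e. one JSON
    object per line with the fields "key", "wav" and "txt".
    """
    lines = []
    for key, wav_path, text in utterances:
        entry = {"key": key, "wav": wav_path, "txt": text}
        # ensure_ascii=False keeps Chinese characters readable in the file.
        lines.append(json.dumps(entry, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Hypothetical utterances; replace with your own IDs, paths and transcripts.
samples = [
    ("utt_0001", "path/to/test_set/wav/utt_0001.wav", "你好"),
    ("utt_0002", "path/to/test_set/wav/utt_0002.wav", "多謝"),
]

data_list_text = make_data_list(samples)
print(data_list_text, end="")
```

To use it, write `data_list_text` to `path/to/test_set/data.list` with UTF-8 encoding, then run one of the commands above.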