---
language: ko
---
# Pretrained BART in Korean
This is a BART model pretrained on multiple Korean datasets. I used multiple datasets so that the model generalizes to both colloquial and written text. Training was supported by the TPU Research Cloud program. The script used to pre-train the model is available here.
When you use the inference API, you must wrap the sentence with `[BOS]` and `[EOS]` tokens, as in the example below.
```
[BOS] 안녕하세요? 반가워요~~ [EOS]
```
You can also test mask-filling performance using the `[MASK]` token, like this:
```
[BOS] [MASK] 먹었어? [EOS]
```
## Used Datasets
모두의 말뭉치 (NIKL Modu Corpus)
- 일상 대화 말뭉치 2020 (Everyday Conversation Corpus 2020)
- 구어 말뭉치 (Spoken Corpus)
- 문어 말뭉치 (Written Corpus)
- 신문 말뭉치 (Newspaper Corpus)