---
language: ko
---

# Pretrained BART in Korean

This is a BART model pretrained on multiple Korean datasets.

I used multiple datasets so that the model generalizes to both colloquial and written text.

The training was supported by the TPU Research Cloud program.

The script used to pre-train the model is here.

When you use the Inference API, you must wrap the sentence with `[BOS]` and `[EOS]`, as in the example below.

```
[BOS] 안녕하세요? 반가워요~~ [EOS]
```
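For programmatic access, here is a minimal sketch of querying the hosted Inference API with the wrapped input. The endpoint URL follows the standard Hugging Face pattern, and `HF_TOKEN` is a placeholder for your own API token; both are assumptions, not details from this card.

```python
# A minimal sketch of querying the hosted Inference API for this model.
# HF_TOKEN is a placeholder; substitute your own Hugging Face API token.
import requests

API_URL = "https://api-inference.huggingface.co/models/cosmoquester/bart-ko-base"
headers = {"Authorization": "Bearer HF_TOKEN"}

# Wrap the sentence with [BOS] and [EOS], as the model expects.
payload = {"inputs": "[BOS] 안녕하세요? 반가워요~~ [EOS]"}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```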

You can also test mask-filling performance using the `[MASK]` token, like this:

```
[BOS] [MASK] 먹었어? [EOS]
```
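To try mask filling locally, here is a sketch assuming the checkpoint loads with the standard `transformers` Auto classes and that the special tokens must be supplied manually, as described above; neither assumption is confirmed by this card.

```python
# A minimal local sketch, assuming this checkpoint loads via the standard
# transformers Auto classes (an assumption, not confirmed by this card).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo = "cosmoquester/bart-ko-base"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSeq2SeqLM.from_pretrained(repo)

# Wrap the input with [BOS]/[EOS] manually and include [MASK] for the slot
# to fill; add_special_tokens=False assumes the card's manual wrapping
# replaces the tokenizer's automatic special tokens.
text = "[BOS] [MASK] 먹었어? [EOS]"
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False)

# BART fills the masked span during generation.
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```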

## Used Datasets

모두의 말뭉치 (Modu Corpus)

  • 일상 대화 말뭉치 2020 (Everyday Conversation Corpus 2020)
  • 구어 말뭉치 (Spoken Corpus)
  • 문어 말뭉치 (Written Corpus)
  • 신문 말뭉치 (Newspaper Corpus)

AIhub

세종 말뭉치 (Sejong Corpus)