---
license: cc-by-sa-4.0
language:
- hr
- bs
- sr
---
# XLM-R-BERTić
This model was produced by additionally pre-training [XLM-Roberta-large](https://huggingface.co/xlm-roberta-large) for 48k steps on South Slavic languages.
# Benchmarking
Three tasks were chosen for model evaluation:
* Named Entity Recognition (NER)
* Sentiment regression
* COPA (Choice of Plausible Alternatives)

In all cases, the model was fine-tuned separately for each downstream task.
## NER
(entry to be added soon)
## Sentiment regression
The [ParlaSent dataset](https://huggingface.co/datasets/classla/ParlaSent) was used to evaluate sentiment regression on Bosnian, Croatian, and Serbian.
The procedure is described in more detail in the dedicated [benchmarking repository](https://github.com/clarinsi/benchich/tree/main/sentiment).
Systems are compared by the coefficient of determination (R²) on the test set.

| system | train | test | R² |
|:-----------------------------------------------------------------------|:--------------------|:-------------------------|------:|
| [xlm-r-parlasent](https://huggingface.co/classla/xlm-r-parlasent) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.615 |
| [BERTić](https://huggingface.co/classla/bcms-bertic) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.612 |
| XLM-R-SloBERTić | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.607 |
| XLM-Roberta-Large | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.605 |
| **XLM-R-BERTić** | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.601 |
| [crosloengual-bert](https://huggingface.co/EMBEDDIA/crosloengual-bert) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.537 |
| XLM-Roberta-Base | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.500 |
| dummy (mean) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | -0.12 |
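
The dummy (mean) row can be negative because R² measures a system against always predicting the test-set mean. A regressor that outputs one constant c has a residual sum of squares equal to the total sum of squares plus n(c − mean)², so it scores at most 0 and drops below 0 whenever c differs from the test mean. Below is a minimal Python sketch of the metric, assuming the benchmark uses the standard coefficient of determination (as in scikit-learn's `r2_score`) and that the dummy baseline predicts the training-set mean. The helpers `r_squared` and `mean_baseline` are hypothetical names, and the labels are toy values, not ParlaSent data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the system's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the test-set mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true labels are equal")
    return 1.0 - ss_res / ss_tot


def mean_baseline(train_labels, n_test):
    """Hypothetical dummy regressor: predicts the training-set mean everywhere."""
    train_mean = sum(train_labels) / len(train_labels)
    return [train_mean] * n_test


if __name__ == "__main__":
    train = [0.0, 1.0, 2.0]  # toy training labels (mean 1.0)
    test = [1.0, 2.0, 3.0]   # toy test labels (mean 2.0)
    print(r_squared(test, test))                                # perfect system: 1.0
    print(r_squared(test, [2.0, 2.0, 2.0]))                     # test-mean constant: 0.0
    print(r_squared(test, mean_baseline(train, len(test))))     # train-mean constant: -1.5
```

With real data, the size of the negative score depends on how far the training-set mean lies from the test-set mean relative to the spread of the test labels.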
## COPA
(to be added soon)
# Citation
(to be added soon)
# Authors
* [Nikola Ljubešić](https://huggingface.co/nljubesi)