File size: 2,253 Bytes
596094f
f5b8a4e
 
 
 
 
596094f
8dda712
f5b8a4e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
61223d5
f5b8a4e
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
---
license: cc-by-sa-4.0
language:
- hr
- bs
- sr
---
# XLM-R-BERTić

This model was produced by pre-training [XLM-Roberta-large](https://huggingface.co/xlm-roberta-large) 48k steps on South Slavic languages.

# Benchmarking
Three tasks were chosen for model evaluation:
* Named Entity Recognition (NER)
* Sentiment regression
* COPA (Choice of plausible alternatives)

  
In all cases, this model was finetuned for specific downstream tasks.
## NER
(entry to be added soon)
## Sentiment regression

[ParlaSent dataset](https://huggingface.co/datasets/classla/ParlaSent) was used to evaluate sentiment regression for Bosnian, Croatian, and Serbian languages. 
The procedure is explained in greater detail in the dedicated [benchmarking repository](https://github.com/clarinsi/benchich/tree/main/sentiment).

| system                                                                 | train               | test                     |   r^2 |
|:-----------------------------------------------------------------------|:--------------------|:-------------------------|------:|
| [xlm-r-parlasent](https://huggingface.co/classla/xlm-r-parlasent)      | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.615 |
| [BERTić](https://huggingface.co/classla/bcms-bertic)                   | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.612 |
|   XLM-R-SloBERTić                                                      | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.607 |
| XLM-Roberta-Large                                                      | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.605 |
| **XLM-R-BERTić**                                                     | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.601 |
| [crosloengual-bert](https://huggingface.co/EMBEDDIA/crosloengual-bert) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.537 |
| XLM-Roberta-Base                                                       | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.500 |
| dummy (mean)                                                           | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | -0.12 |
## COPA
(to be added soon)

# Citation
(to be added soon)
# Authors
* [Nikola Ljubešič](https://huggingface.co/nljubesi)