File size: 1,601 Bytes
9f2ece2 47da2f1 9f2ece2 c63459c 9f2ece2 c63459c 9f2ece2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
---
language: ["ru"]
tags:
- russian
- classification
- sentiment
- emotion-classification
- multiclass
datasets:
- cedr
widget:
- text: "Бесишь меня, падла"
- text: "Как здорово, что все мы здесь сегодня собрались"
- text: "Как-то стрёмно, давай свалим отсюда?"
- text: "Грусть-тоска меня съедает"
- text: "Данный фрагмент текста не содержит абсолютно никаких эмоций"
- text: "Нифига себе, неужели так тоже бывает!"
---
This is the [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) model fine-tuned for classification of emotions in Russian sentences. The task is multilabel classification, because one sentence can contain multiple emotions.
The model on the [CEDR dataset](https://huggingface.co/datasets/cedr) described in the paper ["Data-Driven Model for Emotion Detection in Russian Texts"](https://doi.org/10.1016/j.procs.2021.06.075) by Sboev et al.
The model has been trained with Adam optimizer for 40 epochs with learning rate `1e-5` and batch size 64 [in this notebook](https://colab.research.google.com/drive/1AFW70EJaBn7KZKRClDIdDUpbD46cEsat?usp=sharing).
ROC AUC of the predicted probabilities on the test dataset is the following:
| label | no emotion | joy |sadness |surprise| fear |anger | mean |
|-------|------------|--------|--------|--------|--------|--------| --------|
| AUC | 0.9406 | 0.9518 | 0.9372 | 0.8634 | 0.9663 | 0.6761 | 0.8892 | |