---
language: ["ru"]
tags:
- russian
- classification
- sentiment
- multiclass
datasets:
- cedr
widget:
- text: "Бесишь меня, падла"
- text: "Как здорово, что все мы здесь сегодня собрались"
- text: "Как-то стрёмно, давай уйдём отсюда?"
- text: "Грусть-тоска меня съедает"
- text: "Данный фрагмент текста не содержит абсолютно никаких эмоций"
- text: "Надо же, неужели так тоже бывает!"
---

This is the [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) model fine-tuned to classify emotions in Russian sentences. The task is multilabel classification, because one sentence can express several emotions at once.

The model was trained on the [CEDR dataset](https://huggingface.co/datasets/cedr), described in the paper ["Data-Driven Model for Emotion Detection in Russian Texts"](https://doi.org/10.1016/j.procs.2021.06.075) by Sboev et al.

Training used the Adam optimizer for 40 epochs with a learning rate of `1e-5` and a batch size of 64; see [this notebook](https://colab.research.google.com/drive/1AFW70EJaBn7KZKRClDIdDUpbD46cEsat?usp=sharing).

ROC AUC of the predicted probabilities on the test set, per label:

| label | no emotion | joy    | sadness | surprise | fear   | anger  | mean   |
|-------|------------|--------|---------|----------|--------|--------|--------|
| AUC   | 0.9406     | 0.9518 | 0.9372  | 0.8634   | 0.9663 | 0.6761 | 0.8892 |
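
Because the task is multilabel, a natural way to read the model's outputs is to apply an independent sigmoid to each logit and keep every label above a threshold, rather than taking a softmax and a single argmax. The following is a minimal sketch of that decoding step, not the author's confirmed inference code. It rests on several assumptions: the label order in `LABELS` is copied from the columns of the AUC table above, the `0.5` threshold is an arbitrary starting point, and `decode_multilabel` and `predict_emotions` are hypothetical helper names. The `model_id` argument stands for whatever repository ID this card is published under, which is not stated here.

```python
import math

# Assumed label order, copied from the columns of the AUC table above.
# Verify against model.config.id2label on the real checkpoint.
LABELS = ["no emotion", "joy", "sadness", "surprise", "fear", "anger"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return {label: probability} for every label whose independent
    sigmoid probability reaches the threshold (0.5 is an assumed default)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = {label: sigmoid(z) for label, z in zip(labels, logits)}
    return {label: p for label, p in probs.items() if p >= threshold}


def predict_emotions(texts, model_id, threshold=0.5):
    """Score a list of texts with a fine-tuned checkpoint.

    model_id is supplied by the caller; heavy dependencies are imported
    lazily so the decoding helper above works without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Prefer the label names stored in the checkpoint over the assumed list.
    labels = [model.config.id2label[i] for i in range(model.config.num_labels)]
    return [decode_multilabel(row, labels, threshold) for row in logits.tolist()]
```

With made-up logits such as `decode_multilabel([-3.0, 2.0, -1.0, 0.5, -4.0, -2.0])`, only `joy` and `surprise` pass the threshold, which shows how one sentence can receive more than one label. Given the lower AUC reported for `anger`, a per-label threshold tuned on a validation set may be worth trying instead of a single global value.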