Model Card for Multi‑Label Emotion Classification on Reddit Comments

This repository contains training and inference code for multi‑label emotion classification of Reddit comments using the GoEmotions dataset (27 emotions + neutral) with a RoBERTa‑base encoder. It includes a configuration‑driven training script, evaluation, decision‑threshold tuning, and a lightweight inference entrypoint.

Repository: https://github.com/amirhossein-yousefi/multi-label-emotion-classification-reddit-comments

Model Details

Model Description

This project fine‑tunes a Transformer encoder for multi‑label emotion detection on Reddit comments. The default configuration uses roberta-base, binary cross‑entropy loss (optionally focal loss), and grid‑search threshold tuning on the validation set.

Developed by: GitHub @amirhossein-yousefi
Model type: Multi‑label text classification (Transformer encoder)
Language(s) (NLP): English
License: No explicit license file was found in the repository; treat as “all rights reserved” unless the author adds a license.
Finetuned from model: roberta-base

Model Sources

Repository: https://github.com/amirhossein-yousefi/multi-label-emotion-classification-reddit-comments
Paper [dataset]: GoEmotions: A Dataset of Fine‑Grained Emotions (Demszky et al., 2020)

Uses

Direct Use

Tagging short English texts (e.g., social posts, comments) with multiple emotions from the GoEmotions taxonomy (such as joy, sadness, anger, admiration, and gratitude).
Exploratory analytics and visualization of emotion distributions in corpora similar to Reddit.

Downstream Use

Fine‑tuning or domain adaptation to platforms beyond Reddit (forums, support tickets, app reviews).
Serving as a baseline component in moderation pipelines or empathetic response systems (with careful human oversight).

Out‑of‑Scope Use

Medical, psychological, or diagnostic use; mental‑health inference.
High‑stakes decisions (employment, lending, safety) without rigorous, domain‑specific validation.
Non‑English or heavily code‑switched text without additional training/testing.

Bias, Risks, and Limitations

Dataset origin: GoEmotions is built from Reddit comments; models may inherit Reddit‑specific discourse, slang, and toxicity patterns and may underperform on other domains.
Annotation noise: Third‑party analyses have raised concerns about mislabels in GoEmotions; treat labels as imperfect and consider human review for critical use cases.
Multi‑label uncertainty: Threshold choice materially affects precision/recall trade‑offs. The repo tunes the threshold on validation data; you should recalibrate for your domain.

Recommendations

Calibrate thresholds on in‑domain validation data (the repo grid‑searches 0.05–0.95).
Report per‑label metrics, especially for minority emotions.
Consider bias audits and human‑in‑the‑loop review before deployment.
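
The threshold calibration recommended above can be sketched as follows. This is a minimal, standard-library illustration, not the repository's implementation: it assumes a single global threshold shared by all labels, sweeps 0.05 to 0.95, and keeps the value with the highest validation micro-F1. The function names `micro_f1` and `tune_threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every (sample, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(
    y_true: Sequence[Sequence[int]],
    probs: Sequence[Sequence[float]],
    lo: float = 0.05,
    hi: float = 0.95,
    step: float = 0.01,
) -> Tuple[float, float]:
    """Return (best_threshold, best_micro_f1) over an evenly spaced grid."""
    n_steps = int(round((hi - lo) / step))
    best_t, best_f1 = lo, -1.0
    for i in range(n_steps + 1):
        t = round(lo + i * step, 4)  # avoid floating-point drift in the grid
        preds: List[List[int]] = [[1 if p >= t else 0 for p in row] for row in probs]
        score = micro_f1(y_true, preds)
        if score > best_f1:  # strict ">" keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy validation set: 3 samples, 2 labels (made-up probabilities).
val_true = [[1, 0], [0, 1], [1, 1]]
val_probs = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.6]]
best_t, best_f1 = tune_threshold(val_true, val_probs)
```

With real data, `sklearn.metrics.f1_score(..., average="micro")` can replace the hand-written metric. Tuning one threshold per label is a common variation when minority emotions behave differently from frequent ones.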

How to Get Started with the Model

Environment

Python ≥ 3.13
Install dependencies:
```
pip install -r requirements.txt
```

Train

The Makefile provides a default train target:

```
python -m emoclass.train --config configs/base.yaml
```

Inference

After training (or pointing to a trained directory), run:

```
python -m emoclass.inference --model_dir outputs/goemotions_roberta --text "I love this!" "This is awful."
```
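
This card does not show the internals of `emoclass.inference`. The sketch below illustrates the usual decoding step for a multi-label classifier: apply a sigmoid to each logit independently and keep every label whose probability reaches the threshold. The function `decode_emotions` and the three-entry label map are hypothetical and for illustration only; a real checkpoint would supply all 28 labels and the tuned threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_emotions(
    logits: Sequence[float],
    id2label: Dict[int, str],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (label, probability) for every label at or above the threshold."""
    selected = []
    for idx, logit in enumerate(logits):
        prob = sigmoid(logit)
        if prob >= threshold:
            selected.append((id2label[idx], prob))
    return selected


# Hypothetical 3-label map and made-up logits for a single comment.
toy_id2label = {0: "admiration", 1: "anger", 2: "joy"}
result = decode_emotions([2.0, -3.0, 0.0], toy_id2label, threshold=0.5)
names = [name for name, _ in result]
```

Unlike softmax classification, several labels (or none) can be returned for one text, because each label is scored independently.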

Training Details

Training Data

Dataset: GoEmotions (27 emotions + neutral). The default config uses the simplified variant.
Text column: text
Labels column: labels
Max sequence length: 192

Training Procedure

Preprocessing

Standard Transformer tokenization for roberta-base.
Multi‑hot label encoding for emotions.
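
As an illustration of the multi-hot encoding mentioned above, the sketch below turns a list of label ids into a fixed-length 0/1 vector. It assumes 28 classes (27 emotions + neutral) and integer label ids; the helper name `multi_hot` is hypothetical and not taken from the repository.

```python
from typing import List, Sequence

NUM_LABELS = 28  # 27 emotions + neutral


def multi_hot(label_ids: Sequence[int], num_labels: int = NUM_LABELS) -> List[float]:
    """Encode a list of label ids as a float vector with 1.0 at each active label."""
    vec = [0.0] * num_labels
    for i in label_ids:
        if not 0 <= i < num_labels:
            raise ValueError(f"label id {i} is outside [0, {num_labels})")
        vec[i] = 1.0
    return vec


# A comment annotated with two labels (ids chosen arbitrarily for the example).
encoded = multi_hot([0, 27])
```

Floats are used rather than integers because binary cross-entropy losses generally expect floating-point targets.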

Training Hyperparameters

Base model: roberta-base
Batch size: 16 (train), 32 (eval)
Learning rate: 2e‑5
Epochs: 5
Weight decay: 0.01
Warmup ratio: 0.06
Gradient accumulation: 1
Precision: bf16/fp16 if available
Loss: Binary Cross‑Entropy (optionally focal loss with γ=2.0, α=0.25)
Threshold tuning: grid 0.05 → 0.95 (step 0.01); best validation threshold ≈ 0.84
LoRA/PEFT: available in config (default off)
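
For readers unfamiliar with the optional focal loss, here is a minimal sketch of the standard binary focal loss (Lin et al., 2017) using the γ and α values listed above. It is written in plain Python for clarity and assumes the common formulation applied per label on raw logits; the repository's own implementation may differ in details such as reduction or α handling.

```python
import math
from typing import Sequence


def binary_focal_loss(
    logits: Sequence[float],
    targets: Sequence[int],
    gamma: float = 2.0,
    alpha: float = 0.25,
) -> float:
    """Mean focal loss over a flat list of (logit, 0/1 target) pairs."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable binary cross-entropy on a logit: max(z, 0) - z*y + log(1 + exp(-|z|))
        bce = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        p_t = math.exp(-bce)  # probability the model assigns to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the loss of easy, confidently correct labels
        total += alpha_t * (1.0 - p_t) ** gamma * bce
    return total / len(logits)


uncertain = binary_focal_loss([0.0], [1])  # model is undecided
confident = binary_focal_loss([5.0], [1])  # model is confidently correct
```

Setting `gamma=0` recovers α-weighted binary cross-entropy. Larger γ shifts the training signal toward hard or rare labels, which is why focal loss is often tried on imbalanced label sets like this one.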

Speeds, Sizes, Times

See results.txt for an example run’s timing & throughput logs.

Evaluation

Testing Data, Factors & Metrics

Test split: GoEmotions simplified test.
Metrics: micro/macro/sample F1, micro/macro Average Precision (AP), micro/macro ROC‑AUC.

Results (example run)

Threshold (val‑tuned): 0.84
F1 (micro): 0.5284
F1 (macro): 0.4995
F1 (samples): 0.5301
AP (micro): 0.5352
AP (macro): 0.5087
ROC‑AUC (micro): 0.9517
ROC‑AUC (macro): 0.9310

(See results.txt for the full log and any updates.)

Model Examination

Inspect per‑label thresholds and confusion patterns; minority emotions (e.g., grief, pride, nervousness) often suffer lower F1 and need more tuning or class‑balancing strategies.

Environmental Impact

Not measured. If desired, log GPU type, hours, region, and estimate emissions using the ML CO2 calculator.

Technical Specifications

Model Architecture and Objective

Transformer encoder (roberta-base) fine‑tuned with a sigmoid multi‑label head and BCE (or focal) loss.

Compute Infrastructure

Frameworks: transformers, datasets, accelerate, evaluate, scikit-learn, optional peft.
Hardware/software specifics are user‑dependent.

Citation

GoEmotions (dataset/paper):
Demszky, D., Movshovitz-Attias, D., Ko, J., Cowen, A., Nemade, G., & Ravi, S. (2020). GoEmotions: A Dataset of Fine‑Grained Emotions. ACL 2020. https://arxiv.org/abs/2005.00547

BibTeX:

@inproceedings{demszky2020goemotions,
  title={GoEmotions: A Dataset of Fine-Grained Emotions},
  author={Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year={2020}
}

Glossary

AP: Average Precision (area under precision–recall curve).
AUC: Area under ROC curve.
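Micro/Macro F1: Micro pools true positives, false positives, and false negatives across all labels before computing F1, so frequent labels dominate; macro computes F1 per label and averages the results with equal weight.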
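
The difference between the two averages is easiest to see on a toy example. The sketch below uses made-up predictions with one frequent label that is always predicted correctly and one rare label that is always missed. The function names are hypothetical, and a label with no true or predicted positives is scored as 0.0 here.

```python
from typing import List, Sequence, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for multi-hot targets and predictions."""
    num_labels = len(y_true[0])
    counts: List[List[int]] = [[0, 0, 0] for _ in range(num_labels)]  # tp, fp, fn
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                counts[j][0] += 1
            elif p:
                counts[j][1] += 1
            elif t:
                counts[j][2] += 1
    # Micro: sum the counts over labels first, then compute one F1.
    micro = f1_from_counts(*(sum(c[k] for c in counts) for k in range(3)))
    # Macro: compute one F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts) / num_labels
    return micro, macro


# Label 0 is frequent and always right; label 1 is rare and always missed.
toy_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
toy_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]
micro, macro = micro_macro_f1(toy_true, toy_pred)
```

Here micro-F1 is 6/7 while macro-F1 is 0.5, which shows how macro-F1 exposes weak performance on minority emotions that micro-F1 can hide.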

More Information

The configuration file at configs/base.yaml documents tweakable knobs (loss type, LoRA, precision, etc.).
Artifacts are saved under outputs/ by default.

Model Card Authors

Original code: @amirhossein-yousefi
Model card: generated programmatically for documentation purposes.

Model Card Contact

Open an issue in the GitHub repository.

Model size: 125M params (Safetensors, tensor type F32)
