Model Card for Multi‑Label Emotion Classification on Reddit Comments
This repository contains training and inference code for multi‑label emotion classification of Reddit comments using the GoEmotions dataset (27 emotions + neutral) with a RoBERTa‑base encoder. It includes a configuration‑driven training script, evaluation, decision‑threshold tuning, and a lightweight inference entrypoint.
Repository: https://github.com/amirhossein-yousefi/multi-label-emotion-classification-reddit-comments
Model Details
Model Description
This project fine‑tunes a Transformer encoder for multi‑label emotion detection on Reddit comments. The default configuration uses `roberta-base`, binary cross‑entropy loss (optionally focal loss), and grid‑search threshold tuning on the validation set.
- Developed by: GitHub @amirhossein-yousefi
- Model type: Multi‑label text classification (Transformer encoder)
- Language(s) (NLP): English
- License: No explicit license file was found in the repository; treat as “all rights reserved” unless the author adds a license.
- Finetuned from model: `roberta-base`
Model Sources
- Repository: https://github.com/amirhossein-yousefi/multi-label-emotion-classification-reddit-comments
- Paper [dataset]: GoEmotions: A Dataset of Fine‑Grained Emotions (Demszky et al., 2020)
Uses
Direct Use
- Tagging short English texts (e.g., social posts, comments) with multiple emotions from the GoEmotions taxonomy (such as joy, sadness, anger, admiration, and gratitude).
- Exploratory analytics and visualization of emotion distributions in corpora similar to Reddit.
Downstream Use
- Fine‑tuning or domain adaptation to platforms beyond Reddit (forums, support tickets, app reviews).
- Serving as a baseline component in moderation pipelines or empathetic response systems (with careful human oversight).
Out‑of‑Scope Use
- Medical, psychological, or diagnostic use; mental‑health inference.
- High‑stakes decisions (employment, lending, safety) without rigorous, domain‑specific validation.
- Non‑English or heavily code‑switched text without additional training/testing.
Bias, Risks, and Limitations
- Dataset origin: GoEmotions is built from Reddit comments; models may inherit Reddit‑specific discourse, slang, and toxicity patterns and may underperform on other domains.
- Annotation noise: Third‑party analyses have raised concerns about mislabels in GoEmotions; treat labels as imperfect and consider human review for critical use cases.
- Multi‑label uncertainty: Threshold choice materially affects precision/recall trade‑offs. The repo tunes the threshold on validation data; you should recalibrate for your domain.
Recommendations
- Calibrate thresholds on in‑domain validation data (the repo grid‑searches 0.05–0.95); see the sketch after this list.
- Report per‑label metrics, especially for minority emotions.
- Consider bias audits and human‑in‑the‑loop review before deployment.
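To make the first recommendation concrete, here is a minimal sketch of the kind of grid search the repo performs; `val_probs` and `val_labels` are illustrative names for your validation sigmoid outputs and multi‑hot targets, not the repo's actual variables.

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_threshold(val_probs: np.ndarray, val_labels: np.ndarray,
                   lo: float = 0.05, hi: float = 0.95, step: float = 0.01) -> float:
    """Pick the global decision threshold that maximizes validation micro-F1."""
    best_t, best_f1 = lo, -1.0
    for t in np.arange(lo, hi + step / 2, step):
        preds = (val_probs >= t).astype(int)
        f1 = f1_score(val_labels, preds, average="micro", zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t

# Usage: val_probs has shape (n_samples, n_labels); val_labels is multi-hot.
# threshold = tune_threshold(val_probs, val_labels)
```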
How to Get Started with the Model
Environment
- Python ≥ 3.13
- Install dependencies:
`pip install -r requirements.txt`
Train
The Makefile provides a default train target:
`python -m emoclass.train --config configs/base.yaml`
Inference
After training (or pointing to a trained directory), run:
`python -m emoclass.inference --model_dir outputs/goemotions_roberta --text "I love this!" "This is awful."`
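If you prefer to call a trained checkpoint from Python rather than the CLI, the following is a minimal sketch using the standard `transformers` API. It assumes the training script saved a regular Hugging Face checkpoint (with `id2label` in its config) under `outputs/goemotions_roberta`; the 0.84 threshold is the example run's val‑tuned value.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_dir = "outputs/goemotions_roberta"  # directory produced by training
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir).eval()

texts = ["I love this!", "This is awful."]
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=192, return_tensors="pt")

with torch.no_grad():
    probs = torch.sigmoid(model(**batch).logits)  # independent per-label probabilities

THRESHOLD = 0.84  # val-tuned value from the example run; recalibrate per domain
for text, row in zip(texts, probs):
    labels = [model.config.id2label[i]
              for i, p in enumerate(row.tolist()) if p >= THRESHOLD]
    print(f"{text!r} -> {labels}")
```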
Training Details
Training Data
- Dataset: GoEmotions (27 emotions + neutral). The default config uses the `simplified` variant.
- Text column: `text`
- Labels column: `labels`
- Max sequence length: 192
Training Procedure
Preprocessing
- Standard Transformer tokenization for `roberta-base`.
- Multi‑hot label encoding for emotions (see the sketch below).
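As a concrete illustration of the second step, a minimal multi‑hot encoding sketch (the helper name is hypothetical, not the repo's code):

```python
import numpy as np

NUM_LABELS = 28  # 27 emotions + neutral

def to_multi_hot(label_ids: list[int], num_labels: int = NUM_LABELS) -> np.ndarray:
    """Convert a list of active label indices into a multi-hot float vector."""
    vec = np.zeros(num_labels, dtype=np.float32)
    vec[label_ids] = 1.0
    return vec

# e.g., a comment annotated with both "admiration" (0) and "gratitude" (15):
print(to_multi_hot([0, 15]))
```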
Training Hyperparameters
- Base model: `roberta-base`
- Batch size: 16 (train), 32 (eval)
- Learning rate: 2e‑5
- Epochs: 5
- Weight decay: 0.01
- Warmup ratio: 0.06
- Gradient accumulation: 1
- Precision: bf16/fp16 if available
- Loss: binary cross‑entropy (optionally focal loss with γ = 2.0, α = 0.25); see the sketch after this list.
- Threshold tuning: grid 0.05 → 0.95 (step 0.01), selecting the threshold that maximizes validation micro‑F1 (0.84 in the example run; see Results).
- LoRA/PEFT: available in config (default off)
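For reference, the sketch below shows one standard binary focal‑loss formulation with the γ/α defaults above; it is a generic implementation and may differ in detail from the repo's.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                      gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Focal loss for multi-label classification (Lin et al., 2017 style)."""
    # Per-element BCE equals -log(p_t), the log-prob of the true class.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)            # prob. of the true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```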
Speeds, Sizes, Times
- See `results.txt` for an example run's timing and throughput logs.
Evaluation
Testing Data, Factors & Metrics
- Test split: GoEmotions `simplified` test.
- Metrics: micro/macro/sample F1, micro/macro Average Precision (AP), micro/macro ROC‑AUC (see the sketch below).
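These metrics map directly onto scikit‑learn; a minimal sketch (array names are illustrative):

```python
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

def multilabel_metrics(y_true, y_prob, y_pred):
    """y_true: multi-hot targets; y_prob: sigmoid scores; y_pred: thresholded 0/1."""
    return {
        "f1_micro": f1_score(y_true, y_pred, average="micro", zero_division=0),
        "f1_macro": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "f1_samples": f1_score(y_true, y_pred, average="samples", zero_division=0),
        "ap_micro": average_precision_score(y_true, y_prob, average="micro"),
        "ap_macro": average_precision_score(y_true, y_prob, average="macro"),
        "roc_auc_micro": roc_auc_score(y_true, y_prob, average="micro"),
        "roc_auc_macro": roc_auc_score(y_true, y_prob, average="macro"),
    }
```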
Results (example run)
- Threshold (val‑tuned): 0.84
- F1 (micro): 0.5284
- F1 (macro): 0.4995
- F1 (samples): 0.5301
- AP (micro): 0.5352
- AP (macro): 0.5087
- ROC‑AUC (micro): 0.9517
- ROC‑AUC (macro): 0.9310
(See `results.txt` for the full log and any updates.)
Model Examination
- Inspect per‑label thresholds and confusion patterns; minority emotions (e.g., grief, pride, nervousness) often suffer lower F1 and need more tuning or class‑balancing strategies.
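One quick way to surface such patterns is scikit‑learn's per‑label report (`label_names` is an illustrative list of the 28 GoEmotions labels in index order):

```python
from sklearn.metrics import classification_report

# y_true / y_pred: multi-hot arrays of shape (n_samples, 28)
print(classification_report(y_true, y_pred,
                            target_names=label_names, zero_division=0))
```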
Environmental Impact
- Not measured. If desired, log GPU type, hours, region, and estimate emissions using the ML CO2 calculator.
Technical Specifications
Model Architecture and Objective
- Transformer encoder (`roberta-base`) fine‑tuned with a sigmoid multi‑label head and BCE (or focal) loss.
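In `transformers`, this objective is commonly expressed via `problem_type="multi_label_classification"`, which makes the sequence‑classification head use `BCEWithLogitsLoss`; whether the repo relies on this mechanism or a custom head is an assumption here.

```python
from transformers import AutoModelForSequenceClassification

# 28 outputs = 27 emotions + neutral; with this problem_type, the model applies
# BCEWithLogitsLoss internally when labels are provided.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=28,
    problem_type="multi_label_classification",
)
```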
Compute Infrastructure
- Frameworks: `transformers`, `datasets`, `accelerate`, `evaluate`, `scikit-learn`, and optionally `peft`.
- Hardware/software specifics are user‑dependent.
Citation
GoEmotions (dataset/paper):
Demszky, D., Movshovitz-Attias, D., Ko, J., Cowen, A., Nemade, G., & Ravi, S. (2020). GoEmotions: A Dataset of Fine‑Grained Emotions. ACL 2020. https://arxiv.org/abs/2005.00547
BibTeX:
@inproceedings{demszky2020goemotions,
title={GoEmotions: A Dataset of Fine-Grained Emotions},
author={Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year={2020}
}
Glossary
- AP: Average Precision (area under precision–recall curve).
- AUC: Area under ROC curve.
- Micro/Macro F1: Micro aggregates over all labels; macro averages per‑label F1.
More Information
- The configuration file at `configs/base.yaml` documents the tweakable knobs (loss type, LoRA, precision, etc.).
- Artifacts are saved under `outputs/` by default.
Model Card Authors
- Original code: @amirhossein-yousefi
- Model card: generated programmatically for documentation purposes.
Model Card Contact
- Open an issue in the GitHub repository.