---
language: en
library_name: transformers
pipeline_tag: text-classification
license: mit
tags:
- sentiment-analysis
- distilbert
- sequence-classification
- academic-peer-review
- openreview
---
# Academic Sentiment Classifier (DistilBERT)
A DistilBERT-based sequence-classification model that predicts the binary sentiment polarity (negative vs. positive) of academic peer-review text. It supports research on the sentiment of scholarly reviews and AI-generated critique, enabling large-scale, reproducible measurement of academic-style content.
## Model details
- Architecture: DistilBERT for Sequence Classification (2 labels)
- Max input length used during training: 512 tokens
- Labels:
- LABEL_0 -> negative
- LABEL_1 -> positive
- Format: `safetensors`
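The label mapping above is exactly the post-processing applied to the model's raw output: a softmax over the two logits, then an argmax into `LABEL_0`/`LABEL_1`. A minimal, dependency-free sketch of that conversion (the logit values below are made up for illustration):

```python
import math

# Documented label mapping for this model
id2label = {0: "negative", 1: "positive"}

def logits_to_label(logits):
    """Softmax over raw logits, then return the argmax label and its probability."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical logits for a positive-sounding review
label, score = logits_to_label([-1.2, 2.3])
print(label, round(score, 3))  # -> positive 0.971
```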
## Intended uses & limitations
Intended uses:
- Analyze sentiment of peer-review snippets, full reviews, or similar scholarly discourse.
- Evaluate the effect of attacks (e.g., positive/negative steering) on generated reviews by measuring polarity shifts.
Limitations:
- Binary polarity only (no neutral class); confidence scores should be interpreted with care.
- Domain-specific: optimized for academic review-style English text; may underperform on general-domain data.
- Not a replacement for human judgement or editorial decision-making.
Ethical considerations and bias:
- Scholarly reviews can contain technical jargon, hedging, and nuanced tone; polarity is an imperfect proxy for quality or fairness.
- Potential biases may reflect those present in the underlying corpus.
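Because there is no neutral class, one pragmatic mitigation for the confidence caveat above is to abstain when the top score is low. A minimal sketch over pipeline-style output dicts (the 0.8 threshold is an illustrative choice, not a tuned value):

```python
def label_or_abstain(pred, threshold=0.8):
    """Return the predicted label, or 'uncertain' when confidence is below threshold.

    `pred` is a dict shaped like the pipeline output: {'label': ..., 'score': ...}.
    """
    return pred["label"] if pred["score"] >= threshold else "uncertain"

print(label_or_abstain({"label": "positive", "score": 0.97}))  # -> positive
print(label_or_abstain({"label": "negative", "score": 0.55}))  # -> uncertain
```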
## Training data
The model was fine-tuned on a corpus of academic peer-review text curated from OpenReview review texts. The task is binary sentiment classification over review text spans.
Note: If you plan to use or extend the underlying data, please review the terms of use for OpenReview and any relevant dataset licenses.
## Training procedure (high level)
- Base model: DistilBERT (transformers)
- Objective: single-label binary classification
- Tokenization: standard DistilBERT tokenizer, truncation to 512 tokens
- Optimizer/scheduler: standard Trainer defaults (AdamW with linear schedule)
Exact hyperparameters (learning rate, batch size, number of epochs) may vary across runs.
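The steps above follow the stock `Trainer` recipe. A hedged configuration sketch of that setup — the hyperparameter values, dataset handles (`train_ds`/`eval_ds`), and output directory are illustrative placeholders, not the values used for the released checkpoint:

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def preprocess(batch):
    # Truncate to the 512-token limit used during training
    return tok(batch["text"], truncation=True, max_length=512)

# `train_ds` / `eval_ds` stand in for the tokenized OpenReview-derived splits
args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-5,          # illustrative; exact value may differ per run
    per_device_train_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",  # Trainer default: AdamW + linear LR schedule
)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```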
## How to use
Basic pipeline usage:
```python
from transformers import pipeline
clf = pipeline(
    task="text-classification",
    model="YOUR_USERNAME/academic-sentiment-classifier",
    tokenizer="YOUR_USERNAME/academic-sentiment-classifier",
)

text = "The paper is clearly written and provides strong empirical support for the claims."
print(clf(text))
# Example output: [{'label': 'LABEL_1', 'score': 0.97}]  # LABEL_1 -> positive
```
If you prefer friendly labels, you can map them:
```python
from transformers import pipeline
id2name = {"LABEL_0": "negative", "LABEL_1": "positive"}
clf = pipeline("text-classification", model="YOUR_USERNAME/academic-sentiment-classifier")
res = clf("This section lacks clarity and the experiments are inconclusive.")[0]
res["label"] = id2name.get(res["label"], res["label"]) # map to human-friendly label
print(res)
```
Batch inference:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("YOUR_USERNAME/academic-sentiment-classifier")
model = AutoModelForSequenceClassification.from_pretrained(
    "YOUR_USERNAME/academic-sentiment-classifier"
).to(device)
model.eval()

texts = [
    "I recommend acceptance; the methodology is solid and results are convincing.",
    "Major concerns remain; the evaluation is incomplete and unclear.",
]
inputs = tok(texts, padding=True, truncation=True, max_length=512, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
pred_ids = probs.argmax(dim=-1)

# Map to human-friendly labels
id2name = {0: "negative", 1: "positive"}
preds = [id2name[i.item()] for i in pred_ids]
print(list(zip(texts, preds)))
```
## Evaluation
No benchmark metrics are reported for this checkpoint yet. If you compute new metrics on public datasets or benchmarks, consider sharing them via a pull request to this model card.
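If you do run an evaluation, accuracy and F1 for the positive class are natural metrics for this binary task. A dependency-free sketch on toy gold/predicted label lists:

```python
def binary_metrics(gold, pred, positive="positive"):
    """Accuracy and positive-class F1 over parallel label lists."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, f1

gold = ["positive", "negative", "positive", "negative"]
pred = ["positive", "negative", "negative", "negative"]
print(binary_metrics(gold, pred))  # -> (0.75, 0.6666666666666666)
```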
## License
The model weights and card are released under the MIT license. Review and comply with any third-party data licenses if reusing the training data.
## Citation
If you use this model, please cite the project:
```bibtex
@software{academic_sentiment_classifier,
  title = {Academic Sentiment Classifier (DistilBERT)},
  year  = {2025},
  url   = {https://huggingface.co/EvilScript/academic-sentiment-classifier}
}
```