---
language: en
library_name: transformers
pipeline_tag: text-classification
license: mit
tags:
  - sentiment-analysis
  - distilbert
  - sequence-classification
  - academic-peer-review
  - openreview
---

# Academic Sentiment Classifier (DistilBERT)

A DistilBERT-based sequence classification model that predicts the sentiment polarity of academic peer-review text (binary: negative vs. positive). It supports research on evaluating the sentiment of scholarly reviews and AI-generated critique, enabling large-scale, reproducible measurements over academic-style content.

## Model details

- Architecture: DistilBERT for Sequence Classification (2 labels)
- Max input length used during training: 512 tokens
- Labels:
  - LABEL_0 -> negative
  - LABEL_1 -> positive
- Format: `safetensors`

## Intended uses & limitations

Intended uses:

- Analyze sentiment of peer-review snippets, full reviews, or similar scholarly discourse.
- Evaluate the effect of attacks (e.g., positive/negative steering) on generated reviews by measuring polarity shifts.
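
The polarity-shift measurement mentioned above can be sketched as a simple before/after comparison. This is a minimal illustration, not part of the released code: the label convention (0 = negative, 1 = positive) matches the model's, but the example label lists are made up.

```python
def positive_rate(labels):
    """Fraction of predictions labeled positive (label 1)."""
    return sum(labels) / len(labels)

def polarity_shift(before, after):
    """Change in the positive rate between two prediction sets,
    e.g. reviews classified before vs. after a steering attack."""
    return positive_rate(after) - positive_rate(before)

before = [0, 0, 1, 0]  # 25% positive before the attack
after = [1, 0, 1, 1]   # 75% positive after
print(polarity_shift(before, after))  # 0.5
```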

Limitations:

- Binary polarity only (no neutral class); confidence scores should be interpreted with care.
- Domain-specific: optimized for academic review-style English text; may underperform on general-domain data.
- Not a replacement for human judgement or editorial decision-making.

Ethical considerations and bias:

- Scholarly reviews can contain technical jargon, hedging, and nuanced tone; polarity is an imperfect proxy for quality or fairness.
- Potential biases may reflect those present in the underlying corpus.

## Training data

The model was fine-tuned on a corpus of academic peer-review text curated from OpenReview review texts. The task is binary sentiment classification over review text spans.

Note: If you plan to use or extend the underlying data, please review the terms of use for OpenReview and any relevant dataset licenses.

## Training procedure (high level)

- Base model: DistilBERT (transformers)
- Objective: single-label binary classification
- Tokenization: standard DistilBERT tokenizer, truncation to 512 tokens
- Optimizer/scheduler: standard Trainer defaults (AdamW with linear schedule)

Exact hyperparameters may vary across runs; typical training uses AdamW with a linear learning rate schedule and truncation to 512 tokens.
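
The setup above can be sketched with the standard `Trainer` API. This is a hedged sketch, not the exact recipe used for this checkpoint: the epoch count, batch size, dataset columns, and the assumption that the inputs are `datasets.Dataset` objects with `text` and `label` columns are all illustrative.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

MAX_LENGTH = 512  # matches the truncation length described above

def build_trainer(train_dataset, eval_dataset, base_model="distilbert-base-uncased"):
    """Assumes `datasets.Dataset` inputs with `text` and `label` columns."""
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model,
        num_labels=2,
        id2label={0: "negative", 1: "positive"},
        label2id={"negative": 0, "positive": 1},
    )

    def tokenize(batch):
        # Truncate to 512 tokens, as described above; pad dynamically per batch.
        return tokenizer(batch["text"], truncation=True, max_length=MAX_LENGTH)

    train_dataset = train_dataset.map(tokenize, batched=True)
    eval_dataset = eval_dataset.map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir="out",
        num_train_epochs=3,              # illustrative
        per_device_train_batch_size=16,  # illustrative
        learning_rate=5e-5,              # Trainer default (AdamW)
        lr_scheduler_type="linear",      # Trainer default
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=DataCollatorWithPadding(tokenizer),
    )
```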

## How to use

Basic pipeline usage:

```python
from transformers import pipeline

clf = pipeline(
    task="text-classification",
    model="YOUR_USERNAME/academic-sentiment-classifier",
)

text = "The paper is clearly written and provides strong empirical support for the claims."
print(clf(text))
# Example output: [{'label': 'LABEL_1', 'score': 0.97}]  # LABEL_1 -> positive
```

If you prefer friendly labels, you can map them:

```python
from transformers import pipeline

id2name = {"LABEL_0": "negative", "LABEL_1": "positive"}
clf = pipeline("text-classification", model="YOUR_USERNAME/academic-sentiment-classifier")
res = clf("This section lacks clarity and the experiments are inconclusive.")[0]
res["label"] = id2name.get(res["label"], res["label"])  # map to human-friendly label
print(res)
```

Batch inference:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("YOUR_USERNAME/academic-sentiment-classifier")
model = AutoModelForSequenceClassification.from_pretrained(
    "YOUR_USERNAME/academic-sentiment-classifier"
).to(device)
model.eval()

texts = [
    "I recommend acceptance; the methodology is solid and results are convincing.",
    "Major concerns remain; the evaluation is incomplete and unclear.",
]

inputs = tok(texts, padding=True, truncation=True, max_length=512, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
pred_ids = probs.argmax(dim=-1)

# Map to friendly labels
id2name = {0: "negative", 1: "positive"}
preds = [id2name[i.item()] for i in pred_ids]
print(list(zip(texts, preds)))
```

## Evaluation

No benchmark metrics are reported for this checkpoint yet. If you compute new metrics on public datasets or benchmarks, consider sharing them via a pull request to this model card.
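
If you do evaluate the model on labeled review data, accuracy and per-class F1 can be computed without extra dependencies. The helper below is a minimal sketch; the gold/predicted label lists at the bottom are made up for illustration.

```python
def binary_metrics(gold, pred):
    """Accuracy and per-class F1 for binary labels (0 = negative, 1 = positive)."""
    assert len(gold) == len(pred)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1 = {}
    for cls in (0, 1):
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1[cls] = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return accuracy, f1

gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 0]
acc, f1 = binary_metrics(gold, pred)
print(acc)  # 0.8
```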

## License

The model weights and card are released under the MIT license. Review and comply with any third-party data licenses if reusing the training data.

## Citation

If you use this model, please cite the project:

```bibtex
@software{academic_sentiment_classifier,
  title        = {Academic Sentiment Classifier (DistilBERT)},
  year         = {2025},
  url          = {https://huggingface.co/EvilScript/academic-sentiment-classifier}
}
```