Spam Message Classifier
A high-accuracy spam message classification model built on the RoBERTa-base transformer architecture, achieving 99.41% accuracy and a 0.9782 spam-class F1-score on a held-out test set. Developed as the core spam detection component for Amy, an intelligent Discord moderation bot.
Model Description
This model is a fine-tuned version of FacebookAI/roberta-base for binary spam classification in messaging applications. The classifier distinguishes legitimate messages (ham) from spam/phishing content, making it suitable for real-world deployment in messaging platforms and content moderation systems.
- Developed by: roshana1s
- Model type: Binary Sequence Classification
- Language: English
- License: Apache-2.0
- Base Model: FacebookAI/roberta-base
- Primary Use Case: Discord bot moderation and real-time spam detection
Key Features
- 🤖 Transformer-based Architecture: Built on RoBERTa-base for superior text understanding
- ⚡ High Performance: 0.9782 F1-score for spam detection, 99.41% overall accuracy
- 🔧 Hyperparameter Optimization: Automated tuning with the Optuna framework (25 trials)
- ⚖️ Class Imbalance Handling: Successfully addressed through weighted loss function
- 🔗 URL Bias Mitigation: Enhanced with real-world ham messages containing links
- 📊 Comprehensive Evaluation: Evaluated on completely unseen test set
Intended Uses
Direct Use
```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch
import re

# Device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer
model = RobertaForSequenceClassification.from_pretrained(
    "roshana1s/spam-message-classifier"
).to(device).eval()
tokenizer = RobertaTokenizer.from_pretrained("roshana1s/spam-message-classifier")

def preprocess_text(text: str) -> str:
    # General-purpose normalization: mask URLs, collapse whitespace
    text = re.sub(r"(https?://\S+|www\.\S+)", "<URL>", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

def get_inference(text: str) -> list:
    """Returns prediction results in [{'label': str, 'score': float}, ...] format."""
    # Tokenize input text
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=False,
        max_length=128,
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Run inference
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1).squeeze(0)

    # Map labels (index 0 = ham, 1 = spam)
    return [
        {"label": "ham", "score": float(probs[0])},
        {"label": "spam", "score": float(probs[1])},
    ]

# Example usage
message = "Congratulations! You've won free Discord Nitro. Click here to claim now! <URL>"
text = preprocess_text(message)
result = get_inference(text)

# Extract scores for both classes
spam_score = next((r["score"] for r in result if r["label"] == "spam"), 0.0)
ham_score = next((r["score"] for r in result if r["label"] == "ham"), 0.0)

predicted_label = "Spam" if spam_score > ham_score else "Ham"
confidence = max(spam_score, ham_score)

print(f"Prediction: {predicted_label}")
print(f"Confidence: {confidence:.4f}")
```
Discord Helper (Optional)
Include this helper when targeting Discord: it normalizes invites, mentions, and custom emoji before tokenization to improve robustness in chat contexts.
```python
def preprocess_text(text: str) -> str:
    # Mask Discord invite links first so they keep a distinct token
    text = re.sub(r"(https?:\/\/)?(www\.)?(discord\.(gg|io|me|li)|discordapp\.com\/invite)\/\S+", "<DISCORD_INVITE>", text)
    # Then mask remaining URLs
    text = re.sub(r"(https?://\S+|www\.\S+)", "<URL>", text)
    # User mentions, e.g. <@123> or <@!123>
    text = re.sub(r"<@!?\d+>", "<USER>", text)
    # Custom (optionally animated) emoji, e.g. <a:name:123>
    text = re.sub(r"<a?:\w+:\d+>", "<EMOJI>", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```
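A quick check of what the helper produces on a representative raw message:

```python
raw = "Free Nitro! Claim at discord.gg/abc123 <@98765> <a:party:1234>"
print(preprocess_text(raw))
# -> "Free Nitro! Claim at <DISCORD_INVITE> <USER> <EMOJI>"
```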
Use Cases
This spam classifier is ideal for:
Messaging Platforms
- Discord bot moderation (Primary use case)
- SMS filtering systems
- Chat application content filtering
Out-of-Scope Use
- Non-English language spam detection (trained exclusively on English data)
- Sentiment analysis or other NLP tasks beyond binary spam classification
Training Data
The model was trained on a combination of two datasets:
- SMS Spam Collection Dataset - UCI Machine Learning Repository
- Discord Text Messages - a manually collected dataset of real Discord messages containing both ham and spam samples. (This dataset was created to mitigate <URL> bias; see Challenges Addressed below.)
Preprocessing Steps (the encoding and split are sketched in code after the list):
- Label encoding (ham → 0, spam → 1)
- Text cleaning and normalization with Discord-specific preprocessing
- Train/validation/test split (70/15/15)
- Tokenization with RoBERTa tokenizer
- Dynamic padding and truncation
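The exact pipeline code is not published with the model; below is a minimal sketch of the label encoding and stratified 70/15/15 split, assuming a pandas DataFrame with `text` and `label` columns (the file name is hypothetical):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("spam_dataset.csv")  # hypothetical combined dataset

# Label encoding: ham -> 0, spam -> 1
df["label"] = df["label"].map({"ham": 0, "spam": 1})

# Stratified 70/15/15 split: peel off 30%, then halve it into val/test
train_df, temp_df = train_test_split(
    df, test_size=0.30, stratify=df["label"], random_state=42
)
val_df, test_df = train_test_split(
    temp_df, test_size=0.50, stratify=temp_df["label"], random_state=42
)
```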
Training Procedure
Hyperparameter Optimization
Automated hyperparameter search using the Optuna framework (25 trials); a sketch of the Optuna wiring follows the search-space list below.
Search Space:
- Dropout rates: Hidden dropout (0.1-0.3), Attention dropout (0.1-0.2)
- Learning rate: 1e-5 to 5e-5 range
- Weight decay: 0.0 to 0.1 regularization
- Batch size: 8, 16, or 32 samples
- Gradient accumulation steps: 1 to 4
- Training epochs: 2 to 5 epochs
- Warmup ratio: 0.05 to 0.1 for learning rate scheduling
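The search script itself is not published; the sketch below shows one way the space above could be wired into Optuna. `make_trainer` is a hypothetical factory that builds a fresh model and Hugging Face `Trainer` from the sampled parameters:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Sample the search space described above
    params = {
        "hidden_dropout_prob": trial.suggest_float("hidden_dropout", 0.1, 0.3),
        "attention_probs_dropout_prob": trial.suggest_float("attention_dropout", 0.1, 0.2),
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "per_device_train_batch_size": trial.suggest_categorical("batch_size", [8, 16, 32]),
        "gradient_accumulation_steps": trial.suggest_int("grad_accum", 1, 4),
        "num_train_epochs": trial.suggest_int("epochs", 2, 5),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.05, 0.1),
    }
    trainer = make_trainer(**params)  # hypothetical: model + Trainer from params
    trainer.train()
    return trainer.evaluate()["eval_f1"]  # select on validation F1

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
```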
Best Parameters Found (Trial 6/25):
- Hidden dropout: 0.10069482002001506
- Attention dropout: 0.12460257350587067
- Learning rate: 4.976184540342024e-05
- Weight decay: 0.04490021845024478
- Batch size: 16
- Gradient accumulation steps: 4
- Epochs: 4
- Warmup ratio: 0.07622459860163384
Training Strategy
- Data Preprocessing: SMS/Discord text cleaning and label encoding
- Tokenization: Dynamic padding with a maximum sequence length of 128 tokens (sketched after this list)
- Class Balancing: Weighted loss function to handle imbalanced dataset
- Hyperparameter Optimization: Optuna-based automated tuning
- Evaluation: Comprehensive metrics on held-out test set
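Dynamic padding means each batch is padded only to its own longest sequence rather than to a global fixed length; a minimal sketch using the standard Transformers collator:

```python
from transformers import DataCollatorWithPadding, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("FacebookAI/roberta-base")

def tokenize_fn(batch):
    # Truncate to 128 tokens, but leave padding to the collator
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Pads each batch to its longest member at collation time
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```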
Training Configuration
- Optimizer: AdamW
- Loss Function: Weighted Cross-Entropy to handle class imbalance (see the sketch after this list)
- Label Smoothing: 0.1 (prevents overconfidence)
- Learning Rate Schedule: Linear warmup followed by linear decay
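The training script is not published; the sketch below shows one plausible mapping of this configuration onto the Hugging Face `Trainer`, combining the class weights and label smoothing in a single `CrossEntropyLoss` inside a `compute_loss` override. The class weights shown are illustrative, not the values used for this model:

```python
import torch
from torch import nn
from transformers import Trainer, TrainingArguments

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # Weighted CE + label smoothing; weights would be derived from
        # the actual ham/spam frequencies (values here are illustrative)
        loss_fct = nn.CrossEntropyLoss(
            weight=torch.tensor([1.0, 5.0], device=outputs.logits.device),
            label_smoothing=0.1,
        )
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

args = TrainingArguments(
    output_dir="spam-classifier",
    learning_rate=4.98e-5,           # best Optuna trial, rounded
    weight_decay=0.045,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    num_train_epochs=4,
    warmup_ratio=0.076,
    lr_scheduler_type="linear",      # linear warmup, then linear decay
)
```

AdamW is the `Trainer` default optimizer, so it needs no explicit configuration here.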
Evaluation Results
Performance Metrics
| Metric | Score |
|---|---|
| Overall Accuracy | 99.41% |
| Weighted F1-Score | 0.9941 |
| Spam F1-Score | 0.9782 |
| Spam Precision | 96.55% |
| Spam Recall | 99.12% |
| Ham Precision | 99.86% |
| Ham Recall | 99.45% |
Confusion Matrix
| | Predicted Ham | Predicted Spam |
|---|---|---|
| Actual Ham | 725 | 4 |
| Actual Spam | 1 | 112 |
Performance Analysis
- True Positives: 112 spam messages correctly identified
- True Negatives: 725 ham messages correctly identified
- False Positives: 4 ham messages misclassified as spam
- False Negatives: 1 spam message missed
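These per-class figures follow directly from the confusion matrix; a quick cross-check with scikit-learn, reconstructing the 842 test labels from the counts above:

```python
from sklearn.metrics import classification_report

# 0 = ham, 1 = spam; the order of y_pred mirrors y_true
y_true = [0] * 729 + [1] * 113
y_pred = [0] * 725 + [1] * 4 + [1] * 112 + [0] * 1

print(classification_report(y_true, y_pred, target_names=["ham", "spam"], digits=4))
# spam: precision 0.9655, recall 0.9912, F1 0.9782; accuracy 0.9941
```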
Generalizability
📊 Strong Generalization: All reported metrics were computed on a completely unseen test set (15% of the data) that was never used during training or hyperparameter tuning, which supports confidence in real-world performance.
Challenges Addressed & Solutions
✅ URL Bias Mitigation (SUCCESSFULLY ADDRESSED)
Challenge: During initial training, the model became overconfident and labeled almost all messages containing <URL> as spam, even if some were legitimate ham.
Solution: Augmented the training data with additional real ham messages containing links, collected from Discord servers. This teaches the model that URLs can appear in non-spam messages and improves real-world generalization; it matters especially for Discord deployment, where legitimate messages often contain links.
✅ Class Imbalance Handling (SUCCESSFULLY ADDRESSED)
Challenge: The combined dataset is naturally imbalanced, with far more ham than spam messages.
Solution: Implemented weighted loss function during training to handle the imbalanced dataset effectively, resulting in exceptional performance for both classes.
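A common recipe, and a plausible one here, is inverse-frequency ("balanced") class weights; a minimal sketch, assuming `train_df` from the split sketched earlier:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Inverse-frequency weights over the training labels (0 = ham, 1 = spam)
train_labels = train_df["label"].to_numpy()
class_weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=train_labels
)
# The minority (spam) class gets the larger weight, e.g. roughly
# [0.57, 3.8] for an 87/13 ham/spam mix; these feed the weighted
# cross-entropy shown in the Training Configuration sketch.
```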
✅ Overfitting Prevention (SUCCESSFULLY ADDRESSED)
Challenge: Ensuring model generalizes well to unseen data.
Solution: Comprehensive evaluation on a completely held-out test set (15% of the data) never used during training or hyperparameter tuning, with demonstrated strong generalization (99.41% accuracy on unseen data).
Limitations
- Language Limitation: Model performance is optimized for English text only
- Message Format: Trained on short SMS- and Discord-style messages; may require adaptation for other formats (e.g., formal business emails)
Technical Specifications
Software Requirements
- Python: 3.8+
- Framework: PyTorch, Hugging Face Transformers
Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{roshana1s_spam_classifier_2025,
  author       = {Roshana Isuranga},
  title        = {Spam Message Classifier: RoBERTa-based Spam Detection},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/roshana1s/spam-message-classifier}},
}
```
Model Card Contact
roshana1s - Hugging Face Profile: https://huggingface.co/roshana1s