Spam Message Classifier

A spam message classification model built on the RoBERTa-base transformer architecture, achieving 99.41% accuracy and a 0.9782 F1-score for the spam class on the held-out test set. Developed as the core spam detection component for Amy, an intelligent Discord moderation bot.

Model Description

This model is a fine-tuned version of FacebookAI/roberta-base for binary spam classification in messaging applications. The classifier accurately distinguishes between legitimate messages (ham) and spam/phishing content, making it production-ready for real-world deployment in messaging platforms and content moderation systems.

Developed by: roshana1s
Model type: Binary Sequence Classification
Language: English
License: Apache-2.0
Base Model: FacebookAI/roberta-base
Primary Use Case: Discord bot moderation and real-time spam detection

Key Features

🤖 Transformer-based Architecture: Built on RoBERTa-base for superior text understanding
⚡ High Performance: 0.9782 F1-score for spam detection, 99.41% overall accuracy
🔧 Hyperparameter Optimization: Automated tuning using Optuna framework (25 trials)
⚖️ Class Imbalance Handling: Successfully addressed through weighted loss function
🔗 URL Bias Mitigation: Enhanced with real-world ham messages containing links
📊 Comprehensive Evaluation: Evaluated on completely unseen test set

Intended Uses

Direct Use

from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch
import re

# Device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer
model = RobertaForSequenceClassification.from_pretrained(
    "roshana1s/spam-message-classifier"
).to(device).eval()
tokenizer = RobertaTokenizer.from_pretrained("roshana1s/spam-message-classifier")

def preprocess_text(text: str) -> str:
    # General-purpose normalization: mask URLs, collapse whitespace
    text = re.sub(r"(https?://\S+|www\.\S+)", "<URL>", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

def get_inference(text: str) -> list:
    """Returns prediction results in [{'label': str, 'score': float}, ...] format."""
    # Tokenize input text
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=False,
        max_length=128
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Run inference
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.nn.functional.softmax(outputs.logits, dim=-1).squeeze(0)

    # Map labels (index 0 = ham, 1 = spam)
    result = [
        {"label": "ham", "score": float(probs[0])},
        {"label": "spam", "score": float(probs[1])}
    ]

    return result

# Example usage
message = "Congratulations! You've won free Discord Nitro. Click here to claim now! <URL>"
text = preprocess_text(message)
result = get_inference(text)

# Extract scores for both classes
spam_score = next((r["score"] for r in result if r["label"] == "spam"), 0.0)
ham_score = next((r["score"] for r in result if r["label"] == "ham"), 0.0)

predicted_label = "Spam" if spam_score > ham_score else "Ham"
confidence = max(spam_score, ham_score)

print(f"Prediction: {predicted_label}")
print(f"Confidence: {confidence:.4f}")

Discord helper (optional)

Include this helper when targeting Discord: it normalizes invites, mentions, and custom emoji before tokenization to improve robustness in chat contexts.

def preprocess_text(text: str) -> str:
    text = re.sub(r"(https?:\/\/)?(www\.)?(discord\.(gg|io|me|li)|discordapp\.com\/invite)\/\S+", "<DISCORD_INVITE>", text)
    text = re.sub(r"(https?://\S+|www\.\S+)", "<URL>", text)
    text = re.sub(r"<@!?\d+>", "<USER>", text)
    text = re.sub(r"<a?:\w+:\d+>", "<EMOJI>", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()
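
As a quick check of what the Discord helper above produces, the sketch below restates the same substitutions under a hypothetical name (preprocess_discord, chosen only to avoid shadowing the function above) and runs them on a made-up message. The sample text, user ID, and emoji ID are invented for illustration.

```python
import re

def preprocess_discord(text: str) -> str:
    # Same substitutions as the Discord helper above, in the same order.
    text = re.sub(r"(https?:\/\/)?(www\.)?(discord\.(gg|io|me|li)|discordapp\.com\/invite)\/\S+", "<DISCORD_INVITE>", text)
    text = re.sub(r"(https?://\S+|www\.\S+)", "<URL>", text)
    text = re.sub(r"<@!?\d+>", "<USER>", text)
    text = re.sub(r"<a?:\w+:\d+>", "<EMOJI>", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

# Invented sample message (not taken from the training data).
sample = "Hey <@123456789> join https://discord.gg/abcXYZ  and see https://example.com/page <:pog:987654321>"
cleaned = preprocess_discord(sample)
print(cleaned)
# Hey <USER> join <DISCORD_INVITE> and see <URL> <EMOJI>
```

The order of the substitutions matters: invite links are replaced before generic URLs, because the generic URL pattern would otherwise consume an https://discord.gg/... link and emit <URL> instead of <DISCORD_INVITE>.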

Use Cases

This spam classifier is ideal for:

Messaging Platforms

Discord bot moderation (Primary use case)
SMS filtering systems
Chat application content filtering

Out-of-Scope Use

Non-English language spam detection (trained exclusively on English data)
Sentiment analysis or other NLP tasks beyond binary spam classification

Training Data

The model was trained on a combination of two datasets:

SMS Spam Collection Dataset - UCI Machine Learning Repository
Discord Text Messages - a manually collected dataset of real Discord messages containing both ham and spam samples. (This dataset was created to mitigate <URL> bias.)

Preprocessing Steps:

Label encoding (ham → 0, spam → 1)
Text cleaning and normalization with Discord-specific preprocessing
Train/validation/test split (70/15/15)
Tokenization with RoBERTa tokenizer
Dynamic padding and truncation
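
The label encoding and 70/15/15 split listed above can be sketched in plain Python as follows. The card does not say whether the split was stratified or which random seed was used, so the per-class stratification, the seed value, and the function name stratified_split are assumptions made for this illustration; the toy rows are invented.

```python
import random
from collections import defaultdict

LABEL_TO_ID = {"ham": 0, "spam": 1}  # encoding stated in the card

def stratified_split(rows, pct=(70, 15, 15), seed=42):
    """Split (text, label) rows into train/val/test while keeping class ratios.

    Stratification and the seed are assumptions, not documented choices.
    """
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, LABEL_TO_ID[label]))

    rng = random.Random(seed)
    train, val, test = [], [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_train = len(items) * pct[0] // 100
        n_val = len(items) * pct[1] // 100
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test split
    return train, val, test

# Toy data: 80 ham and 20 spam messages (invented).
rows = [(f"ham message {i}", "ham") for i in range(80)]
rows += [(f"spam message {i}", "spam") for i in range(20)]

train, val, test = stratified_split(rows)
print(len(train), len(val), len(test))
```

Splitting per class keeps the spam share roughly equal across the three partitions, which matters when spam is the minority class.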

Training Procedure

Hyperparameter Optimization

Automated hyperparameter search using Optuna framework (25 trials):

Search Space:

Dropout rates: Hidden dropout (0.1-0.3), Attention dropout (0.1-0.2)
Learning rate: 1e-5 to 5e-5 range
Weight decay: 0.0 to 0.1 regularization
Batch size: 8, 16, or 32 samples
Gradient accumulation steps: 1 to 4
Training epochs: 2 to 5 epochs
Warmup ratio: 0.05 to 0.1 for learning rate scheduling
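
One way to express this search space with Optuna's trial API is sketched below. The function name suggest_hyperparameters and the dictionary keys are assumptions (the keys follow common RobertaConfig and TrainingArguments naming); the card does not show the original objective function, and it does not say whether the learning rate was sampled on a linear or logarithmic scale, so a linear scale is assumed here.

```python
def suggest_hyperparameters(trial):
    """Sample one configuration from the search space listed above.

    `trial` is expected to behave like an optuna.trial.Trial.
    """
    return {
        "hidden_dropout_prob": trial.suggest_float("hidden_dropout_prob", 0.1, 0.3),
        "attention_probs_dropout_prob": trial.suggest_float("attention_probs_dropout_prob", 0.1, 0.2),
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [8, 16, 32]),
        "gradient_accumulation_steps": trial.suggest_int("gradient_accumulation_steps", 1, 4),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.05, 0.1),
    }

# With Optuna installed, an objective that trains and scores a model with these
# values could be run for 25 trials roughly like this (objective is hypothetical):
#   study = optuna.create_study(direction="maximize")
#   study.optimize(objective, n_trials=25)
```

Keeping the sampling in its own function makes the search space easy to read next to the list above and easy to test without training a model.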

Best Parameters Found (Trial 6/25):

Hidden dropout: 0.10069482002001506
Attention dropout: 0.12460257350587067
Learning rate: 4.976184540342024e-05
Weight decay: 0.04490021845024478
Batch size: 16
Gradient accumulation steps: 4
Epochs: 4
Warmup ratio: 0.07622459860163384

Training Strategy

Data Preprocessing: Text cleaning (including Discord-specific normalization) and label encoding
Tokenization: Dynamic padding with maximum sequence length of 128 tokens
Class Balancing: Weighted loss function to handle imbalanced dataset
Hyperparameter Optimization: Optuna-based automated tuning
Evaluation: Comprehensive metrics on held-out test set

Training Configuration

Optimizer: AdamW
Loss Function: Weighted Cross-Entropy (handles class imbalance)
Label Smoothing: 0.1 (prevents overconfidence)
Learning Rate Schedule: Linear warmup followed by linear decay
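
The learning rate schedule named above (linear warmup followed by linear decay) can be written out as a small function. This is a sketch of the general shape rather than the training code itself: the function name linear_warmup_decay and the total of 1000 optimizer steps are assumptions, and the learning rate and warmup ratio are the best-trial values rounded for readability.

```python
def linear_warmup_decay(step, total_steps, base_lr, warmup_ratio):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over the warmup steps, then falls
    linearly back to 0 at total_steps.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))

base_lr = 4.976e-05      # best-trial learning rate, rounded
warmup_ratio = 0.0762    # best-trial warmup ratio, rounded
total_steps = 1000       # hypothetical; depends on dataset size and batch settings

for step in (0, 38, 76, 500, 1000):
    lr = linear_warmup_decay(step, total_steps, base_lr, warmup_ratio)
    print(f"step {step:4d}: lr = {lr:.3e}")
```

With the best-trial batch size of 16 and 4 gradient accumulation steps, each optimizer step covers an effective batch of 64 examples, so the real number of steps is the number of training examples divided by 64, rounded up, times the number of epochs.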

Evaluation Results

Performance Metrics

Metric	Score
Overall Accuracy	99.41%
Weighted F1-Score	0.9941
Spam F1-Score	0.9782
Spam Precision	96.55%
Spam Recall	99.12%
Ham Precision	99.86%
Ham Recall	99.45%

Confusion Matrix

	Predicted Ham	Predicted Spam
Actual Ham	725	4
Actual Spam	1	112

Performance Analysis

True Positives: 112 spam messages correctly identified
True Negatives: 725 ham messages correctly identified
False Positives: 4
False Negatives: 1
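
The scores in the Performance Metrics table follow directly from these four counts. The short script below recomputes them as a sanity check; the variable names are chosen for this example only.

```python
# Counts from the confusion matrix above (spam is the positive class).
tp, tn, fp, fn = 112, 725, 4, 1

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

accuracy = (tp + tn) / (tp + tn + fp + fn)

spam_precision = tp / (tp + fp)
spam_recall = tp / (tp + fn)
ham_precision = tn / (tn + fn)
ham_recall = tn / (tn + fp)

spam_f1 = f1(spam_precision, spam_recall)
ham_f1 = f1(ham_precision, ham_recall)

# Weighted F1 averages the per-class F1 scores by class support.
n_spam, n_ham = tp + fn, tn + fp
weighted_f1 = (n_spam * spam_f1 + n_ham * ham_f1) / (n_spam + n_ham)

print(f"accuracy        {accuracy:.4f}")
print(f"spam precision  {spam_precision:.4f}")
print(f"spam recall     {spam_recall:.4f}")
print(f"spam F1         {spam_f1:.4f}")
print(f"ham precision   {ham_precision:.4f}")
print(f"ham recall      {ham_recall:.4f}")
print(f"weighted F1     {weighted_f1:.4f}")
```

Accuracy is 837 correct predictions out of 842 test messages, which rounds to 99.41%, and the spam F1 of 224/229 rounds to 0.9782, matching the table.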

Generalizability

📊 Held-out Evaluation: All performance metrics are computed on a completely unseen test set (15% of data) that was never used during training or hyperparameter tuning, so the reported scores are not inflated by tuning against the test data.

Challenges Addressed & Solutions

✅ URL Bias Mitigation (SUCCESSFULLY ADDRESSED)

Challenge: During initial training, the model became overconfident and labeled almost all messages containing <URL> as spam, even if some were legitimate ham.

Solution: Augmented training data with additional real ham messages containing links collected from Discord servers. This helps the model understand that URLs can appear in non-spam messages and improves generalization for real-world inference, particularly important for Discord bot deployment where legitimate messages often contain links.
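
A simple way to spot this kind of bias in a dataset is to compare how often the <URL> placeholder appears in each class. The sketch below is a hypothetical diagnostic, not part of the original training pipeline; the function name url_rate_by_label and the toy rows are invented for illustration.

```python
def url_rate_by_label(rows, token="<URL>"):
    """Fraction of messages containing the URL placeholder, per label."""
    totals, with_url = {}, {}
    for text, label in rows:
        totals[label] = totals.get(label, 0) + 1
        with_url[label] = with_url.get(label, 0) + (token in text)
    return {label: with_url[label] / totals[label] for label in totals}

# Toy data (invented): every spam row has a link, only one ham row does.
rows = [
    ("Claim your prize at <URL>", "spam"),
    ("Free nitro here <URL>", "spam"),
    ("lunch at noon?", "ham"),
    ("docs are at <URL>", "ham"),
    ("ok see you then", "ham"),
    ("gg", "ham"),
]

rates = url_rate_by_label(rows)
print(rates)
# {'spam': 1.0, 'ham': 0.25}
```

A large gap between the two rates suggests the model can score well by treating the placeholder itself as a spam signal; adding ham messages that contain links narrows that gap.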

✅ Class Imbalance Handling (SUCCESSFULLY ADDRESSED)

Challenge: The combined dataset exhibits natural imbalance.

Solution: Implemented weighted loss function during training to handle the imbalanced dataset effectively, resulting in exceptional performance for both classes.
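
To make the weighted loss concrete, the sketch below computes inverse-frequency class weights and a weighted cross-entropy with label smoothing for a single example in plain Python. The card does not state how the class weights were derived, so the inverse-frequency formula is an assumption, and the class counts are the test-set counts (729 ham, 113 spam) used only as illustrative numbers. The function names are hypothetical, and the smoothing formula is one common formulation rather than a confirmed match for the training code.

```python
import math

def class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * count) per class."""
    total = sum(counts)
    return [total / (len(counts) * n) for n in counts]

def weighted_smoothed_ce(logits, target, weights, smoothing=0.1):
    """Weighted cross-entropy with label smoothing for one example."""
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]

    k = len(logits)
    loss = 0.0
    for c in range(k):
        # Smoothed target: most mass on the true class, the rest spread evenly.
        q = (1 - smoothing) * (1.0 if c == target else 0.0) + smoothing / k
        loss -= weights[c] * q * log_probs[c]
    return loss

weights = class_weights([729, 113])  # index 0 = ham, 1 = spam
missed_spam = weighted_smoothed_ce([3.0, -3.0], target=1, weights=weights)
flagged_ham = weighted_smoothed_ce([-3.0, 3.0], target=0, weights=weights)
print(f"weights: ham={weights[0]:.3f}, spam={weights[1]:.3f}")
print(f"loss for a missed spam message:  {missed_spam:.3f}")
print(f"loss for a wrongly flagged ham:  {flagged_ham:.3f}")
```

Because spam is the rarer class, its weight is larger, so an equally confident mistake on a spam message costs more than one on a ham message. In PyTorch, torch.nn.CrossEntropyLoss accepts both a weight tensor and a label_smoothing argument, which covers the same two ideas.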

✅ Overfitting Prevention (SUCCESSFULLY ADDRESSED)

Challenge: Ensuring model generalizes well to unseen data.

Solution: Dropout, weight decay, and label smoothing act as regularizers during training, and the model is evaluated on a completely held-out test set (15% of data) never used during training or hyperparameter tuning, where it reaches 99.41% accuracy.

Limitations

Language Limitation: Model performance is optimized for English text only
Message Format: Trained on short SMS and Discord chat messages; may require adaptation for other formats (e.g., formal business emails)

Technical Specifications

Software Requirements

Python: 3.8+
Framework: PyTorch, Hugging Face Transformers

Citation

If you use this model in your research or application, please cite:

@misc{roshana1s_spam_classifier_2025,
  author       = {Roshana Isuranga},
  title        = {Spam Message Classifier: RoBERTa-based Spam Detection},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/roshana1s/spam-message-classifier}},
}

Model Card Contact

Roshana1s - Hugging Face Profile
