Spam Message Classifier

A spam message classification model built on the RoBERTa-base transformer architecture. It achieves 99.41% accuracy and a spam-class F1-score of 0.9782 on the held-out test set. Developed as the core spam detection component for Amy, an intelligent Discord moderation bot.

Model Description

This model is a fine-tuned version of FacebookAI/roberta-base for binary spam classification in messaging applications. The classifier distinguishes legitimate messages (ham) from spam/phishing content and is intended for deployment in messaging platforms and content moderation systems.

  • Developed by: roshana1s
  • Model type: Binary Sequence Classification
  • Language: English
  • License: Apache-2.0
  • Base Model: FacebookAI/roberta-base
  • Primary Use Case: Discord bot moderation and real-time spam detection

Key Features

  • 🤖 Transformer-based Architecture: Built on RoBERTa-base for contextual text understanding
  • ⚡ High Performance: 0.9782 F1-score for the spam class and 99.41% overall accuracy on the test set
  • 🔧 Hyperparameter Optimization: Automated tuning with the Optuna framework (25 trials)
  • ⚖️ Class Imbalance Handling: Addressed with a class-weighted loss function
  • 🔗 URL Bias Mitigation: Training data augmented with real-world ham messages containing links
  • 📊 Held-out Evaluation: Evaluated on a test set never used for training or hyperparameter tuning

Intended Uses

Direct Use

from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch
import re

# Device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer
model = RobertaForSequenceClassification.from_pretrained(
    "roshana1s/spam-message-classifier"
).to(device).eval()
tokenizer = RobertaTokenizer.from_pretrained("roshana1s/spam-message-classifier")

def preprocess_text(text: str) -> str:
    # General-purpose normalization: mask URLs, collapse whitespace
    text = re.sub(r"(https?://\S+|www\.\S+)", "<URL>", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

def get_inference(text: str) -> list:
    """Returns prediction results in [{'label': str, 'score': float}, ...] format."""
    # Tokenize input text
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=False,
        max_length=128
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Run inference
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.nn.functional.softmax(outputs.logits, dim=-1).squeeze(0)

    # Map labels (index 0 = ham, 1 = spam)
    result = [
        {"label": "ham", "score": float(probs[0])},
        {"label": "spam", "score": float(probs[1])}
    ]

    return result

# Example usage
message = "Congratulations! You've won free Discord Nitro. Click here to claim now! <URL>"
text = preprocess_text(message)
result = get_inference(text)

# Extract scores for both classes
spam_score = next((r["score"] for r in result if r["label"] == "spam"), 0.0)
ham_score = next((r["score"] for r in result if r["label"] == "ham"), 0.0)

predicted_label = "Spam" if spam_score > ham_score else "Ham"
confidence = max(spam_score, ham_score)

print(f"Prediction: {predicted_label}")
print(f"Confidence: {confidence:.4f}")

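Discord helper (optional)

Include this helper when targeting Discord. It replaces the general-purpose preprocess_text above and reuses its re import. It masks Discord invite links before generic URLs, so invites get their own <DISCORD_INVITE> token instead of <URL>. It also normalizes user mentions and custom emoji before tokenization to improve robustness in chat contexts.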

def preprocess_text(text: str) -> str:
    text = re.sub(r"(https?:\/\/)?(www\.)?(discord\.(gg|io|me|li)|discordapp\.com\/invite)\/\S+", "<DISCORD_INVITE>", text)
    text = re.sub(r"(https?://\S+|www\.\S+)", "<URL>", text)
    text = re.sub(r"<@!?\d+>", "<USER>", text)
    text = re.sub(r"<a?:\w+:\d+>", "<EMOJI>", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

Use Cases

This spam classifier is suited to:

Messaging Platforms

  • Discord bot moderation (Primary use case)
  • SMS filtering systems
  • Chat application content filtering

Out-of-Scope Use

  • Non-English language spam detection (trained exclusively on English data)
  • Sentiment analysis or other NLP tasks beyond binary spam classification

Training Data

The model was trained on a combination of two datasets:

  1. SMS Spam Collection Dataset - UCI Machine Learning Repository
  2. Discord Text Messages: a manually collected dataset of real Discord messages containing both ham and spam samples, created to mitigate <URL> bias.

Preprocessing Steps:

  1. Label encoding (ham → 0, spam → 1)
  2. Text cleaning and normalization with Discord-specific preprocessing
  3. Train/validation/test split (70/15/15)
  4. Tokenization with RoBERTa tokenizer
  5. Dynamic padding and truncation
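The label encoding and 70/15/15 split above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's pipeline. The card does not say whether the split was stratified or which random seed was used. The per-class (stratified) split, the seed, and the helper name encode_and_split are therefore hypothetical.

```python
import random
from typing import Dict, List, Tuple

LABEL2ID: Dict[str, int] = {"ham": 0, "spam": 1}  # encoding stated in the card

Example = Tuple[str, int]


def encode_and_split(
    rows: List[Tuple[str, str]],
    train_frac: float = 0.70,
    val_frac: float = 0.15,
    seed: int = 42,  # ASSUMPTION: arbitrary seed, not from the card
) -> Tuple[List[Example], List[Example], List[Example]]:
    """Encode (text, label) rows and split them 70/15/15, per class.

    ASSUMPTION: the split is stratified so each split keeps a similar
    ham/spam ratio; the card only states the 70/15/15 proportions.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[Example]] = {}
    for text, label in rows:
        y = LABEL2ID[label]
        by_class.setdefault(y, []).append((text, y))

    train: List[Example] = []
    val: List[Example] = []
    test: List[Example] = []
    for examples in by_class.values():
        rng.shuffle(examples)
        n = len(examples)
        n_train = int(round(n * train_frac))
        n_val = int(round(n * val_frac))
        train += examples[:n_train]
        val += examples[n_train:n_train + n_val]
        test += examples[n_train + n_val:]  # remainder (about 15%) goes to test
    rng.shuffle(train)  # mix the classes in the training split
    return train, val, test


# Example with made-up rows: 100 ham and 20 spam messages
rows = [(f"ham {i}", "ham") for i in range(100)] + [(f"spam {i}", "spam") for i in range(20)]
train, val, test = encode_and_split(rows)
```

Splitting each class separately keeps the ham/spam ratio roughly equal across the three splits. This matters for an imbalanced dataset like this one.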

Training Procedure

Hyperparameter Optimization

Automated hyperparameter search using the Optuna framework (25 trials):

Search Space:

  • Dropout rates: hidden dropout 0.1 to 0.3, attention dropout 0.1 to 0.2
  • Learning rate: 1e-5 to 5e-5
  • Weight decay: 0.0 to 0.1
  • Batch size: 8, 16, or 32
  • Gradient accumulation steps: 1 to 4
  • Training epochs: 2 to 5
  • Warmup ratio: 0.05 to 0.1 (fraction of training steps used for learning-rate warmup)
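A search space like the one above maps naturally onto the trial.suggest_* methods of Optuna. The sketch below is hypothetical in several respects: the function name, the parameter keys (chosen to match RobertaConfig and TrainingArguments field names), the log-scale learning-rate sampling, and the integer sampling of epochs and accumulation steps. The card gives only the ranges.

```python
from typing import Any, Dict


def suggest_hyperparameters(trial: Any) -> Dict[str, Any]:
    """Sample one configuration from the search space listed in the card.

    `trial` is expected to be an optuna.trial.Trial.
    ASSUMPTIONS: learning rate sampled on a log scale; gradient accumulation
    steps and epochs sampled as integers.
    """
    return {
        # Dropout values would go to the model config (RobertaConfig fields)
        "hidden_dropout_prob": trial.suggest_float("hidden_dropout_prob", 0.1, 0.3),
        "attention_probs_dropout_prob": trial.suggest_float(
            "attention_probs_dropout_prob", 0.1, 0.2
        ),
        # The remaining values would go to the training arguments
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
        "gradient_accumulation_steps": trial.suggest_int("gradient_accumulation_steps", 1, 4),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.05, 0.1),
    }
```

In a full run, this function would be called inside an objective passed to optuna.create_study(direction="maximize").optimize(objective, n_trials=25), assuming the objective returns a validation metric to maximize. The card does not state which metric that was.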

Best Parameters Found (Trial 6/25):

  • Hidden dropout: 0.10069482002001506
  • Attention dropout: 0.12460257350587067
  • Learning rate: 4.976184540342024e-05
  • Weight decay: 0.04490021845024478
  • Batch size: 16
  • Gradient accumulation steps: 4
  • Epochs: 4
  • Warmup ratio: 0.07622459860163384
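With gradient accumulation, each optimizer update in the best trial sees 16 * 4 = 64 examples. The sketch below derives that figure and approximate step counts for the linear warmup schedule. The training-set size is a parameter because the card does not report it, and the 4,000 used in the example is hypothetical. The rounding rules (ceil for a partial accumulation window and for warmup steps) are also assumptions, and a given training loop may differ by a step.

```python
import math
from typing import Dict


def schedule_summary(
    num_train_examples: int,
    batch_size: int = 16,
    grad_accum_steps: int = 4,
    epochs: int = 4,
    warmup_ratio: float = 0.07622459860163384,
) -> Dict[str, int]:
    """Effective batch size and optimizer-step counts for the best trial.

    ASSUMPTIONS: single device; a final partial accumulation window still
    triggers an optimizer step; warmup steps = ceil(total_steps * warmup_ratio).
    """
    micro_batches_per_epoch = math.ceil(num_train_examples / batch_size)
    steps_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum_steps)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch_size": batch_size * grad_accum_steps,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
    }


summary = schedule_summary(4000)  # hypothetical training-set size
```

With 4,000 examples this gives 63 optimizer steps per epoch, 252 in total, of which the first 20 are linear warmup.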

Training Strategy

  1. Data Preprocessing: Text cleaning (including Discord-specific normalization) and label encoding
  2. Tokenization: Dynamic padding with maximum sequence length of 128 tokens
  3. Class Balancing: Weighted loss function to handle imbalanced dataset
  4. Hyperparameter Optimization: Optuna-based automated tuning
  5. Evaluation: Comprehensive metrics on held-out test set
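Dynamic padding pads each batch only to its longest sequence rather than always to 128 tokens. This saves computation on short chat messages. The helper below is an illustrative pure-Python sketch, and the name dynamic_pad is hypothetical. It assumes inputs were already tokenized with truncation to 128 tokens. It uses pad id 1, the <pad> token id in the roberta-base vocabulary. In Transformers, DataCollatorWithPadding(tokenizer=tokenizer) provides this behavior.

```python
from typing import Dict, List, Sequence

ROBERTA_PAD_ID = 1  # <pad> token id in the roberta-base vocabulary
MAX_LENGTH = 128    # maximum sequence length stated in the card


def dynamic_pad(
    batch: Sequence[Sequence[int]],
    pad_id: int = ROBERTA_PAD_ID,
    max_length: int = MAX_LENGTH,
) -> Dict[str, List[List[int]]]:
    """Pad a batch of token-id lists to the length of its longest member.

    Assumes each sequence was already truncated by the tokenizer
    (truncation=True, max_length=128), which keeps the <s> and </s> tokens.
    """
    if any(len(ids) > max_length for ids in batch):
        raise ValueError(f"sequence longer than {max_length}; truncate when tokenizing")
    longest = max(len(ids) for ids in batch)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(list(ids) + [pad_id] * n_pad)        # pad on the right
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 0 = ignore padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Example with illustrative token ids: both rows are padded to 5, not 128
padded = dynamic_pad([[0, 713, 2], [0, 713, 16, 42, 2]])
```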

Training Configuration

  • Optimizer: AdamW
  • Loss Function: Weighted Cross-Entropy (handles class imbalance)
  • Label Smoothing: 0.1 (prevents overconfidence)
  • Learning Rate Schedule: Linear warmup followed by linear decay
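The loss described above combines per-class weights with label smoothing. The sketch below computes it in plain Python for clarity. The card gives neither of the following, so both are assumptions. First, the weights use the common inverse-frequency ("balanced") formula n_samples / (n_classes * count_c). Second, the class counts in the example are the test-split counts from the confusion matrix (729 ham, 113 spam), used only for illustration.

```python
import math
from typing import List, Sequence


def balanced_class_weights(counts: Sequence[int]) -> List[float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    ASSUMPTION: the card does not say how its class weights were computed.
    """
    total = sum(counts)
    return [total / (len(counts) * c) for c in counts]


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_smoothed_ce(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    weights: Sequence[float],
    smoothing: float = 0.1,
) -> float:
    """Weighted cross-entropy with label smoothing, reduced by a weighted mean.

    Per example: (1 - eps) * w[y] * -log p[y] + (eps / C) * sum_c w[c] * -log p[c].
    The batch sum is divided by the sum of w[y] over the batch.
    """
    num_classes = len(weights)
    numerator = 0.0
    denominator = 0.0
    for logits, y in zip(batch_logits, targets):
        logp = log_softmax(logits)
        nll = -weights[y] * logp[y]                              # true-class term
        smooth = -sum(w * lp for w, lp in zip(weights, logp))    # uniform-target term
        numerator += (1.0 - smoothing) * nll + (smoothing / num_classes) * smooth
        denominator += weights[y]
    return numerator / denominator


weights = balanced_class_weights([729, 113])  # illustrative counts (ham, spam)
```

In PyTorch, the same combination is available as torch.nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1). The per-example formula and weighted-mean reduction in the sketch are intended to match it. Still, confirm this numerically on a small batch before relying on it.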

Evaluation Results

Performance Metrics

Metric               Score
Overall Accuracy     99.41%
Weighted F1-Score    0.9941
Spam F1-Score        0.9782
Spam Precision       96.55%
Spam Recall          99.12%
Ham Precision        99.86%
Ham Recall           99.45%

Confusion Matrix

               Predicted Ham   Predicted Spam
Actual Ham     725             4
Actual Spam    1               112

Performance Analysis

  • True Positives: 112 spam messages correctly identified
  • True Negatives: 725 ham messages correctly identified
  • False Positives: 4 ham messages incorrectly flagged as spam
  • False Negatives: 1 spam message missed (classified as ham)
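The headline metrics can be recomputed from the confusion matrix as a consistency check. For example, accuracy is (725 + 112) / 842 = 837/842 ≈ 99.41%. The sketch below treats spam as the positive class. It weights the F1-score by each class's support (its number of actual examples), which is the usual definition of weighted F1. The function name is hypothetical.

```python
from typing import Dict


def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Binary classification metrics with spam as the positive class."""
    total = tn + fp + fn + tp
    spam_p = tp / (tp + fp)   # of messages flagged as spam, fraction truly spam
    spam_r = tp / (tp + fn)   # of actual spam, fraction caught
    ham_p = tn / (tn + fn)
    ham_r = tn / (tn + fp)
    spam_f1 = 2 * spam_p * spam_r / (spam_p + spam_r)
    ham_f1 = 2 * ham_p * ham_r / (ham_p + ham_r)
    n_ham, n_spam = tn + fp, tp + fn  # support = number of actual examples per class
    return {
        "accuracy": (tp + tn) / total,
        "spam_precision": spam_p,
        "spam_recall": spam_r,
        "spam_f1": spam_f1,
        "ham_precision": ham_p,
        "ham_recall": ham_r,
        "weighted_f1": (n_ham * ham_f1 + n_spam * spam_f1) / total,
    }


m = metrics_from_confusion(tn=725, fp=4, fn=1, tp=112)
```

Rounded to four decimals, every value matches the Performance Metrics table above.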

Generalizability

📊 Strong Generalization: All performance metrics are computed on a held-out test set (15% of the data) that was never used during training or hyperparameter tuning. They therefore estimate performance on unseen messages from the same data distribution rather than on examples the model has already seen.

Challenges Addressed & Solutions

✅ URL Bias Mitigation (SUCCESSFULLY ADDRESSED)

Challenge: During initial training, the model became overconfident and labeled almost every message containing <URL> as spam, even when the message was legitimate ham.

Solution: Augmented training data with additional real ham messages containing links collected from Discord servers. This helps the model understand that URLs can appear in non-spam messages and improves generalization for real-world inference, particularly important for Discord bot deployment where legitimate messages often contain links.

✅ Class Imbalance Handling (SUCCESSFULLY ADDRESSED)

Challenge: The combined dataset is imbalanced, with far more ham than spam messages. The test split alone contains 729 ham and 113 spam messages.

Solution: Implemented a class-weighted loss function during training. On the test set, the model reaches a spam F1-score of 0.9782 while keeping ham recall at 99.45%.

✅ Overfitting Prevention (SUCCESSFULLY ADDRESSED)

Challenge: Ensuring model generalizes well to unseen data.

Solution: Evaluation on a held-out test set (15% of the data) never used during training or hyperparameter tuning. The model reaches 99.41% accuracy on this unseen data.

Limitations

  • Language Limitation: Model performance is optimized for English text only
  • Message Format: Trained on short SMS- and Discord-style messages, with inputs truncated to 128 tokens; longer or more formal text (e.g., business emails) may require adaptation

Technical Specifications

Software Requirements

  • Python: 3.8+
  • Framework: PyTorch, Hugging Face Transformers

Citation

If you use this model in your research or application, please cite:

@misc{roshana1s_spam_classifier_2025,
  author       = {Roshana Isuranga},
  title        = {Spam Message Classifier: RoBERTa-based Spam Detection},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/roshana1s/spam-message-classifier}},
}

Model Card Contact

Roshana1s - Hugging Face Profile

Model size: 0.1B params (tensor type F32, Safetensors format)