---
library_name: transformers
license: mit
datasets:
- hblim/customer-complaints
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
tags:
- bert
- transformers
- customer-complaints
- text-classification
- multiclass
- huggingface
- fine-tuned
- wandb
---

# BERT Base (Uncased) Fine-Tuned on Customer Complaint Classification (3 Classes)

## 🧾 Model Description

This model is a fine-tuned version of [`bert-base-uncased`](https://huggingface.co/bert-base-uncased), trained with Hugging Face Transformers on a custom dataset of customer complaints. The task is **multi-class text classification**: each complaint is assigned to one of **three classes**.

The model is intended to support downstream tasks such as complaint triage, issue-type prediction, and support-ticket classification.

Training and evaluation were tracked with [Weights & Biases](https://wandb.ai/), and the hyperparameters logged below make the run reproducible.

---

## 🧠 Intended Use

- 🏷 Classify customer complaint text into 3 predefined categories
- 📊 Analyze complaint trends over time
- 💬 Serve as a backend model for customer service applications

---

## 📚 Dataset

- Dataset Name: [hblim/customer-complaints](https://huggingface.co/datasets/hblim/customer-complaints)
- Dataset Type: Multiclass text classification
- Classes: billing, product, delivery
- Preprocessing: Standard BERT tokenization
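
"Standard BERT tokenization" here means the usual WordPiece tokenizer shipped with `bert-base-uncased`. A minimal sketch — note that `max_length=128` is an illustrative assumption, not a logged preprocessing setting:

```python
from transformers import AutoTokenizer

# WordPiece tokenization as used by bert-base-uncased.
# max_length=128 is illustrative, not a logged hyperparameter.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(
    "My delivery arrived two weeks late",
    truncation=True,
    padding="max_length",
    max_length=128,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 128])
```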

---

## ⚙️ Training Details

- Base Model: `bert-base-uncased`
- Epochs: **10**
- Batch Size: **1**
- Learning Rate: **1e-5**
- Weight Decay: **0.05**
- Warmup Ratio: **0.20**
- LR Scheduler: `linear`
- Optimizer: `AdamW`
- Evaluation Strategy: every **100 steps**
- Logging: every **100 steps**
- Trainer: Hugging Face `Trainer`
- Hardware: Single NVIDIA GeForce RTX 3080 GPU

---

## 📈 Metrics

Evaluation was tracked using:

- **Accuracy**
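
Accuracy can be reported to the `Trainer` through a small `compute_metrics` callback; a minimal NumPy sketch:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy over the three complaint classes (Trainer-compatible sketch)."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # predicted class per example
    return {"accuracy": float((preds == labels).mean())}
```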

To reproduce the metrics and training logs, see the corresponding W&B run:

[Weights & Biases Run - `baseline-hf-hub`](https://wandb.ai/notslahify/customer%20complaints%20fine%20tuning/runs/c75ddclr)

| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 100  | 1.106100 | 1.040519 | 0.523810 |
| 200  | 0.944800 | 0.744273 | 0.738095 |
| 300  | 0.660000 | 0.385309 | 0.900000 |
| 400  | 0.412400 | 0.273423 | 0.904762 |
| 500  | 0.220800 | 0.185636 | 0.923810 |
| 600  | 0.163400 | 0.245850 | 0.919048 |
| 700  | 0.116100 | 0.180523 | 0.942857 |
| 800  | 0.097200 | 0.254475 | 0.928571 |
| 900  | 0.052200 | 0.233583 | 0.942857 |
| 1000 | 0.050700 | 0.223150 | 0.928571 |
| 1100 | 0.035100 | 0.271416 | 0.919048 |
| 1200 | 0.027700 | 0.226478 | 0.933333 |
| 1300 | 0.009000 | 0.218807 | 0.938095 |
| 1400 | 0.013600 | 0.246330 | 0.928571 |
| 1500 | 0.014500 | 0.226987 | 0.933333 |

---

## 🚀 How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("your-username/baseline-hf-hub")
tokenizer = AutoTokenizer.from_pretrained("your-username/baseline-hf-hub")

inputs = tokenizer("I want to report an issue with my account", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(dim=-1).item()
label = model.config.id2label[predicted_class]  # human-readable class name, if set during training
```