swiss-ai/apertus-pretrain-toxicity

Olivia-umich committed 13 days ago
Commit 711fc84 (verified) · 1 parent: f51e538

Update model card

README.md +122 -0

@@ -1,3 +1,125 @@
 ---
 license: apache-2.0
+language:
+- en
+- fr
+- de
+- it
+- es
+- pt
+- pl
+- zh
+- nl
+base_model:
+- FacebookAI/xlm-roberta-base
+datasets:
+- PleIAs/ToxicCommons
+- yangezheng/SWSR-SexComment
+tags:
+- toxicity
+- data
 ---
+# Multilingual Toxicity Classifiers used in Apertus Pretraining
+#### Author: Olivia Simin Fan (@Olivia-umich)
+Language-specific toxicity classifiers for English, French, German, Italian, Spanish, Portuguese, Polish, Chinese and Dutch, trained on the [PleIAs/ToxicCommons](https://github.com/Pleias/toxic-commons) and [SWSR-SexComments](https://arxiv.org/pdf/2108.03070) datasets.
+## Model Description
+Our toxicity classifier uses a two-stage approach: we first extract multilingual document embeddings with [*XLM-RoBERTa*](https://huggingface.co/FacebookAI/xlm-roberta-base),
+then train a language-specific 2-layer MLP on top of these embeddings for 6 epochs to perform binary toxicity classification.
+For each language, the checkpoint with the best accuracy on the held-out validation set is then used to annotate toxicity scores on FineWeb-2 and FineWeb.
+The accuracy on the held-out set is listed below for each language:
+| Language | Accuracy |
+|----------|----------|
+| English (en) | 80.13% |
+| Chinese (zh) | 79.64% |
+| French (fr) | 82.34% |
+| German (de) | 82.61% |
+| Italian (it) | 82.16% |
+| Dutch (nl) | 80.94% |
+| Polish (pl) | 81.24% |
+| Portuguese (pt) | 94.63% |
+| Spanish (es) | 81.61% |
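
The second stage of the approach described under Model Description can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' training script: the embeddings are random stand-ins for mean-pooled XLM-RoBERTa vectors, and the helper name `train_mlp_head`, the Adam optimizer, the learning rate, the cross-entropy loss, and the full-batch updates are all hypothetical choices. Only the 2-layer MLP, the 6 epochs, and keeping the checkpoint with the best validation accuracy come from the description above.

```python
import copy

import torch
from torch import nn


def train_mlp_head(train_x, train_y, val_x, val_y,
                   hidden_size=768, epochs=6, lr=1e-3):
    """Train a 2-layer MLP on precomputed embeddings; keep the best checkpoint."""
    head = nn.Sequential(
        nn.Linear(hidden_size, hidden_size),
        nn.ReLU(),
        nn.Dropout(0.2),
        nn.Linear(hidden_size, 2),  # binary: non-toxic vs. toxic
    )
    # Assumption: optimizer, learning rate and loss are not given in the model card.
    optimizer = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    best_acc, best_state = -1.0, None
    for _ in range(epochs):
        # Assumption: one full-batch update per epoch, to keep the sketch short.
        head.train()
        optimizer.zero_grad()
        loss = loss_fn(head(train_x), train_y)
        loss.backward()
        optimizer.step()

        # Evaluate on the held-out validation set and keep the best weights.
        head.eval()
        with torch.no_grad():
            preds = head(val_x).argmax(dim=1)
            acc = (preds == val_y).float().mean().item()
        if acc > best_acc:
            best_acc, best_state = acc, copy.deepcopy(head.state_dict())
    return best_state, best_acc


# Random stand-ins for mean-pooled embeddings (hidden size 768) and binary labels.
torch.manual_seed(0)
train_x, val_x = torch.randn(64, 768), torch.randn(16, 768)
train_y, val_y = torch.randint(0, 2, (64,)), torch.randint(0, 2, (16,))
best_state, best_acc = train_mlp_head(train_x, train_y, val_x, val_y)
```

Because the encoder is frozen, embeddings can be computed once per document and reused across epochs, which is why the sketch takes tensors rather than raw text.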
+## Toxicity Scoring
+Example usage of the toxicity classifiers:
+```python
+import torch
+from torch import nn
+from transformers import RobertaModel
+
+
+# Define the model with an MLP classifier on top of XLM-RoBERTa
+class RobertaClassifier(nn.Module):
+    def __init__(self, num_classes,
+                 model_name="FacebookAI/xlm-roberta-base",
+                 device="cuda:0"):
+        super().__init__()
+        self.roberta = RobertaModel.from_pretrained(model_name)
+        self.freeze_roberta_encoder()
+        self.device = device
+        self.classifier = nn.Sequential(
+            nn.Linear(self.roberta.config.hidden_size, self.roberta.config.hidden_size),
+            nn.ReLU(),
+            nn.Dropout(0.2),
+            nn.Linear(self.roberta.config.hidden_size, num_classes)
+        )
+
+    def freeze_roberta_encoder(self):
+        # Only the MLP head is trained; the encoder weights stay fixed.
+        for param in self.roberta.parameters():
+            param.requires_grad = False
+
+    def mean_pooling(self, model_output, attention_mask):
+        # Masked mean over token embeddings, following
+        # https://huggingface.co/aditeyabaral/sentencetransformer-xlm-roberta-base
+        token_embeddings = model_output.last_hidden_state  # (batch, seq_len, hidden)
+        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+    def forward(self, input_ids=None, attention_mask=None,
+                roberta_embeddings=None):
+        # Accepts either tokenized inputs or precomputed mean-pooled embeddings.
+        if roberta_embeddings is None:
+            outputs = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
+            roberta_embeddings = self.mean_pooling(outputs, attention_mask)
+        logits = self.classifier(roberta_embeddings)
+        return torch.nn.functional.softmax(logits, dim=1)  # class probabilities
+
+    def predict(self, input_ids=None, attention_mask=None,
+                roberta_embeddings=None, **kwargs):
+        """
+        Scores tokenized inputs or precomputed embeddings.
+        Args:
+            input_ids (torch.Tensor): Token ids produced by the tokenizer.
+            attention_mask (torch.Tensor): Attention mask produced by the tokenizer.
+            roberta_embeddings (torch.Tensor, optional): Precomputed mean-pooled
+                embeddings; if given, the encoder is skipped.
+        Returns:
+            numpy.ndarray: Probability of class 1 (the toxicity score) for each input.
+        """
+        self.eval()
+        with torch.no_grad():
+            if roberta_embeddings is None:
+                probs = self(input_ids, attention_mask)
+            else:
+                probs = self(roberta_embeddings=roberta_embeddings)
+        return probs[:, 1].cpu().numpy()
+```
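
To see what `mean_pooling` computes, the following standalone sketch applies the same masked average to a hand-made tensor. The function name `masked_mean` and the toy values are illustrative only; the third token stands in for padding and should not influence the result.

```python
import torch


def masked_mean(token_embeddings, attention_mask):
    # Broadcast the (batch, seq_len) mask over the hidden dimension.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * mask, dim=1)
    # Clamp the token count to avoid division by zero for all-padding rows.
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts


# One document, three token positions, hidden size 2; the last position is padding.
token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = torch.tensor([[1, 1, 0]])

pooled = masked_mean(token_embeddings, attention_mask)
print(pooled.tolist())  # [[2.0, 3.0]]
```

Only the two real tokens contribute to the average, so padded batches yield the same document embedding as unpadded ones.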
+```python
+from transformers import AutoTokenizer
+
+# MODEL_DIR must point to the directory containing the classifier checkpoints.
+MODEL_PATH = f"{MODEL_DIR}/english.pth"
+DEVICE = "cpu"
+model = RobertaClassifier(device=DEVICE, num_classes=2)
+model.load_state_dict(state_dict=torch.load(MODEL_PATH, map_location=torch.device(DEVICE)))
+tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")
+documents = ["I want to predict the toxicity score of this document: I am happy today.",
+             "I want to predict the toxicity score of this document: this is a violent content!!"]
+inputs = tokenizer(documents, return_tensors="pt", padding=True, truncation=True, max_length=512)
+model.predict(**inputs)  # scores: [0.00121997, 0.9723031]
+```
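
Once documents are scored, the scores can be used to split a corpus. The sketch below is one possible way to do this and relies on assumptions: the helper `split_by_toxicity` and the cutoff of 0.5 are hypothetical, and the model card does not state which threshold, if any, was applied in Apertus pretraining. The scores are the ones reported in the example above.

```python
def split_by_toxicity(documents, scores, threshold=0.5):
    """Separate documents into (kept, flagged) by toxicity score."""
    kept, flagged = [], []
    for doc, score in zip(documents, scores):
        # Assumption: a score at or above the threshold marks a document as toxic.
        if score >= threshold:
            flagged.append(doc)
        else:
            kept.append(doc)
    return kept, flagged


documents = [
    "I want to predict the toxicity score of this document: I am happy today.",
    "I want to predict the toxicity score of this document: this is a violent content!!",
]
scores = [0.00121997, 0.9723031]  # values reported in the usage example

kept, flagged = split_by_toxicity(documents, scores)
```

With these inputs the first document is kept and the second is flagged; a stricter or looser cutoff only changes the `threshold` argument.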