Commit 711fc84 (verified) by Olivia-umich · 1 parent: f51e538

Update model card

  1. README.md +122 -0
README.md CHANGED
@@ -1,3 +1,125 @@
---
license: apache-2.0
language:
- en
- fr
- de
- it
- es
- pt
- pl
- zh
- nl
base_model:
- FacebookAI/xlm-roberta-base
datasets:
- PleIAs/ToxicCommons
- yangezheng/SWSR-SexComment
tags:
- toxicity
- data
---
# Multilingual Toxicity Classifiers used in Apertus Pretraining
#### Author: Olivia Simin Fan (@Olivia-umich)

Language-specific toxicity classifiers for English, French, German, Italian, Spanish, Portuguese, Polish, Chinese, and Dutch, trained on the [PleIAs/ToxicCommons](https://github.com/Pleias/toxic-commons) and [SWSR-SexComments](https://arxiv.org/pdf/2108.03070) datasets.

## Model Description
Our toxicity classifier uses a two-stage approach: we first extract multilingual document embeddings with a frozen [*XLM-RoBERTa*](https://huggingface.co/FacebookAI/xlm-roberta-base) encoder,
then train a language-specific 2-layer MLP on top of these embeddings for binary toxicity classification, for 6 epochs.
For each language, the checkpoint with the best accuracy on the held-out validation set is then used to annotate toxicity scores on FineWeb-2 and FineWeb.

Held-out accuracy for each language:

| Language | Accuracy |
|----------|----------|
| English (en) | 80.13% |
| Chinese (zh) | 79.64% |
| French (fr) | 82.34% |
| German (de) | 82.61% |
| Italian (it) | 82.16% |
| Dutch (nl) | 80.94% |
| Polish (pl) | 81.24% |
| Portuguese (pt) | 94.63% |
| Spanish (es) | 81.61% |

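The second stage described above (an MLP head trained on fixed embeddings for 6 epochs, keeping the checkpoint with the best validation accuracy) could look roughly like the sketch below. It is not the training script used for these classifiers: the synthetic embeddings, the reduced embedding size, the Adam optimizer, the learning rate, the batch size, and the cross-entropy loss are all assumptions made for illustration.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-ins for precomputed document embeddings and binary labels.
# Real xlm-roberta-base embeddings are 768-dimensional; 16 keeps the demo small.
dim = 16
x_train, x_val = torch.randn(256, dim), torch.randn(64, dim)
y_train = (x_train[:, 0] > 0).long()
y_val = (x_val[:, 0] > 0).long()

# 2-layer MLP head: Linear -> ReLU -> Dropout -> Linear.
head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(0.2), nn.Linear(dim, 2))
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)  # assumed optimizer and learning rate
loss_fn = nn.CrossEntropyLoss()  # assumed loss

best_acc, best_state = -1.0, None
for epoch in range(6):  # 6 epochs, as stated in the model description
    head.train()
    for i in range(0, len(x_train), 32):  # assumed batch size of 32
        optimizer.zero_grad()
        loss = loss_fn(head(x_train[i:i + 32]), y_train[i:i + 32])
        loss.backward()
        optimizer.step()

    head.eval()
    with torch.no_grad():
        acc = (head(x_val).argmax(dim=1) == y_val).float().mean().item()
    if acc > best_acc:  # keep the checkpoint with the best validation accuracy
        best_acc, best_state = acc, copy.deepcopy(head.state_dict())

# Restore the best checkpoint before using the head for scoring.
head.load_state_dict(best_state)
```

Because the encoder is not updated, embeddings only need to be computed once per document and can be reused across epochs.
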
## Toxicity Scoring

The following example shows how to use the toxicity classifiers. First, define the model:

```python
import torch
import torch.nn as nn
from transformers import RobertaModel


# Define the model with an MLP classifier on top of XLM-RoBERTa
class RobertaClassifier(nn.Module):
    def __init__(self, num_classes,
                 model_name="FacebookAI/xlm-roberta-base",
                 device="cuda:0"):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained(model_name)
        self.freeze_roberta_encoder()
        # Stored for reference only; the module is not moved to this device here.
        self.device = device
        hidden_size = self.roberta.config.hidden_size
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_size, num_classes),
        )

    def freeze_roberta_encoder(self):
        # The encoder is used as a fixed feature extractor.
        for param in self.roberta.parameters():
            param.requires_grad = False

    def mean_pooling(self, model_output, attention_mask):
        # https://huggingface.co/aditeyabaral/sentencetransformer-xlm-roberta-base
        token_embeddings = model_output.last_hidden_state  # (batch, seq_len, hidden)
        # Cast the mask to float so the clamp below acts on floating-point counts.
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        # Average over non-padding tokens only.
        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

    def forward(self, input_ids=None, attention_mask=None,
                roberta_embeddings=None):
        # Precomputed embeddings can be passed in to skip the encoder.
        if roberta_embeddings is None:
            outputs = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
            roberta_embeddings = self.mean_pooling(outputs, attention_mask)
        logits = self.classifier(roberta_embeddings)
        # Return class probabilities rather than raw logits.
        return torch.nn.functional.softmax(logits, dim=1)

    def predict(self, input_ids=None, attention_mask=None,
                roberta_embeddings=None, **kwargs):
        """
        Predicts toxicity scores for a batch of inputs.

        Args:
            input_ids (torch.Tensor): Token ids produced by the tokenizer.
            attention_mask (torch.Tensor): Attention mask produced by the tokenizer.
            roberta_embeddings (torch.Tensor, optional): Precomputed mean-pooled
                embeddings; if given, the encoder is skipped.

        Returns:
            numpy.ndarray: Probability of class 1 (toxic) for each input.
        """
        self.eval()

        with torch.no_grad():
            if roberta_embeddings is None:
                probs = self(input_ids, attention_mask)
            else:
                probs = self(roberta_embeddings=roberta_embeddings)
        return probs[:, 1].cpu().numpy()
```
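
The `mean_pooling` step averages token embeddings while ignoring padding positions. The toy example below, with made-up tensors rather than real encoder output, is a small sketch of that arithmetic: the padded position in the first sequence carries large values that would distort an unmasked mean, but it does not affect the masked one.

```python
import torch

# Toy batch: 2 sequences, 3 token positions, hidden size 2 (illustrative values).
token_embeddings = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],      # no padding
])
attention_mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

# Same computation as mean_pooling: zero out padding, then divide by the token count.
mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
pooled = torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

print(pooled)  # rows: [2., 3.] and [4., 4.]
```
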

Then load a language-specific checkpoint (here the English one) and score documents:

```python
import torch
from transformers import AutoTokenizer

MODEL_DIR = "path/to/checkpoints"  # placeholder: local directory holding the .pth files
MODEL_PATH = f"{MODEL_DIR}/english.pth"
DEVICE = "cpu"

model = RobertaClassifier(device=DEVICE, num_classes=2)
model.load_state_dict(state_dict=torch.load(MODEL_PATH, map_location=torch.device(DEVICE)))
tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")

documents = ["I want to predict the toxicity score of this document: I am happy today.",
             "I want to predict the toxicity score of this document: this is a violent content!!"]

inputs = tokenizer(documents, return_tensors="pt", padding=True, truncation=True, max_length=512)
model.predict(**inputs)  # scores: [0.00121997, 0.9723031]
```
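
For larger collections, documents can be scored in fixed-size batches to bound memory use. The helper below is a hypothetical sketch, not part of this repository: `score_in_batches` and its `batch_size` parameter are names introduced here for illustration, and it only assumes a tokenizer and a model with the `predict` interface shown above.

```python
def score_in_batches(texts, tokenizer, model, batch_size=32, max_length=512):
    """Return one toxicity score per text, scoring `batch_size` texts at a time."""
    scores = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Tokenize each batch separately so padding only extends to the
        # longest document in that batch.
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=max_length)
        scores.extend(float(s) for s in model.predict(**inputs))
    return scores
```

With the `model` and `tokenizer` loaded as above, `score_in_batches(documents, tokenizer, model)` would return a plain list of floats in the same order as the input.
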