---
language:
- multilingual
license: apache-2.0
---
Running the Model:
To run inference you must install the following libraries:
pip install transformers[torch]
pip install datasets
pip install pandas
pip install tqdm
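Equivalently, all of the dependencies can be installed with a single command:
pip install "transformers[torch]" datasets pandas tqdm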
After installing those libraries you can run the following code:
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from tqdm import tqdm

# Load the model and tokenizer (use device = "cpu" if no GPU is available)
device = "cuda"
path = "Unbabel/mfineweb-edu-classifier"
model = AutoModelForSequenceClassification.from_pretrained(
    path,
    device_map=device,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(path, use_fast=True)
def get_model_outputs(texts):
    # Tokenize a batch of texts and run a single forward pass
    inputs = tokenizer(
        texts, padding=True, truncation=True, return_tensors="pt", max_length=512
    ).to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    # Regression head: educational score of each text
    score = outputs.logits
    # Binary head: probability that a text is educational enough
    prob = torch.nn.functional.sigmoid(outputs.binary_logits)
    return score.cpu(), prob.cpu()
def batchify_texts(texts, batch_size):
    # Yield successive batches of at most batch_size texts
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]
# TODO: replace the next line with the texts you want to classify
texts = LIST_WITH_TEXTS_TO_CLASSIFY
batch_size = 64 # Adjust based on your available memory and model capacity
num_batches = (len(texts) + batch_size - 1) // batch_size
all_scores = []
all_probs = []
with tqdm(total=num_batches, dynamic_ncols=True) as pbar:
    for batch_num, batch in enumerate(batchify_texts(texts, batch_size), 1):
        score, probs = get_model_outputs(batch)
        all_scores.append(score)
        all_probs.append(probs)
        pbar.set_description(f"Processing Batch {batch_num}/{num_batches}")
        pbar.update(1)
# scores is the output of the regression head and reflects the
# educational score of each text.
scores = torch.cat(all_scores, dim=0).squeeze()

# binary_pred is the output of the classification head and tells
# whether a text has an acceptable educational score or not.
# NOTE: converting the regression scores into binary predictions is also possible.
all_probs = torch.cat(all_probs, dim=0).squeeze()
binary_pred = (all_probs >= 0.5).numpy().astype(int)
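As the NOTE above suggests, the regression scores can also be mapped to discrete labels and binary predictions directly, and the outputs can be collected into a pandas DataFrame. The snippet below is a minimal sketch of this optional post-processing; the rounding to a 0-5 scale and the threshold of 3 are assumptions, not something prescribed by the model card.
# Minimal sketch of optional post-processing (the 0-5 rounding and the
# threshold of 3 are assumptions, not part of the model):
pred_labels = scores.float().round().clamp(0, 5).int()
binary_from_scores = (pred_labels >= 3).numpy().astype(int)

# Collect everything into a pandas DataFrame for inspection
results = pd.DataFrame({
    "text": texts,
    "score": scores.float().numpy(),
    "binary_pred": binary_pred,
})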
English Results:
When testing the model on an English partition with 37,537 samples, the results are comparable to those of the original FineWeb-Edu classifier.
Regression head results:
              precision    recall  f1-score   support

           0       0.80      0.53      0.64      5130
           1       0.80      0.88      0.83     21602
           2       0.63      0.58      0.61      7849
           3       0.54      0.62      0.58      2310
           4       0.62      0.48      0.54       645
           5       0.00      0.00      0.00         1

    accuracy                           0.74     37537
   macro avg       0.56      0.51      0.53     37537
weighted avg       0.74      0.74      0.74     37537
Binary head results:
              precision    recall  f1-score   support

           0       0.98      0.97      0.98     34581
           1       0.71      0.74      0.73      2956

    accuracy                           0.96     37537
   macro avg       0.85      0.86      0.85     37537
weighted avg       0.96      0.96      0.96     37537
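The tables above follow scikit-learn's classification_report layout. Below is a minimal sketch of how such a report could be reproduced, assuming scikit-learn is installed and that y_true / y_true_binary are hypothetical gold labels for the evaluation set (they are not provided here).
from sklearn.metrics import classification_report

# y_true and y_true_binary are hypothetical gold annotations for the evaluation set
pred_labels = scores.float().round().clamp(0, 5).int().numpy()
print(classification_report(y_true, pred_labels))          # regression head report
print(classification_report(y_true_binary, binary_pred))   # binary head report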
Multilingual Results:
When we evaluate on the same texts translated into 15 different languages, the results are almost identical!
Regression head results:
              precision    recall  f1-score   support

           0       0.80      0.50      0.61      5130
           1       0.79      0.87      0.83     21602
           2       0.61      0.58      0.59      7849
           3       0.52      0.61      0.56      2310
           4       0.61      0.38      0.47       645
           5       0.00      0.00      0.00         1

    accuracy                           0.73     37537
   macro avg       0.55      0.49      0.51     37537
weighted avg       0.73      0.73      0.73     37537
Binary head results:
              precision    recall  f1-score   support

           0       0.98      0.97      0.97     34581
           1       0.70      0.71      0.71      2956

    accuracy                           0.95     37537
   macro avg       0.84      0.84      0.84     37537
weighted avg       0.95      0.95      0.95     37537