somosnlp-hackathon-2022/es_text_neutralizer

fermaat committed on Mar 29, 2022

Commit d3f038e · 1 Parent(s): ca3044e

Update README.md

Files changed (1)

README.md +42 -2

@@ -35,8 +35,48 @@ model-index:
 ---
 ## Model objective
-## Data used
 ## Metrics
-## Enjoy

 ---
 ## Model objective
+TBF
+## Model specs
+This model is a fine-tuned version of [spanish-t5-small](https://huggingface.co/flax-community/spanish-t5-small) on the data described below.
+It achieves the following results on the evaluation set:
+- eval_bleu: 93.8347
+- eval_f1: 0.9904
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-04
+- train_batch_size: 32
+- seed: 42
+- num_epochs: 10
+- weight_decay: 0.01
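The hyperparameters listed above could be collected into a configuration like the following. This is a minimal sketch, not the authors' training script: the keyword names are those of `transformers.Seq2SeqTrainingArguments`, mapping `train_batch_size` to `per_device_train_batch_size` assumes a single device, the weight decay is taken as 0.01, and `build_training_arguments` and its `output_dir` parameter are hypothetical.

```python
# Values taken from the hyperparameter list in the README.
# Assumption: keyword names follow transformers.Seq2SeqTrainingArguments,
# and the batch size of 32 is per device (single-device training).
TRAINING_HYPERPARAMETERS = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 32,
    "seed": 42,
    "num_train_epochs": 10,
    "weight_decay": 0.01,
}


def build_training_arguments(output_dir):
    """Hypothetical helper: turn the dict into Seq2SeqTrainingArguments."""
    # Imported lazily so the dict can be inspected without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_HYPERPARAMETERS)
```

Any setting not listed in the README (optimizer, scheduler, evaluation strategy) is left at the library default here, which may differ from what was actually used.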
+## Training and evaluation data
+TBF
 ## Metrics
+For training, we used both BLEU (the sacrebleu implementation in HF) and BERTScore. BLEU, a standard metric in machine translation, was added to check the robustness of the newly generated text, while BERTScore was kept to check that the expected semantic similarity is preserved.
+However, in this use case we expect generated segments to be very close both to the input segments and to the label segments used in training. Take the following example:
+inputSegment = 'De acuerdo con las informaciones anteriores , las alumnas se han quejado de la actitud de los profesores en los exámenes finales. Los representantes estudiantiles son los alumnos Juanju y Javi.'
+expectedOutput (label) = 'De acuerdo con las informaciones anteriores, el alumnado se ha quejado de la actitud del profesorado en los exámenes finales. Los representantes estudiantiles son los alumnos Juanju y Javi.'
+actualOutput = 'De acuerdo con las informaciones anteriores, el alumnado se ha quejado de la actitud del profesorado en los exámenes finales. Los representantes estudiantiles son el alumnado Juanju y Javi.'
+As you can see, the three segments are very similar, so BLEU and BERTScore stay high even when the model gets the few changed words wrong. Instead of measuring only those, we propose an alternative metric, DiffBleu:
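+$$\text{DiffBleu} = \text{BLEU}(\text{actualOutput} \setminus \text{inputSegment},\ \text{labels} \setminus \text{inputSegment})$$
+where $\setminus$ denotes set difference, so only the parts that differ from the input are compared. We also evaluate DiffBleu after the model has been trained.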
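One possible reading of DiffBleu can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: tokens are split on whitespace, the set difference keeps the tokens of a segment that never occur in the input (order preserved), and `simple_bleu` is a simplified, smoothed sentence-level BLEU written here with the standard library in place of sacrebleu. The function names `diff_tokens`, `simple_bleu` and `diff_bleu` are hypothetical.

```python
import math
from collections import Counter


def diff_tokens(segment, input_segment):
    """Tokens of `segment` that never occur in `input_segment` (order kept)."""
    seen = set(input_segment.split())
    return [tok for tok in segment.split() if tok not in seen]


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence BLEU on a 0-100 scale (add-one smoothing for n > 1)."""
    if not hypothesis or not reference:
        # Both diffs empty: nothing had to change and nothing was changed.
        return 100.0 if not hypothesis and not reference else 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngram_counts(hypothesis, n), ngram_counts(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        total = sum(hyp.values())
        if n == 1:
            if overlap == 0:
                return 0.0
            precision = overlap / total
        else:
            precision = (overlap + 1) / (total + 1)
        log_precisions.append(math.log(precision))
    # Brevity penalty for hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(hypothesis))
    return 100.0 * brevity * math.exp(sum(log_precisions) / max_n)


def diff_bleu(actual_output, label, input_segment):
    """BLEU restricted to the tokens that differ from the input segment."""
    return simple_bleu(diff_tokens(actual_output, input_segment),
                       diff_tokens(label, input_segment))


# The example segments from the README.
input_segment = ("De acuerdo con las informaciones anteriores , las alumnas se han "
                 "quejado de la actitud de los profesores en los exámenes finales. "
                 "Los representantes estudiantiles son los alumnos Juanju y Javi.")
label = ("De acuerdo con las informaciones anteriores, el alumnado se ha quejado "
         "de la actitud del profesorado en los exámenes finales. Los "
         "representantes estudiantiles son los alumnos Juanju y Javi.")
actual_output = ("De acuerdo con las informaciones anteriores, el alumnado se ha "
                 "quejado de la actitud del profesorado en los exámenes finales. "
                 "Los representantes estudiantiles son el alumnado Juanju y Javi.")

print(diff_tokens(actual_output, input_segment))
print(diff_bleu(actual_output, label, input_segment))
```

With whitespace tokenization, punctuation stays attached to words, so `anteriores,` counts as a changed token. A real evaluation would more likely feed the filtered strings to sacrebleu, whose tokenization and smoothing differ from this sketch.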
+## Usage example
+Enjoy!