Update README.md

README.md

@@ -28,6 +28,47 @@ Our merged model demonstrates exceptional performance on the toxicity classification
weighted avg       0.87      0.87      0.87       350
```

### Model Usage

For best results, we recommend starting with the following prompting strategy (and encourage tweaks as you see fit):

```python
import torch
from transformers import pipeline

# `base_model`, `tokenizer`, and `attn_implementation` are assumed to be defined
# earlier in this README (the merged model, its tokenizer, and the attention
# backend selected when the model was loaded).

def format_input_classification(text):
    """Wrap raw text in the toxicity-classification prompt template."""
    prompt = f"""
You are examining written text content. Here is the text:
************
[Text]: {text}
************

Examine the text and determine whether the text is toxic or not. Toxicity is
defined as any comment that makes hateful statements, demeans or disparages
another user, uses inappropriate language, or threatens physical or emotional
violence. Please focus heavily on the concept of toxicity and do NOT mark
something that is toxic as non-toxic.

Your classification must be a single word, either "toxic" or "non-toxic",
followed by your reasoning for choosing that classification.
"toxic" means that the text meets the definition of toxic.
"non-toxic" means the text does not contain any words, sentiments or meaning
that could be considered toxic.

After your classification, provide the reason for your classification.
"""
    return prompt


text = format_input_classification("I could strangle him")
messages = [
    {"role": "user", "content": text},
]

pipe = pipeline(
    "text-generation",
    model=base_model,
    model_kwargs={"attn_implementation": attn_implementation, "torch_dtype": torch.float16},
    tokenizer=tokenizer,
)
```
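
As a rough sketch of how the pipeline above might be invoked, assuming a chat-capable model and a recent transformers version (the generation arguments and output parsing below are illustrative and not part of the original snippet):

```python
# Hypothetical usage sketch: run the prompt through the pipeline and pull out
# the "toxic" / "non-toxic" verdict from the generated reply.
outputs = pipe(messages, max_new_tokens=128, do_sample=False)

# With chat-style input, recent transformers versions return the conversation
# with the assistant's reply appended as the final message.
reply = outputs[0]["generated_text"][-1]["content"]
label = "toxic" if reply.strip().lower().startswith("toxic") else "non-toxic"
print(label, reply, sep="\n")
```
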
Our model achieves an impressive precision of 0.85 for the toxic class and 0.89 for the non-toxic class, with a high overall accuracy of 0.87. The balanced F1-scores of 0.87 for both classes demonstrate the model's ability to handle this binary classification task effectively.
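
These figures follow the format of scikit-learn's classification report (see the `weighted avg` row above, support 350). A minimal sketch of producing such a report, using hypothetical placeholder labels (substitute the gold labels and the labels parsed from the model's generations):

```python
from sklearn.metrics import classification_report

# Placeholder data for illustration only; in practice y_true holds the gold
# labels and y_pred the labels parsed from the model's generated replies.
y_true = ["toxic", "non-toxic", "toxic", "non-toxic"]
y_pred = ["toxic", "non-toxic", "non-toxic", "non-toxic"]

print(classification_report(y_true, y_pred, digits=2))
```
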
### Comparison with Other Models
|