ai4privacy/llama-ai4privacy-multilingual-categorical-anonymiser-openpii

@@ -31,83 +31,83 @@ model-index:
           split: test
         metrics:
           - type: f1
-            value: 0.9374
             name: F1 Score
           - type: precision
-            value: 0.8943
             name: Precision
           - type: recall
-            value: 0.9849
             name: Recall
           - type: accuracy
-            value: 0.9581
             name: Accuracy
 ---
-## Multilingual Categorical Anonymiser OpenPII (Ai4Privacy)
-This model is designed to **redact and classify Personally Identifiable Information (PII)** from multilingual text. It has been fine-tuned on the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset and supports multiple languages including French (fr), English (en), German (de), Telugu (te), Hindi (hi), Italian (it), Spanish (es), and Dutch (nl).
 ---
 ## Evaluation Metrics
-The table below summarizes the detailed evaluation results per PII label:
 | **Label**          | **TP** | **FP** | **FN** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
 |--------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:------------:|
-| SURNAME            | 2838   | 882    | 30     | 75.68%       | 76.29%        | 98.95%     | 86.16%       |
-| O (Non-PII)        | 0      | 285    | 0      | 99.50%       | n/a           | n/a        | n/a          |
-| TIME               | 1936   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
-| DRIVERLICENSENUM   | 351    | 154    | 2      | 69.23%       | 69.50%        | 99.43%     | 81.82%       |
-| PASSPORTNUM        | 321    | 243    | 2      | 56.71%       | 56.91%        | 99.38%     | 72.38%       |
-| GIVENNAME          | 6836   | 734    | 150    | 88.55%       | 90.30%        | 97.85%     | 93.93%       |
-| TELEPHONENUM       | 3634   | 6      | 1      | 99.81%       | 99.84%        | 99.97%     | 99.90%       |
-| BUILDINGNUM        | 370    | 42     | 14     | 86.85%       | 89.81%        | 96.35%     | 92.96%       |
-| AGE                | 166    | 1      | 2      | 98.22%       | 99.40%        | 98.81%     | 99.10%       |
-| DATE               | 2335   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
-| CITY               | 1529   | 163    | 110    | 84.85%       | 90.37%        | 93.29%     | 91.80%       |
-| TITLE              | 295    | 68     | 21     | 76.82%       | 81.27%        | 93.35%     | 86.89%       |
-| IDCARDNUM          | 1651   | 339    | 30     | 81.73%       | 82.96%        | 98.22%     | 89.95%       |
-| GENDER             | 104    | 17     | 0      | 85.95%       | 85.95%        | 100.0%     | 92.44%       |
-| CREDITCARDNUMBER   | 525    | 29     | 4      | 94.09%       | 94.77%        | 99.24%     | 96.95%       |
-| SEX                | 58     | 19     | 2      | 73.42%       | 75.32%        | 96.67%     | 84.67%       |
-| STREET             | 1317   | 56     | 14     | 94.95%       | 95.92%        | 98.95%     | 97.41%       |
-| TAXNUM             | 243    | 94     | 20     | 68.07%       | 72.11%        | 92.40%     | 81.00%       |
-| EMAIL              | 2608   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
-| SOCIALNUM          | 306    | 103    | 13     | 72.51%       | 74.82%        | 95.92%     | 84.07%       |
-| ZIPCODE            | 366    | 48     | 12     | 85.92%       | 88.41%        | 96.83%     | 92.42%       |
 ### Overall Evaluation
-- **Accuracy:** 95.81%
-- **Precision:** 89.43%
-- **Recall:** 98.49%
-- **F1 Score:** 93.74%
-- **Total True Positives (TP):** 27,789
-- **Total False Positives (FP):** 3,283
-- **Total False Negatives (FN):** 427
 ### Macro-Averaged Metrics
-- **Accuracy:** 85.37%
-- **Precision:** 82.09%
-- **Recall:** 93.12%
-- **F1 Score:** 86.85%
 ---
 ## Model Behavior & Limitations
 - **Evaluation Focus:**
-  The metrics shown above reflect performance on the test split of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. This model not only redacts PII but also classifies it into specific categories (e.g., SURNAME, EMAIL, etc.). Real-world performance may vary depending on the text domain and language, so additional validation is recommended. For support, contact **[email protected]**.
 - **Strengths:**
-  - High recall (98.49%) ensures most PII is detected.
-  - Perfect performance (100% F1) on labels like TIME, DATE, and EMAIL.
 - **Limitations:**
-  - Lower precision for certain labels (e.g., PASSPORTNUM at 56.91%) indicates a higher rate of false positives.
-  - The "O" (Non-PII) label has no true positives, so precision and recall are not applicable (n/a).
 ---
@@ -120,4 +120,6 @@ This model card details the evaluation metrics and fine-tuning parameters for th
 ---
-*Ai4Privacy – Committed to protecting personal data in the age of AI.*

           split: test
         metrics:
           - type: f1
+            value: 0.9150
             name: F1 Score
           - type: precision
+            value: 0.8761
             name: Precision
           - type: recall
+            value: 0.9576
             name: Recall
           - type: accuracy
+            value: 0.9503
             name: Accuracy
 ---
+# Multilingual Anonymiser OpenPII (Ai4Privacy)
+This model is designed to **redact and classify Personally Identifiable Information (PII)** from multilingual text. It has been fine-tuned on the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset and supports multiple languages, including French (fr), English (en), German (de), Telugu (te), Hindi (hi), Italian (it), Spanish (es), and Dutch (nl).
 ---
 ## Evaluation Metrics
+The table below summarizes the detailed evaluation results per PII label. Metrics are presented as percentages rounded to two decimal places. For the "O" (Non-PII) label, precision, recall, and F1 score are not applicable (n/a) due to the absence of true positives.
 | **Label**          | **TP** | **FP** | **FN** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
 |--------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:------------:|
+| O (Non-PII)        | 0      | 734    | 0      | 98.97%       | n/a           | n/a        | n/a          |
+| GIVENNAME          | 6623   | 661    | 352    | 86.73%       | 90.93%        | 94.95%     | 92.90%       |
+| SURNAME            | 2786   | 877    | 162    | 72.84%       | 76.06%        | 94.50%     | 84.28%       |
+| CITY               | 1763   | 216    | 225    | 79.99%       | 89.09%        | 88.68%     | 88.88%       |
+| DATE               | 2195   | 1      | 3      | 99.82%       | 99.95%        | 99.86%     | 99.91%       |
+| AGE                | 176    | 7      | 2      | 95.14%       | 96.17%        | 98.88%     | 97.51%       |
+| EMAIL              | 2981   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
+| CREDITCARDNUMBER   | 601    | 57     | 35     | 86.72%       | 91.34%        | 94.50%     | 92.89%       |
+| SEX                | 103    | 45     | 1      | 69.13%       | 69.59%        | 99.04%     | 81.75%       |
+| SOCIALNUM          | 364    | 134    | 20     | 70.27%       | 73.09%        | 94.79%     | 82.54%       |
+| TIME               | 1631   | 1      | 3      | 99.76%       | 99.94%        | 99.82%     | 99.88%       |
+| TELEPHONENUM       | 3537   | 10     | 9      | 99.47%       | 99.72%        | 99.75%     | 99.73%       |
+| IDCARDNUM          | 1540   | 314    | 148    | 76.92%       | 83.06%        | 91.23%     | 86.96%       |
+| ZIPCODE            | 311    | 39     | 16     | 84.97%       | 88.86%        | 95.11%     | 91.87%       |
+| DRIVERLICENSENUM   | 296    | 143    | 26     | 63.66%       | 67.43%        | 91.93%     | 77.79%       |
+| PASSPORTNUM        | 482    | 285    | 25     | 60.86%       | 62.84%        | 95.07%     | 75.67%       |
+| TITLE              | 224    | 68     | 78     | 60.54%       | 76.71%        | 74.17%     | 75.42%       |
+| BUILDINGNUM        | 292    | 45     | 14     | 83.19%       | 86.65%        | 95.42%     | 90.85%       |
+| STREET             | 1272   | 155    | 67     | 85.14%       | 89.14%        | 94.99%     | 91.97%       |
+| TAXNUM             | 471    | 101    | 34     | 77.72%       | 82.34%        | 93.27%     | 87.47%       |
+| GENDER             | 123    | 35     | 9      | 73.65%       | 77.85%        | 93.18%     | 84.83%       |
 ### Overall Evaluation
+- **Accuracy:** 95.03%
+- **Precision:** 87.61%
+- **Recall:** 95.76%
+- **F1 Score:** 91.50%
+- **Total True Positives (TP):** 27,771
+- **Total False Positives (FP):** 3,928
+- **Total False Negatives (FN):** 1,229
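
The overall precision, recall, and F1 score can be reproduced from the three totals listed above. The sketch below assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as their harmonic mean). The function name `prf_from_counts` is illustrative and not part of the model's code. Overall accuracy is left out because the card does not state how it is derived.

```python
def prf_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1) as fractions, using the standard definitions."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Totals copied from the "Overall Evaluation" list.
precision, recall, f1 = prf_from_counts(tp=27_771, fp=3_928, fn=1_229)
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
# precision=87.61% recall=95.76% f1=91.50%
```

The same function applies to any single row of the per-label table, provided the row has at least one true positive.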
 ### Macro-Averaged Metrics
+- **Accuracy:** 82.17%
+- **Precision:** 80.99%
+- **Recall:** 89.96%
+- **F1 Score:** 84.91%
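
The macro-averaged precision, recall, and F1 score appear to be the unweighted mean of the per-label values over all 21 labels, with the "O" label counted as 0 rather than skipped. That is an inference from the published numbers, not something the card states. A minimal sketch under that assumption, with counts copied from the table above (`COUNTS`, `safe_div`, and `macro_average` are illustrative names):

```python
# (TP, FP, FN) per label, copied from the evaluation table.
COUNTS = {
    "O": (0, 734, 0),
    "GIVENNAME": (6623, 661, 352),
    "SURNAME": (2786, 877, 162),
    "CITY": (1763, 216, 225),
    "DATE": (2195, 1, 3),
    "AGE": (176, 7, 2),
    "EMAIL": (2981, 0, 0),
    "CREDITCARDNUMBER": (601, 57, 35),
    "SEX": (103, 45, 1),
    "SOCIALNUM": (364, 134, 20),
    "TIME": (1631, 1, 3),
    "TELEPHONENUM": (3537, 10, 9),
    "IDCARDNUM": (1540, 314, 148),
    "ZIPCODE": (311, 39, 16),
    "DRIVERLICENSENUM": (296, 143, 26),
    "PASSPORTNUM": (482, 285, 25),
    "TITLE": (224, 68, 78),
    "BUILDINGNUM": (292, 45, 14),
    "STREET": (1272, 155, 67),
    "TAXNUM": (471, 101, 34),
    "GENDER": (123, 35, 9),
}


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero (an assumption)."""
    return num / den if den else 0.0


def macro_average(counts: dict[str, tuple[int, int, int]]) -> dict[str, float]:
    """Unweighted mean of per-label precision, recall, and F1 (as fractions)."""
    precisions, recalls, f1s = [], [], []
    for tp, fp, fn in counts.values():
        precisions.append(safe_div(tp, tp + fp))
        recalls.append(safe_div(tp, tp + fn))
        f1s.append(safe_div(2 * tp, 2 * tp + fp + fn))
    n = len(counts)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


macro = macro_average(COUNTS)
for name, value in macro.items():
    print(f"{name}: {value:.2%}")
```

Under this assumption the results should land within rounding distance of the published 80.99%, 89.96%, and 84.91%. Averaging over the 20 PII labels only would give noticeably higher values, which is why the "O" label seems to be included.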
 ---
 ## Model Behavior & Limitations
 - **Evaluation Focus:**
+  The metrics above reflect performance on the test split of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. This model both redacts and classifies PII into specific categories (e.g., GIVENNAME, EMAIL). Real-world performance may vary depending on text domain and language, so additional validation is recommended. For support, contact **[email protected]**.
 - **Strengths:**
+  - High recall (95.76%) means most PII in the test split is detected.
+  - Exceptional performance on labels like "EMAIL" (100% F1), "DATE" (99.91% F1), and "TIME" (99.88% F1).
 - **Limitations:**
+  - Lower precision for labels such as "PASSPORTNUM" (62.84%) and "DRIVERLICENSENUM" (67.43%), indicating a higher rate of false positives.
+  - The "O" (Non-PII) label has no true positives, so precision, recall, and F1 score are reported as not applicable (n/a).
 ---
 ---
+*Ai4Privacy – Committed to protecting personal data in the age of AI.*
+---