MD-Judge-v0.1 / README.md
Foreshhh's picture
Update README.md
5abb0c1 verified
|
raw
history blame
1.06 kB
metadata
license: apache-2.0
datasets:
  - lmsys/toxic-chat
  - PKU-Alignment/BeaverTails
  - lmsys/lmsys-chat-1m
language:
  - en
metrics:
  - f1
  - accuracy
tags:
  - ai-safety
  - safetyguard
  - safety
  - benchmark
  - mistral
  - salad-bench
  - evluation

MD-Judge for Salad-Bench

Model Details

MD-Judge is a LLM-based safetyguard, fine-tund on top of Mistral-7B. MD-Judge serves as a classifier to evaluate the safety of QA pairs.

MD-Judge was born to study the safety of different LLMs serving as an general evaluation tool, which is proposed under the SALAD-Bench paper

  • Developed by: The SALAD-Bench Team
  • Model type: An auto-regressive language model based on the transformer architecture.

Model Sources

Uses

Please refer to our Github for more using examples


Citation

BibTeX: