metadata
license: apache-2.0
datasets:
- lmsys/toxic-chat
- PKU-Alignment/BeaverTails
- lmsys/lmsys-chat-1m
language:
- en
metrics:
- f1
- accuracy
tags:
- ai-safety
- safetyguard
- safety
- benchmark
- mistral
- salad-bench
- evluation
MD-Judge for Salad-Bench
Model Details
MD-Judge is a LLM-based safetyguard, fine-tund on top of Mistral-7B. MD-Judge serves as a classifier to evaluate the safety of QA pairs.
MD-Judge was born to study the safety of different LLMs serving as an general evaluation tool, which is proposed under the SALAD-Bench paper
- Developed by: The SALAD-Bench Team
- Model type: An auto-regressive language model based on the transformer architecture.
Model Sources
- Repository: SALAD-Bench Github
- Dataset: Coming soon
- Paper: Coming soon
Uses
Please refer to our Github for more using examples
Citation
BibTeX: