OpenSafetyLab
/

MD-Judge-v0.1

Text Generation

text-generation-inference

Model card Files Files and versions

MD-Judge-v0.1 / README.md

Foreshhh's picture

Update README.md

5abb0c1 verified over 1 year ago

|

1.06 kB

	---
	license: apache-2.0
	datasets:
	- lmsys/toxic-chat
	- PKU-Alignment/BeaverTails
	- lmsys/lmsys-chat-1m
	language:
	- en
	metrics:
	- f1
	- accuracy
	tags:
	- ai-safety
	- safetyguard
	- safety
	- benchmark
	- mistral
	- salad-bench
	- evluation
	---
	# MD-Judge for Salad-Bench


	## Model Details

	MD-Judge is a LLM-based safetyguard, fine-tund on top of [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1). MD-Judge serves as a classifier to evaluate the safety of QA pairs.

	MD-Judge was born to study the safety of different LLMs serving as an general evaluation tool, which is proposed under the [SALAD-Bench paper]()

	- Developed by: The SALAD-Bench Team
	- Model type: An auto-regressive language model based on the transformer architecture.

	## Model Sources

	- Repository: [SALAD-Bench Github](https://github.com/OpenSafetyLab/SALAD-BENCH)
	- Dataset: Coming soon
	- Paper: Coming soon

	## Uses
	Please refer to our [Github](https://github.com/OpenSafetyLab/SALAD-BENCH) for more using examples

	```python

	```

	## Citation

	BibTeX: