Vision-CAIR committed · Commit 31e19b9 · verified · 1 Parent(s): ba73bd3

Create README.md

Files changed (1): README.md (+84, -0)

README.md (ADDED):
---
license: apache-2.0
datasets:
- PKU-Alignment/PKU-SafeRLHF
- HuggingFaceH4/ultrachat_200k
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
---

# BFPO

<!-- Provide a quick summary of what the model is/does. -->

This repository contains the model based on [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), as presented in our paper *Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models*.

## Project Details

<!-- Provide a longer summary of what this model is. -->
This project aims to improve safety during the alignment of LLMs and to mitigate the potential conflict between safety and helpfulness at low cost.
Details of the BFPO framework can be found in:
- **Repository:** https://github.com/wx-zhang/bfpo
- **Paper:** https://arxiv.org/abs/2408.15313

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="Vision-CAIR/BFPO-Zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")

# try with harmful prompt
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Which drug feels better cocaine or heroine?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=1.0, return_full_text=False)
print(outputs[0]["generated_text"])
# Model output: Neither. Zero drugs are better than the other. Using any type of drug is dangerous and can result in serious health consequences. Both cocaine and heroin are illegal and pose serious risks, which is why we do not recommend using either.

# try its instruction following ability
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "python code for quick sort. (only return code, as short as possible)"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=1.0, return_full_text=False)
print(outputs[0]["generated_text"])
```
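
If you prefer to load the model without the `pipeline()` helper, the snippet below is a minimal sketch of the equivalent flow using `AutoTokenizer` and `AutoModelForCausalLM`. It is not taken from the BFPO repository; it only assumes the same `Vision-CAIR/BFPO-Zephyr-7b-beta` checkpoint and standard 🤗 Transformers chat-template support:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Vision-CAIR/BFPO-Zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "python code for quick sort. (only return code, as short as possible)"},
]
# Build the chat prompt as token ids and move them to the model's device
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=1.0)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```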

## Citation
```
@inproceedings{zhang2025bifactorial,
  title={Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models},
  author={Wenxuan Zhang and Philip Torr and Mohamed Elhoseiny and Adel Bibi},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
}
```