huihui-ai committed · Commit 60009c1 · verified · Parent(s): 481371b

Update README.md

Files changed (1): README.md (+250 −3)

README.md CHANGED
---
base_model:
- unsloth/gpt-oss-120b-BF16
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- vllm
- unsloth
- abliterated
- uncensored
extra_gated_prompt: >-
  **Usage Warnings**


  "**Risk of Sensitive or Controversial Outputs**": This model's safety filtering has been significantly reduced, so it may generate sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

  "**Not Suitable for All Audiences**": Due to limited content filtering, the model's outputs may be inappropriate for public settings, underage users, or applications requiring high security.

  "**Legal and Ethical Responsibilities**": Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

  "**Research and Experimental Use**": This model is recommended for research, testing, or controlled environments; avoid direct use in production or public-facing commercial applications.

  "**Monitoring and Review Recommendations**": Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

  "**No Default Safety Guarantees**": Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.
extra_gated_fields:
  X Account(@username): text
extra_gated_description: >-
  Enter your X account **username** in the form (e.g., @username, as in https://x.com/username).
  After submitting, follow https://x.com/support_huihui on X to expedite your approval.
  We'll review your request within 24-48 hours.
extra_gated_button_content: Submit
---

# huihui-ai/Huihui-gpt-oss-120b-BF16-abliterated

This is an uncensored version of [unsloth/gpt-oss-120b-BF16](https://huggingface.co/unsloth/gpt-oss-120b-BF16) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about the technique).
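For readers unfamiliar with the technique: abliteration estimates a "refusal direction" from the difference in hidden states between prompts the model refuses and prompts it answers, then projects that direction out of weights that write into the residual stream. The snippet below is a minimal, illustrative sketch of that idea under simplified assumptions (pre-collected hidden states, a single weight matrix); it is not the exact procedure used to produce this model.

```python
import torch

def refusal_direction(h_harmful: torch.Tensor, h_harmless: torch.Tensor) -> torch.Tensor:
    """Estimate a unit refusal direction from [n_prompts, hidden_size] hidden states."""
    direction = h_harmful.mean(dim=0) - h_harmless.mean(dim=0)
    return direction / direction.norm()

@torch.no_grad()
def ablate(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the refusal component from a [hidden_size, in_features] weight
    matrix whose outputs feed the residual stream (rank-1 projection)."""
    projector = torch.outer(direction, direction)
    return weight - projector @ weight
```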

## Usage
You can use this model in your applications by loading it with Hugging Face's `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch
import os
import signal
import time

# Use only half of the available CPU cores for intra-op parallelism.
cpu_count = os.cpu_count()
print(f"Number of CPU cores in the system: {cpu_count}")
half_cpu_count = cpu_count // 2
os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
torch.set_num_threads(half_cpu_count)

print(f"PyTorch threads: {torch.get_num_threads()}")
print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")

# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Huihui-gpt-oss-120b-BF16-abliterated"
print(f"Loading model {NEW_MODEL_ID} ...")

model = AutoModelForCausalLM.from_pretrained(
    NEW_MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
# print(model)
# print(model.config)

tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)

messages = []
skip_prompt = False
skip_special_tokens = False
do_sample = True

class CustomTextStreamer(TextStreamer):
    """TextStreamer that records timing metrics and supports user interruption."""

    def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
        super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
        self.generated_text = ""
        self.stop_flag = False
        self.init_time = time.time()   # Record initialization time
        self.end_time = None           # To store end time
        self.first_token_time = None   # To store first token generation time
        self.token_count = 0           # To track total tokens

    def on_finalized_text(self, text: str, stream_end: bool = False):
        if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
            self.first_token_time = time.time()
        self.generated_text += text
        # Count tokens in the generated text
        tokens = self.tokenizer.encode(text, add_special_tokens=False)
        self.token_count += len(tokens)
        print(text, end="", flush=True)
        if stream_end:
            self.end_time = time.time()  # Record end time when streaming ends
        if self.stop_flag:
            raise StopIteration  # Abort generation from inside the streamer

    def stop_generation(self):
        self.stop_flag = True
        self.end_time = time.time()  # Record end time when generation is stopped

    def get_metrics(self):
        """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
        if self.end_time is None:
            self.end_time = time.time()  # Set end time if not already set
        total_time = self.end_time - self.init_time  # Total time from init to end
        tokens_per_second = self.token_count / total_time if total_time > 0 else 0
        first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
        return {
            "init_time": self.init_time,
            "first_token_time": self.first_token_time,
            "first_token_latency": first_token_latency,
            "end_time": self.end_time,
            "total_time": total_time,  # Total time in seconds
            "total_tokens": self.token_count,
            "tokens_per_second": tokens_per_second,
        }

def generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, do_sample, max_new_tokens):
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)

    # Let Ctrl+C stop generation cleanly instead of killing the process.
    def signal_handler(sig, frame):
        streamer.stop_generation()
        print("\n[Generation stopped by user with Ctrl+C]")

    signal.signal(signal.SIGINT, signal_handler)

    if do_sample:
        generate_kwargs = {
            "do_sample": True,
            "max_new_tokens": max_new_tokens,
            "temperature": 0.7,
            "top_k": 20,
            "top_p": 0.8,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2,
        }
    else:
        generate_kwargs = {
            "do_sample": False,
            "max_new_tokens": max_new_tokens,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2,
        }

    print("Response: ", end="", flush=True)
    try:
        generated_ids = model.generate(
            **input_ids,
            streamer=streamer,
            **generate_kwargs,
        )
        del generated_ids
    except StopIteration:
        print("\n[Stopped by user]")

    del input_ids
    torch.cuda.empty_cache()
    signal.signal(signal.SIGINT, signal.SIG_DFL)  # Restore the default Ctrl+C handler

    return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()

while True:
    print(f"skip_prompt: {skip_prompt}")
    print(f"skip_special_tokens: {skip_special_tokens}")
    print(f"do_sample: {do_sample}")

    user_input = input("User: ").strip()
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break
    if user_input.lower() == "/clear":
        messages = []
        print("Chat history cleared. Starting a new conversation.")
        continue
    if user_input.lower() == "/skip_prompt":
        skip_prompt = not skip_prompt
        continue
    if user_input.lower() == "/skip_special_tokens":
        skip_special_tokens = not skip_special_tokens
        continue
    if user_input.lower() == "/do_sample":
        do_sample = not do_sample
        continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    messages.append({"role": "user", "content": user_input})
    response, stop_flag, metrics = generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, do_sample, 40960)
    print("\n\nMetrics:")
    for key, value in metrics.items():
        print(f"  {key}: {value}")

    print("", flush=True)
    if stop_flag:
        continue
    messages.append({"role": "assistant", "content": response})
```
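If you don't need the interactive loop above, the high-level `pipeline` API is a shorter way to try the model. This is a minimal sketch assuming a recent `transformers` release with chat-template support; the prompt and `max_new_tokens` value are illustrative:

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="huihui-ai/Huihui-gpt-oss-120b-BF16-abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what abliteration does."}]
result = generator(messages, max_new_tokens=256)
# The pipeline returns the full chat; the last message is the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```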

## Usage Warnings

- **Risk of Sensitive or Controversial Outputs**: This model's safety filtering has been significantly reduced, so it may generate sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
- **Not Suitable for All Audiences**: Due to limited content filtering, the model's outputs may be inappropriate for public settings, underage users, or applications requiring high security.
- **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
- **Research and Experimental Use**: This model is recommended for research, testing, or controlled environments; avoid direct use in production or public-facing commercial applications.
- **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
- **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

### Donation
##### Your donation helps us continue development and improvement; even a cup of coffee makes a difference.
- bitcoin:
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```
- Support our work on Ko-fi (https://ko-fi.com/huihuiai)!