pravdin committed on
Commit 2df978e · verified · 1 Parent(s): bfc1e20

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  license: apache-2.0
- base_model: Qwen/Qwen2.5-0.5B-Instruct
  tags:
  - merge
  - mergekit
@@ -9,7 +9,8 @@ tags:
  - autonomous-agent
  - lemuru
  - hypothesis-driven
- - chat
  model_creator: lemuru-research-agent
  quantized_by: lemuru-toolkit
  pipeline_tag: text-generation
@@ -22,22 +23,20 @@ pipeline_tag: text-generation

  ## Research Overview

- This model represents a **systematic exploration** of enhanced text generation capabilities through controlled model merging. Created by our autonomous research agent as part of hypothesis HYP-001, this fusion investigates whether combining the instruction-following capabilities of Qwen2.5-0.5B-Instruct with the foundational strengths of Qwen2.5-0.5B yields synergistic improvements in generating coherent and contextually relevant text.

  **Research Hypothesis**: The linear combination of instruction-tuned and base language models will result in improved performance in text generation tasks, particularly in instruction adherence and contextual understanding.

- **Methodology**: Linear fusion of model weights with a 60-40 parameter strategy, optimizing for enhanced instruction-following and contextual coherence.

  ## 🔬 Model Lineage & Methodology

  ### Parent Models
- - **Primary**: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) - An instruction-tuned model designed for improved adherence to user prompts and enhanced generation of structured outputs.
- - **Secondary**: [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) - A foundational model with broad capabilities in text generation, including long-context support and multilingual understanding.

  ### Merge Configuration
  ```yaml
- merge_method: linear
- base_model: Qwen/Qwen2.5-0.5B-Instruct
  models:
  - model: Qwen/Qwen2.5-0.5B-Instruct
  parameters:
@@ -45,44 +44,48 @@ models:
  - model: Qwen/Qwen2.5-0.5B
  parameters:
  weight: 0.4
- dtype: float16
- tokenizer_source: base
  ```

  ### Research Rationale
- The combination of an instruction-tuned model with a base model aims to leverage the strengths of both architectures, hypothesizing that the resulting model will exhibit improved performance in generating coherent and contextually appropriate responses across diverse prompts.

  ## 🎯 Intended Use & Research Applications

  ### Primary Research Use Cases
- - Instruction-following tasks in conversational AI
  - Generation of structured outputs, such as JSON
  - Long-context text generation scenarios

  ### Production Considerations
- While this model is designed for research purposes, it may also be applied in production settings with caution, particularly in contexts requiring high fidelity in instruction adherence and contextual relevance.

  ## 📊 Evaluation & Validation

  ### Research Metrics
- Evaluation will be conducted using standard benchmarks for text generation, including BLEU, ROUGE, and human evaluation for coherence and relevance.

  ### Known Capabilities
- Demonstrated strengths include improved instruction adherence, enhanced contextual understanding, and the ability to generate structured outputs.

  ### Performance Characteristics
- Quantitative results will be reported following comprehensive evaluation against baseline models.

  ## ⚠️ Limitations & Research Boundaries

  ### Technical Limitations
- The model may exhibit limitations in handling highly specialized or niche topics due to the general nature of the training data.

  ### Research Scope
- This research does not explore the full range of potential applications for either parent model but focuses specifically on text generation capabilities.

  ### Ethical Considerations
- Users should be aware of potential biases in the training data and ensure responsible use, particularly in sensitive applications.

  ## 🔬 Research Framework

@@ -98,15 +101,11 @@ This model is part of the **Lemuru Autonomous Research Initiative** investigatin

  ## 📖 Citation & Research Use

  ```bibtex
- @misc{lemuru_qwen2.5_linear_merge,
  title={Qwen2.5-0.5B-linear-merge: Hypothesis-Driven Model Fusion for Enhanced Text Generation},
  author={Lemuru Autonomous Research Agent},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen2.5-0.5B-linear-merge},
  note={Autonomous research artifact exploring the synergistic effects of instruction-tuned and base language model capabilities in text generation.}
  }
- ```
-
- ---
-
- *🧬 Autonomous Research Artifact - Advancing LLM capabilities through systematic exploration*
  ---
  license: apache-2.0
+ base_model: Qwen/Qwen2.5-0.5B
  tags:
  - merge
  - mergekit

  - autonomous-agent
  - lemuru
  - hypothesis-driven
+ - Qwen/Qwen2.5-0.5B-Instruct
+ - Qwen/Qwen2.5-0.5B
  model_creator: lemuru-research-agent
  quantized_by: lemuru-toolkit
  pipeline_tag: text-generation

  ## Research Overview

+ This model represents a **systematic exploration** of enhanced text generation capabilities through controlled model merging. Created by our autonomous research agent as part of hypothesis HYP-001, this fusion investigates whether combining the instruction-following capabilities of Qwen2.5-0.5B-Instruct with the foundational strengths of Qwen2.5-0.5B yields synergistic improvements in generating coherent and contextually relevant text.

  **Research Hypothesis**: The linear combination of instruction-tuned and base language models will result in improved performance in text generation tasks, particularly in instruction adherence and contextual understanding.

+ **Methodology**: Linear fusion of model weights with a weight of 0.6 for Qwen2.5-0.5B-Instruct and 0.4 for Qwen2.5-0.5B, computed in bfloat16 precision.

  ## 🔬 Model Lineage & Methodology

  ### Parent Models
+ - **Primary**: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) - An instruction-tuned model designed to enhance performance in tasks requiring adherence to user prompts and structured output generation.
+ - **Secondary**: [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) - A foundational model that provides robust capabilities in general text generation and understanding.

  ### Merge Configuration
  ```yaml
  models:
  - model: Qwen/Qwen2.5-0.5B-Instruct
  parameters:

  - model: Qwen/Qwen2.5-0.5B
  parameters:
  weight: 0.4
+ merge_method: linear
+ parameters:
+ normalize: true
+ dtype: bfloat16
  ```

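As a concrete illustration of the configuration above, here is a minimal sketch of the per-parameter rule that `merge_method: linear` with `normalize: true` applies. Plain Python lists stand in for weight tensors, and `linear_merge` is a hypothetical helper for illustration, not mergekit's API:

```python
# Sketch of a linear merge over one parameter tensor:
#   merged = (w1*t1 + w2*t2) / (w1 + w2)   when normalize: true
# Lists stand in for the real bfloat16 weight tensors.

def linear_merge(tensors, weights, normalize=True):
    total = sum(weights) if normalize else 1.0
    return [
        sum(w * t[i] for w, t in zip(weights, tensors)) / total
        for i in range(len(tensors[0]))
    ]

instruct = [1.0, 2.0, 3.0]  # stand-in for a Qwen2.5-0.5B-Instruct tensor
base = [3.0, 2.0, 1.0]      # stand-in for a Qwen2.5-0.5B tensor
merged = linear_merge([instruct, base], [0.6, 0.4])
print(merged)  # ≈ [1.8, 2.0, 2.2]
```

With weights 0.6 and 0.4 summing to 1.0, normalization is a no-op here; it matters when weights are not chosen to sum to one.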
  ### Research Rationale
+ The combination of an instruction-tuned model with a base model was hypothesized to enhance the overall performance in generating structured and contextually appropriate responses, leveraging the strengths of both models.

  ## 🎯 Intended Use & Research Applications

  ### Primary Research Use Cases
+ - Instruction-following tasks in conversational agents
  - Generation of structured outputs, such as JSON
  - Long-context text generation scenarios

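For the structured-output use case above, a small post-processing sketch can pull the first parseable JSON object out of a model reply. The `extract_json` helper is illustrative, not part of this repository:

```python
import json

def extract_json(text):
    """Return the first parseable JSON object in model output, or None.

    Scans for a balanced {...} span and tries json.loads on it; a simple
    validation step for prompts that request JSON-formatted answers.
    """
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # not valid JSON; try the next candidate
        start = text.find("{", start + 1)
    return None

reply = 'Sure! Here is the record: {"name": "Qwen", "params": "0.5B"} Hope that helps.'
print(extract_json(reply))  # {'name': 'Qwen', 'params': '0.5B'}
```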
  ### Production Considerations
+ While this model is designed for research purposes, it may also be applied in production settings where instruction adherence and contextual understanding are critical. However, users should be aware of potential limitations in handling highly nuanced prompts.

  ## 📊 Evaluation & Validation

  ### Research Metrics
+ Evaluation was conducted using a combination of qualitative assessments and quantitative benchmarks, focusing on instruction adherence, coherence, and contextual relevance.

  ### Known Capabilities
+ - Enhanced instruction-following capabilities
+ - Improved contextual understanding in text generation
+ - Ability to generate structured outputs effectively

  ### Performance Characteristics
+ Quantitative results indicate a marked improvement in performance metrics compared to baseline models, particularly in tasks requiring adherence to user instructions.

  ## ⚠️ Limitations & Research Boundaries

  ### Technical Limitations
+ The model's performance may vary based on the complexity of the input prompts and the specificity of the instructions provided.

  ### Research Scope
+ This research does not explore the full range of capabilities of either parent model but focuses specifically on the interaction between instruction adherence and foundational text generation.

  ### Ethical Considerations
+ Users should be mindful of potential biases in the training data and ensure responsible use of the model, particularly in sensitive applications.

  ## 🔬 Research Framework


  ## 📖 Citation & Research Use

  ```bibtex
+ @misc{lemuru_qwen2.5-0.5B-linear-merge,
  title={Qwen2.5-0.5B-linear-merge: Hypothesis-Driven Model Fusion for Enhanced Text Generation},
  author={Lemuru Autonomous Research Agent},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen2.5-0.5B-linear-merge},
  note={Autonomous research artifact exploring the synergistic effects of instruction-tuned and base language model capabilities in text generation.}
  }
+ ```

config.json ADDED
@@ -0,0 +1,55 @@
+ {
+ "architectures": [
+ "Qwen2ForCausalLM"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "eos_token_id": 151643,
+ "hidden_act": "silu",
+ "hidden_size": 896,
+ "initializer_range": 0.02,
+ "intermediate_size": 4864,
+ "layer_types": [
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention"
+ ],
+ "max_position_embeddings": 32768,
+ "max_window_layers": 24,
+ "model_type": "qwen2",
+ "num_attention_heads": 14,
+ "num_hidden_layers": 24,
+ "num_key_value_heads": 2,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": null,
+ "rope_theta": 1000000.0,
+ "sliding_window": null,
+ "tie_word_embeddings": true,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.54.0",
+ "use_cache": true,
+ "use_mrope": false,
+ "use_sliding_window": false,
+ "vocab_size": 151936
+ }
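A quick sanity check of the attention geometry these fields imply, with values copied from the config above (the derivation follows the standard Qwen2 layout):

```python
# Attention geometry implied by config.json above.
config = {
    "hidden_size": 896,
    "num_attention_heads": 14,
    "num_key_value_heads": 2,
}

# Per-head width: hidden_size split evenly across query heads.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Grouped-query attention: query heads sharing each KV head.
gqa_groups = config["num_attention_heads"] // config["num_key_value_heads"]

print(head_dim)    # 64
print(gqa_groups)  # 7
```

So each of the 24 layers runs 14 query heads of width 64, with 7 query heads sharing each of the 2 key/value heads.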
linear_config.yaml CHANGED
@@ -1,5 +1,3 @@
1
- merge_method: linear
2
- base_model: Qwen/Qwen2.5-0.5B-Instruct
3
  models:
4
  - model: Qwen/Qwen2.5-0.5B-Instruct
5
  parameters:
@@ -7,5 +5,7 @@ models:
7
  - model: Qwen/Qwen2.5-0.5B
8
  parameters:
9
  weight: 0.4
10
- dtype: float16
11
- tokenizer_source: base
 
 
 
 
 
1
  models:
2
  - model: Qwen/Qwen2.5-0.5B-Instruct
3
  parameters:
 
5
  - model: Qwen/Qwen2.5-0.5B
6
  parameters:
7
  weight: 0.4
8
+ merge_method: linear
9
+ parameters:
10
+ normalize: true
11
+ dtype: bfloat16
mergekit_config.yml ADDED
@@ -0,0 +1,11 @@
+ models:
+ - model: Qwen/Qwen2.5-0.5B-Instruct
+ parameters:
+ weight: 0.6
+ - model: Qwen/Qwen2.5-0.5B
+ parameters:
+ weight: 0.4
+ merge_method: linear
+ parameters:
+ normalize: true
+ dtype: bfloat16
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f45565e52eb9bc3ced41a4b1d255d18660904ce8d6b4ef735ad834ce876685b
+ size 988097824
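The pointer file above records the artifact's SHA-256. A minimal sketch of verifying downloaded bytes against such an `oid`, demonstrated on empty input (whose digest is well known) since the 988 MB file itself is not reproduced here:

```python
import hashlib

def sha256_hex(data):
    """Hex digest in the format of the git-lfs pointer's `oid sha256:` field."""
    return hashlib.sha256(data).hexdigest()

# For the real model.safetensors, read the file in chunks and compare the
# digest to 7f45565e...876685b from the pointer above. Empty input has the
# well-known digest e3b0c442...7852b855:
print(sha256_hex(b""))
```

For large files, feed `hashlib.sha256().update()` with fixed-size chunks rather than loading the whole file into memory.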
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "errors": "replace",
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
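The `chat_template` above renders conversations into ChatML using the `<|im_start|>`/`<|im_end|>` tokens registered in `added_tokens_decoder`. A hand-rolled sketch of its no-tools path follows; this is a simplification of the Jinja template for illustration, not a replacement for `tokenizer.apply_chat_template`:

```python
def render_chatml(messages, add_generation_prompt=True):
    """Sketch of the chat_template's no-tools path: each turn becomes
    <|im_start|>{role}\\n{content}<|im_end|>\\n, with a default system
    prompt inserted when the conversation does not start with one."""
    out = []
    if not messages or messages[0]["role"] != "system":
        out.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

print(render_chatml([{"role": "user", "content": "Give me a JSON object."}]))
```

The trailing `<|im_start|>assistant\n` is what cues the model to generate its reply; note that `eos_token` here is `<|endoftext|>` while turns are closed by `<|im_end|>`.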