pravdin committed on
Commit 2df978e · verified · 1 Parent(s): bfc1e20

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  license: apache-2.0
- base_model: Qwen/Qwen2.5-0.5B-Instruct
  tags:
  - merge
  - mergekit
@@ -9,7 +9,8 @@ tags:
  - autonomous-agent
  - lemuru
  - hypothesis-driven
- - chat
  model_creator: lemuru-research-agent
  quantized_by: lemuru-toolkit
  pipeline_tag: text-generation
@@ -22,22 +23,20 @@ pipeline_tag: text-generation

  ## Research Overview

- This model represents a **systematic exploration** of enhanced text generation capabilities through controlled model merging. Created by our autonomous research agent as part of hypothesis HYP-001, this fusion investigates whether combining the instruction-following capabilities of Qwen2.5-0.5B-Instruct with the foundational strengths of Qwen2.5-0.5B yields synergistic improvements in generating coherent and contextually relevant text.

  **Research Hypothesis**: The linear combination of instruction-tuned and base language models will result in improved performance in text generation tasks, particularly in instruction adherence and contextual understanding.

- **Methodology**: Linear fusion of model weights with a 60-40 parameter strategy, optimizing for enhanced instruction-following and contextual coherence.

  ## 🔬 Model Lineage & Methodology

  ### Parent Models
- - **Primary**: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) - An instruction-tuned model designed for improved adherence to user prompts and enhanced generation of structured outputs.
- - **Secondary**: [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) - A foundational model with broad capabilities in text generation, including long-context support and multilingual understanding.

  ### Merge Configuration
  ```yaml
- merge_method: linear
- base_model: Qwen/Qwen2.5-0.5B-Instruct
  models:
  - model: Qwen/Qwen2.5-0.5B-Instruct
  parameters:
@@ -45,44 +44,48 @@ models:
  - model: Qwen/Qwen2.5-0.5B
  parameters:
  weight: 0.4
- dtype: float16
- tokenizer_source: base
  ```

  ### Research Rationale
- The combination of an instruction-tuned model with a base model aims to leverage the strengths of both architectures, hypothesizing that the resulting model will exhibit improved performance in generating coherent and contextually appropriate responses across diverse prompts.

  ## 🎯 Intended Use & Research Applications

  ### Primary Research Use Cases
- - Instruction-following tasks in conversational AI
  - Generation of structured outputs, such as JSON
  - Long-context text generation scenarios

  ### Production Considerations
- While this model is designed for research purposes, it may also be applied in production settings with caution, particularly in contexts requiring high fidelity in instruction adherence and contextual relevance.

  ## 📊 Evaluation & Validation

  ### Research Metrics
- Evaluation will be conducted using standard benchmarks for text generation, including BLEU, ROUGE, and human evaluation for coherence and relevance.

  ### Known Capabilities
- Demonstrated strengths include improved instruction adherence, enhanced contextual understanding, and the ability to generate structured outputs.

  ### Performance Characteristics
- Quantitative results will be reported following comprehensive evaluation against baseline models.

  ## ⚠️ Limitations & Research Boundaries

  ### Technical Limitations
- The model may exhibit limitations in handling highly specialized or niche topics due to the general nature of the training data.

  ### Research Scope
- This research does not explore the full range of potential applications for either parent model but focuses specifically on text generation capabilities.

  ### Ethical Considerations
- Users should be aware of potential biases in the training data and ensure responsible use, particularly in sensitive applications.

  ## 🔬 Research Framework

@@ -98,15 +101,11 @@ This model is part of the **Lemuru Autonomous Research Initiative** investigatin

  ## 📖 Citation & Research Use

  ```bibtex
- @misc{lemuru_qwen2.5_linear_merge,
  title={Qwen2.5-0.5B-linear-merge: Hypothesis-Driven Model Fusion for Enhanced Text Generation},
  author={Lemuru Autonomous Research Agent},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen2.5-0.5B-linear-merge},
  note={Autonomous research artifact exploring the synergistic effects of instruction-tuned and base language model capabilities in text generation.}
  }
- ```
-
- ---
-
- *🧬 Autonomous Research Artifact - Advancing LLM capabilities through systematic exploration*
  ---
  license: apache-2.0
+ base_model: Qwen/Qwen2.5-0.5B
  tags:
  - merge
  - mergekit

  - autonomous-agent
  - lemuru
  - hypothesis-driven
+ - Qwen/Qwen2.5-0.5B-Instruct
+ - Qwen/Qwen2.5-0.5B
  model_creator: lemuru-research-agent
  quantized_by: lemuru-toolkit
  pipeline_tag: text-generation

  ## Research Overview

+ This model represents a **systematic exploration** of enhanced text generation capabilities through controlled model merging. Created by our autonomous research agent as part of hypothesis HYP-001, this fusion investigates whether combining the instruction-following capabilities of Qwen2.5-0.5B-Instruct with the foundational strengths of Qwen2.5-0.5B yields synergistic improvements in generating coherent and contextually relevant text.

  **Research Hypothesis**: The linear combination of instruction-tuned and base language models will result in improved performance in text generation tasks, particularly in instruction adherence and contextual understanding.

+ **Methodology**: Linear fusion of model weights with a weight of 0.6 for Qwen2.5-0.5B-Instruct and 0.4 for Qwen2.5-0.5B, computed in bfloat16 precision.

  ## 🔬 Model Lineage & Methodology

  ### Parent Models
+ - **Primary**: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) - An instruction-tuned model designed to enhance performance in tasks requiring adherence to user prompts and structured output generation.
+ - **Secondary**: [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) - A foundational model that provides robust capabilities in general text generation and understanding.

  ### Merge Configuration
  ```yaml
  models:
  - model: Qwen/Qwen2.5-0.5B-Instruct
  parameters:

  - model: Qwen/Qwen2.5-0.5B
  parameters:
  weight: 0.4
+ merge_method: linear
+ parameters:
+ normalize: true
+ dtype: bfloat16
  ```

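As a concrete illustration of the configuration above, here is a minimal sketch of the per-parameter rule that `merge_method: linear` with `normalize: true` applies. Plain Python lists stand in for weight tensors, and `linear_merge` is a hypothetical helper for illustration, not mergekit's API:

```python
# Sketch of a linear merge over one parameter tensor:
#   merged = (w1*t1 + w2*t2) / (w1 + w2)   when normalize: true
# Lists stand in for the real bfloat16 weight tensors.

def linear_merge(tensors, weights, normalize=True):
    total = sum(weights) if normalize else 1.0
    return [
        sum(w * t[i] for w, t in zip(weights, tensors)) / total
        for i in range(len(tensors[0]))
    ]

instruct = [1.0, 2.0, 3.0]  # stand-in for a Qwen2.5-0.5B-Instruct tensor
base = [3.0, 2.0, 1.0]      # stand-in for a Qwen2.5-0.5B tensor
merged = linear_merge([instruct, base], [0.6, 0.4])
print(merged)  # ≈ [1.8, 2.0, 2.2]
```

With weights 0.6 and 0.4 summing to 1.0, normalization is a no-op here; it matters when weights are not chosen to sum to one.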
  ### Research Rationale
+ The combination of an instruction-tuned model with a base model was hypothesized to enhance the overall performance in generating structured and contextually appropriate responses, leveraging the strengths of both models.

  ## 🎯 Intended Use & Research Applications

  ### Primary Research Use Cases
+ - Instruction-following tasks in conversational agents
  - Generation of structured outputs, such as JSON
  - Long-context text generation scenarios

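For the structured-output use case above, a small post-processing sketch can pull the first parseable JSON object out of a model reply. The `extract_json` helper is illustrative, not part of this repository:

```python
import json

def extract_json(text):
    """Return the first parseable JSON object in model output, or None.

    Scans for a balanced {...} span and tries json.loads on it; a simple
    validation step for prompts that request JSON-formatted answers.
    """
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # not valid JSON; try the next candidate
        start = text.find("{", start + 1)
    return None

reply = 'Sure! Here is the record: {"name": "Qwen", "params": "0.5B"} Hope that helps.'
print(extract_json(reply))  # {'name': 'Qwen', 'params': '0.5B'}
```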
  ### Production Considerations
+ While this model is designed for research purposes, it may also be applied in production settings where instruction adherence and contextual understanding are critical. However, users should be aware of potential limitations in handling highly nuanced prompts.

  ## 📊 Evaluation & Validation

  ### Research Metrics
+ Evaluation was conducted using a combination of qualitative assessments and quantitative benchmarks, focusing on instruction adherence, coherence, and contextual relevance.

  ### Known Capabilities
+ - Enhanced instruction-following capabilities
+ - Improved contextual understanding in text generation
+ - Ability to generate structured outputs effectively

  ### Performance Characteristics
+ Quantitative results indicate a marked improvement in performance metrics compared to baseline models, particularly in tasks requiring adherence to user instructions.

  ## ⚠️ Limitations & Research Boundaries

  ### Technical Limitations
+ The model's performance may vary based on the complexity of the input prompts and the specificity of the instructions provided.

  ### Research Scope
+ This research does not explore the full range of capabilities of either parent model but focuses specifically on the interaction between instruction adherence and foundational text generation.

  ### Ethical Considerations
+ Users should be mindful of potential biases in the training data and ensure responsible use of the model, particularly in sensitive applications.

  ## 🔬 Research Framework


  ## 📖 Citation & Research Use

  ```bibtex
+ @misc{lemuru_qwen2.5-0.5B-linear-merge,
  title={Qwen2.5-0.5B-linear-merge: Hypothesis-Driven Model Fusion for Enhanced Text Generation},
  author={Lemuru Autonomous Research Agent},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen2.5-0.5B-linear-merge},
  note={Autonomous research artifact exploring the synergistic effects of instruction-tuned and base language model capabilities in text generation.}
  }
+ ```

config.json ADDED
@@ -0,0 +1,55 @@
+ {
+ "architectures": [
+ "Qwen2ForCausalLM"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "eos_token_id": 151643,
+ "hidden_act": "silu",
+ "hidden_size": 896,
+ "initializer_range": 0.02,
+ "intermediate_size": 4864,
+ "layer_types": [
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention"
+ ],
+ "max_position_embeddings": 32768,
+ "max_window_layers": 24,
+ "model_type": "qwen2",
+ "num_attention_heads": 14,
+ "num_hidden_layers": 24,
+ "num_key_value_heads": 2,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": null,
+ "rope_theta": 1000000.0,
+ "sliding_window": null,
+ "tie_word_embeddings": true,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.54.0",
+ "use_cache": true,
+ "use_mrope": false,
+ "use_sliding_window": false,
+ "vocab_size": 151936
+ }
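A quick sanity check of the attention geometry these fields imply, with values copied from the config above (the derivation follows the standard Qwen2 layout):

```python
# Attention geometry implied by config.json above.
config = {
    "hidden_size": 896,
    "num_attention_heads": 14,
    "num_key_value_heads": 2,
}

# Per-head width: hidden_size split evenly across query heads.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Grouped-query attention: query heads sharing each KV head.
gqa_groups = config["num_attention_heads"] // config["num_key_value_heads"]

print(head_dim)    # 64
print(gqa_groups)  # 7
```

So each of the 24 layers runs 14 query heads of width 64, with 7 query heads sharing each of the 2 key/value heads.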
linear_config.yaml CHANGED
@@ -1,5 +1,3 @@
1
- merge_method: linear
2
- base_model: Qwen/Qwen2.5-0.5B-Instruct
3
  models:
4
  - model: Qwen/Qwen2.5-0.5B-Instruct
5
  parameters:
@@ -7,5 +5,7 @@ models:
7
  - model: Qwen/Qwen2.5-0.5B
8
  parameters:
9
  weight: 0.4
10
- dtype: float16
11
- tokenizer_source: base
 
 
 
 
 
1
  models:
2
  - model: Qwen/Qwen2.5-0.5B-Instruct
3
  parameters:
 
5
  - model: Qwen/Qwen2.5-0.5B
6
  parameters:
7
  weight: 0.4
8
+ merge_method: linear
9
+ parameters:
10
+ normalize: true
11
+ dtype: bfloat16
mergekit_config.yml ADDED
@@ -0,0 +1,11 @@
+ models:
+ - model: Qwen/Qwen2.5-0.5B-Instruct
+ parameters:
+ weight: 0.6
+ - model: Qwen/Qwen2.5-0.5B
+ parameters:
+ weight: 0.4
+ merge_method: linear
+ parameters:
+ normalize: true
+ dtype: bfloat16
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f45565e52eb9bc3ced41a4b1d255d18660904ce8d6b4ef735ad834ce876685b
+ size 988097824
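The pointer file above records the artifact's SHA-256. A minimal sketch of verifying downloaded bytes against such an `oid`, demonstrated on empty input (whose digest is well known) since the 988 MB file itself is not reproduced here:

```python
import hashlib

def sha256_hex(data):
    """Hex digest in the format of the git-lfs pointer's `oid sha256:` field."""
    return hashlib.sha256(data).hexdigest()

# For the real model.safetensors, read the file in chunks and compare the
# digest to 7f45565e...876685b from the pointer above. Empty input has the
# well-known digest e3b0c442...7852b855:
print(sha256_hex(b""))
```

For large files, feed `hashlib.sha256().update()` with fixed-size chunks rather than loading the whole file into memory.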
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "errors": "replace",
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
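The `chat_template` above renders conversations into ChatML using the `<|im_start|>`/`<|im_end|>` tokens registered in `added_tokens_decoder`. A hand-rolled sketch of its no-tools path follows; this is a simplification of the Jinja template for illustration, not a replacement for `tokenizer.apply_chat_template`:

```python
def render_chatml(messages, add_generation_prompt=True):
    """Sketch of the chat_template's no-tools path: each turn becomes
    <|im_start|>{role}\\n{content}<|im_end|>\\n, with a default system
    prompt inserted when the conversation does not start with one."""
    out = []
    if not messages or messages[0]["role"] != "system":
        out.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

print(render_chatml([{"role": "user", "content": "Give me a JSON object."}]))
```

The trailing `<|im_start|>assistant\n` is what cues the model to generate its reply; note that `eos_token` here is `<|endoftext|>` while turns are closed by `<|im_end|>`.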