---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B
tags:
- generated_from_trainer
model-index:
- name: outputs/qwen2.5-0.5b-ft
  results: []
---

# outputs/qwen2.5-0.5b-ft

This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the OpenHermes 2.5 dataset.

## Model description

This model is based on Qwen2.5-0.5B, which is part of the latest series of Qwen large language models. Qwen2.5 brings significant improvements over its predecessor, including:

- Enhanced knowledge and capabilities in coding and mathematics
- Improved instruction following and long text generation (over 8K tokens)
- Better understanding of structured data and generation of structured outputs (especially JSON)
- Increased resilience to diverse system prompts
- Long-context support up to 128K tokens, with the ability to generate up to 8K tokens
- Multilingual support for over 29 languages

The base Qwen2.5-0.5B model has the following characteristics (they can be cross-checked with the sketch after this list):
- Type: Causal Language Model
- Training Stage: Pretraining
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
- Number of Parameters: 0.49B (0.36B non-embedding)
- Number of Layers: 24
- Number of Attention Heads (GQA): 14 for Q and 2 for KV
- Context Length: Full 32,768 tokens

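As a quick check, the architecture figures above can be read directly from the base model's configuration. A minimal sketch using `transformers.AutoConfig` (assumes network access to the Hugging Face Hub; the expected values in the comments mirror the list above):

```python
from transformers import AutoConfig

# Load the base model's configuration from the Hub and print the fields that
# correspond to the figures listed above.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B")

print(config.num_hidden_layers)        # expected: 24
print(config.num_attention_heads)      # expected: 14 (query heads)
print(config.num_key_value_heads)      # expected: 2 (KV heads, GQA)
print(config.max_position_embeddings)  # expected: 32768
```
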
This fine-tuned version has been trained on the OpenHermes 2.5 dataset, a high-quality compilation of primarily synthetically generated instruction and chat examples, reaching 1M samples in total.

## Intended uses & limitations

This model is intended for research and application in natural language processing tasks. It can be used for various downstream tasks such as text generation and language understanding, and potentially for conversational AI after appropriate further fine-tuning.

Limitations:
- The base language model is not recommended for direct use in conversations without further fine-tuning or post-training techniques such as SFT or RLHF.
- The model's performance may vary across languages and domains.
- Users should be aware of potential biases present in the training data.

## Training and evaluation data

This model was fine-tuned on the OpenHermes 2.5 dataset, a continuation and significant expansion of the OpenHermes 1 dataset. It includes:

- A diverse range of open-source datasets
- Custom-created synthetic datasets
- 1 million primarily synthetically generated instruction and chat samples
- High-quality, curated content that has contributed to advances in state-of-the-art LLMs

The dataset is notable for its role in the development of the Open Hermes 2/2.5 and Nous Hermes 2 series of models.

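For illustration, the dataset can be loaded with the Hugging Face `datasets` library. A minimal sketch, assuming the public dataset ID `teknium/OpenHermes-2.5`:

```python
from datasets import load_dataset

# Load the OpenHermes 2.5 training split and inspect its schema.
ds = load_dataset("teknium/OpenHermes-2.5", split="train")

print(len(ds))          # roughly 1M instruction/chat samples
print(ds.column_names)  # inspect the schema before any preprocessing
print(ds[0])            # each record holds a multi-turn conversation
```
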
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough `TrainingArguments` equivalent is sketched after the lists below):
- learning_rate: 1e-05
- train_batch_size: 5
- eval_batch_size: 5
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
- weight_decay: 0.01

Additional training details:
- Gradient Checkpointing: Enabled
- Mixed Precision: BF16 (auto)
- Sequence Length: 4096
- Sample Packing: Enabled
- Pad to Sequence Length: Enabled

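Training was actually driven by Axolotl (see the config reference at the end of this card), but as a rough orientation the hyperparameters above map onto `transformers.TrainingArguments` approximately as sketched below. This is an approximation, not the actual setup; sample packing and the 4096-token sequence length are Axolotl-side options with no direct `TrainingArguments` equivalent.

```python
from transformers import TrainingArguments

# Approximate Trainer-style equivalent of the listed hyperparameters.
# Effective batch size: 5 (per device) x 8 (gradient accumulation) = 40.
args = TrainingArguments(
    output_dir="outputs/qwen2.5-0.5b-ft",
    learning_rate=1e-5,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    weight_decay=0.01,
    seed=42,
    bf16=True,                    # mixed precision (BF16)
    gradient_checkpointing=True,  # gradient checkpointing enabled
)
```
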
### Framework versions

- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1

## Additional Information

This model was trained using the Axolotl framework. For more details on the base model, please refer to the [Qwen2.5 blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub repository](https://github.com/QwenLM/Qwen2.5), and [documentation](https://qwen.readthedocs.io/en/latest/).

To use this model, make sure you have a recent version of the Hugging Face `transformers` library (4.37.0 or later) to avoid compatibility issues.

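For example, a minimal generation sketch with `transformers`; the repository ID below is a placeholder, not the published name of this checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID; replace with the actual checkpoint name.
model_id = "your-username/qwen2.5-0.5b-ft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Explain what a causal language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
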
For support and further development of open-source language models, consider supporting the creator of the OpenHermes dataset on [GitHub Sponsors](https://github.com/sponsors/teknium1).

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`