---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-32B-Instruct
- open-r1/OlympicCoder-32B
pipeline_tag: text-generation
tags:
- merge
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- mixture of experts
- qwen2moe
- 2X32B Shared.
- shared expert
library_name: transformers
---

[uploading...]

<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2>

This repo contains the full precision source code, in "safetensors" format, to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.
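
To use the source directly, here is a minimal loading sketch with transformers (the repo id below is assumed from this card's title; adjust dtype/device mapping to your hardware):

```python
# Minimal loading sketch (assumed repo id; adjust to the actual repo path).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # spread the 87B MOE across available GPUs/CPU
)
```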

The monster coder, in a MOE (Mixture of Experts) 2x32B (with shared expert) configuration.

Two of the best coders combined into one model that is stronger than the sum of its parts.

Both models code together.

Info about each model below, followed by settings/info on using this MOE model.

---

# Qwen2.5-Coder-32B-Instruct

## Introduction

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14 and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

- Significant improvements in **code generation**, **code reasoning** and **code fixing**. Building on the strong Qwen2.5, the training tokens were scaled up to 5.5 trillion, including source code, text-code grounding, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
- A more comprehensive foundation for real-world applications such as **Code Agents**. It not only enhances coding capabilities but also maintains strengths in mathematics and general competencies.
- **Long-context support** up to 128K tokens.

**This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 32.5B
- Number of Parameters (Non-Embedding): 31.0B
- Number of Layers: 64
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: Full 131,072 tokens
- For detailed instructions on deploying Qwen2.5 for long texts, see the original Qwen2.5-Coder-32B-Instruct model card (linked below); a minimal sketch follows this list.
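
As a rough sketch of that long-text deployment (the YaRN rope-scaling values below are taken from the upstream Qwen2.5-Coder-32B-Instruct card; note that the merged MOE in this repo is suggested at 32k context, so this applies mainly to the base model):

```python
# Sketch of enabling YaRN rope scaling for long inputs, per the upstream Qwen2.5-Coder card.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"

config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "factor": 4.0,                                # 4 x 32768 = 131072 tokens
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```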

For more details, please refer to the Qwen2.5-Coder [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/) and [GitHub](https://github.com/QwenLM/Qwen2.5-Coder),

and see also:

https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

---

# Model Card for OlympicCoder-32B

OlympicCoder-32B is a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.

* Repository: https://github.com/huggingface/open-r1
* Blog post: https://huggingface.co/blog/open-r1/update-3

## Model description

- **Model type:** A 32B parameter model fine-tuned on a decontaminated version of the Codeforces dataset.
- **Language(s) (NLP):** Primarily English
- **License:** apache-2.0
- **Finetuned from model:** [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)

## Evaluation

We compare the performance of OlympicCoder models on two main benchmarks for competitive coding:

* **[IOI'2024:](https://github.com/huggingface/ioi)** 6 very challenging problems from the 2024 International Olympiad in Informatics. Models are allowed up to 50 submissions per problem.
* **[LiveCodeBench:](https://livecodebench.github.io)** Python programming problems sourced from platforms like CodeForces and LeetCode. We use the `v4_v5` subset of [`livecodebench/code_generation_lite`](https://huggingface.co/datasets/livecodebench/code_generation_lite), which corresponds to 268 problems. We use `lighteval` to evaluate models on LiveCodeBench using the sampling parameters described [here](https://github.com/huggingface/open-r1?tab=readme-ov-file#livecodebench).

> [!NOTE]
> The OlympicCoder models were post-trained exclusively on C++ solutions generated by DeepSeek-R1. As a result, the performance on LiveCodeBench should be considered partially _out-of-domain_, since that benchmark expects models to output solutions in Python.


For more info on this model, including benchmarks, see:

https://huggingface.co/open-r1/OlympicCoder-32B

---

<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2>

Model Settings / info:

---

Max context: 32k.

Super special thanks to Qwen and Open-R1 for making such fantastic models.

<B>Suggested Settings:</B>
- Temp .5 to .7 (or lower)
- topk: 20, topp: .8, minp: .05 (topp/minp can also be .95 and .05)
- rep pen: 1.1 (can be lower; lower may generate better code; specifically 1.02, 1.03 and 1.05)
- Jinja template (embedded) or CHATML template.
- A system prompt is not required (tests were run with a blank system prompt). A generation sketch with these settings follows the list.
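
A minimal generation sketch using these settings, reusing `model` and `tokenizer` from the loading sketch near the top of this card (`min_p` needs a recent transformers release):

```python
# Sketch of generating with the suggested sampling settings.
messages = [
    {"role": "user", "content": "Write a Python function that merges overlapping intervals."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,          # 0.5 to 0.7 (or lower)
    top_k=20,
    top_p=0.8,
    min_p=0.05,               # requires a recent transformers version
    repetition_penalty=1.1,   # lower values (1.02-1.05) may generate better code
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```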

<B>System Prompt:</B>

If you want the model to code in specific ways, or in specific languages, I suggest creating a system prompt with these instructions.

This will cut down prompt size and focus the model; an example follows below.
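
For example, a hypothetical system prompt that pins the language and output style (the wording below is illustrative only):

```python
# Hypothetical system prompt; adjust the wording to your own needs.
messages = [
    {
        "role": "system",
        "content": "You are an expert C++ programmer. Always return complete, compilable code "
                   "with brief comments, and prefer the C++17 standard library over external dependencies.",
    },
    {"role": "user", "content": "Implement an LRU cache with O(1) get and put."},
]
```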

<B>Activated Experts:</B>

The model default is 2 activated experts. It will also run with only one expert activated (a sketch of changing this follows below).
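
To change the number of activated experts in transformers, a minimal sketch (assuming the standard `qwen2_moe` config key `num_experts_per_tok`; the repo id is assumed from this card's title):

```python
# Sketch: run with one activated expert instead of the default two (assumed config key and repo id).
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2"

config = AutoConfig.from_pretrained(model_id)
config.num_experts_per_tok = 1   # model default is 2
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```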

<B>Generation:</B>

Due to the model config, I suggest a minimum of 2 generations if both experts are activated (the default), or 2-4 generations if only one expert is activated.

This will give you a large selection of varied code to choose from.

I also suggest lowering rep pen from 1.1 and getting at least 2 generations at each of the lower setting(s); a short sketch of this follows below.

These generation suggestions can produce stronger, more compact code - and in some cases faster code too.
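
A short sketch of that sweep, reusing `model`, `tokenizer` and `inputs` from the generation sketch above:

```python
# Sketch: collect several candidates per repetition-penalty setting and pick the best by hand.
candidates = []
for rep_pen in (1.02, 1.03, 1.05):
    for sample in range(2):  # at least 2 generations per setting
        output = model.generate(
            inputs,
            max_new_tokens=2048,
            do_sample=True,
            temperature=0.6,
            top_k=20,
            top_p=0.8,
            repetition_penalty=rep_pen,
        )
        text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
        candidates.append((rep_pen, sample + 1, text))

for rep_pen, sample, text in candidates:
    print(f"--- rep_pen={rep_pen}, sample {sample} ---\n{text}\n")
```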

---

For more information / other Qwen/Mistral coders / additional settings, see:

[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ]

[model card pending updates]

For settings, parameters and other details, also see:

https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

and/or

https://huggingface.co/open-r1/OlympicCoder-32B

More to come...