---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-32B-Instruct
- open-r1/OlympicCoder-32B
pipeline_tag: text-generation
tags:
- merge
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- mixture of experts
- qwen2moe
- 2X32B Shared.
- shared expert
library_name: transformers
---

[uploading...]

<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2>

This repo contains the full precision source model (in "safetensors" format), which can be used to generate GGUF, GPTQ, EXL2, AWQ, HQQ and other quantized formats. The source model can also be used directly.
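
For direct use, here is a minimal loading sketch with the Hugging Face transformers library. It is illustrative only: the repo id is assumed from this card's title, and the dtype/device settings should be adjusted to your hardware.

```python
# Minimal sketch: load the merged MOE model directly from the safetensors source.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps memory manageable for an 87B MOE
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.8)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```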

The monster coder in a MOE (Mixture of Experts) 2x32B (with shared expert) configuration.

The two best coders combined into one model that is stronger than the sum of its parts.

Both models code together.

Info about each model below, followed by settings/info on using this MOE model.

---

# Qwen2.5-Coder-32B-Instruct

## Introduction

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14 and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

- Significant improvements in **code generation**, **code reasoning** and **code fixing**. Based on the strong Qwen2.5, the training tokens were scaled up to 5.5 trillion, including source code, text-code grounding, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
- A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies.
- **Long-context support** up to 128K tokens.

**This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 32.5B
- Number of Parameters (Non-Embedding): 31.0B
- Number of Layers: 64
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: Full 131,072 tokens
- Please refer to the [Qwen2.5-Coder-32B-Instruct model card](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.

For more details, please refer to the [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/) and [GitHub](https://github.com/QwenLM/Qwen2.5-Coder),

and see also:

https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

---

# Model Card for OlympicCoder-32B

OlympicCoder-32B is a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.

* Repository: https://github.com/huggingface/open-r1
* Blog post: https://huggingface.co/blog/open-r1/update-3

## Model description

- **Model type:** A 32B parameter model fine-tuned on a decontaminated version of the codeforces dataset.
- **Language(s) (NLP):** Primarily English
- **License:** apache-2.0
- **Finetuned from model:** [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)

## Evaluation

We compare the performance of OlympicCoder models on two main benchmarks for competitive coding:

* **[IOI'2024:](https://github.com/huggingface/ioi)** 6 very challenging problems from the 2024 International Olympiad in Informatics. Models are allowed up to 50 submissions per problem.
* **[LiveCodeBench:](https://livecodebench.github.io)** Python programming problems sourced from platforms like CodeForces and LeetCode. We use the `v4_v5` subset of [`livecodebench/code_generation_lite`](https://huggingface.co/datasets/livecodebench/code_generation_lite), which corresponds to 268 problems. We use `lighteval` to evaluate models on LiveCodeBench using the sampling parameters described [here](https://github.com/huggingface/open-r1?tab=readme-ov-file#livecodebench).

> [!NOTE]
> The OlympicCoder models were post-trained exclusively on C++ solutions generated by DeepSeek-R1. As a result, the performance on LiveCodeBench should be considered partially _out-of-domain_, since that benchmark expects models to output solutions in Python.

For more info on this model, including benchmarks, see:

https://huggingface.co/open-r1/OlympicCoder-32B

---

<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2>

Model Settings / Info:

---

Max context: 32k.

Super special thanks to Qwen and Open-R1 for making such fantastic models.

<B>Suggested Settings:</B>
- Temp .5 to .7 (or lower)
- topk: 20, topp: .8, minp: .05 (topp, minp can be .95 and .05)
- rep pen: 1.1 (can be lower; lower may generate better code; specifically 1.02, 1.03 and 1.05)
- Jinja Template (embedded) or CHATML template.
- A System Prompt is not required. (ran tests with blank system prompt)
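
As an illustration only (not the author's exact workflow), the settings above map onto a transformers `GenerationConfig` roughly as follows; the values chosen are one point inside the suggested ranges.

```python
# Sketch: the suggested sampling settings expressed as a transformers GenerationConfig.
from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.6,          # "Temp .5 to .7 (or lower)"
    top_k=20,                 # "topk: 20"
    top_p=0.8,                # "topp: .8" (0.95 also suggested)
    min_p=0.05,               # "minp: .05"
    repetition_penalty=1.05,  # "rep pen" 1.1, or lower (1.02-1.05) for better code
    max_new_tokens=2048,
)

# Used with the loading sketch above:
# outputs = model.generate(inputs, generation_config=gen_config)
```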

<B>System Prompt:</B>

If you want the model to code in specific ways or in specific languages, I suggest creating a system prompt with these instructions (see the sketch below).

This will cut down prompt size and focus the model.

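A minimal sketch of such a system prompt, reusing the tokenizer, model and gen_config from the sketches above; the prompt text itself is only an example and should be tailored to your own rules.

```python
# Sketch: focusing the model with a short system prompt via the embedded chat template.
messages = [
    {
        "role": "system",
        "content": "You are an expert Python developer. Write clean, typed, PEP 8 compliant "
                   "code with docstrings, and keep explanations to brief comments.",
    },
    {"role": "user", "content": "Implement an LRU cache class with O(1) get and put."},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, generation_config=gen_config)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
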
<B>Activated Experts:</B>

The model default is set to 2 activated experts; it will also run with one expert activated (a load-time sketch follows below).

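If you want to try one activated expert, the sketch below overrides the expert count at load time. It assumes the standard transformers Qwen2-MoE config field `num_experts_per_tok` and reuses `model_id` from the loading sketch above; verify the field name against this repo's config.json before relying on it.

```python
# Sketch: loading with one activated expert instead of the default two.
# `num_experts_per_tok` is the usual qwen2_moe config field; assumed here, not verified.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(model_id)
config.num_experts_per_tok = 1  # repo default is reported as 2

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
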
<B>Generation:</B>

Due to the model config, I suggest a minimum of 2 generations if both experts are activated (default), or 2-4 generations if only one expert is activated.

This will give you a large selection of varied code to choose from.

I also suggest lowering rep pen from 1.1 and getting at least 2 generations at the lower setting(s).

These generation suggestions can produce stronger, more compact code - and in some cases faster code too.

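One way to collect several candidates per prompt in a single call, continuing the sketches above (illustrative only):

```python
# Sketch: sample several candidate solutions per prompt and compare them.
outputs = model.generate(
    inputs,
    generation_config=gen_config,
    num_return_sequences=3,  # e.g. 2-4 candidates, per the suggestion above
)
for i, seq in enumerate(outputs):
    print(f"--- candidate {i + 1} ---")
    print(tokenizer.decode(seq[inputs.shape[-1]:], skip_special_tokens=True))
```
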
---

For more information / other Qwen/Mistral Coders / additional settings see:

[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ]

[model card pending updates]

For settings, parameters and other details also see:

https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

and/or

https://huggingface.co/open-r1/OlympicCoder-32B

More to come...