Update README.md

README.md CHANGED

@@ -44,6 +44,46 @@ The two best Coders in one that are stronger than the sum of their parts.

Both models code together.

Info about each model below, followed by settings/info on using this MOE model.

---

# Qwen2.5-Coder-32B-Instruct

<a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>

## Introduction

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements over CodeQwen1.5:

- Significant improvements in **code generation**, **code reasoning** and **code fixing**. Building on the strong Qwen2.5, we scaled the training tokens to 5.5 trillion, including source code, text-code grounding data, synthetic data, and more. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
- A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies.
- **Long-context Support** up to 128K tokens.

**This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features (a minimal loading sketch follows this list):

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 32.5B
- Number of Parameters (Non-Embedding): 31.0B
- Number of Layers: 64
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: Full 131,072 tokens
- Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts; a YaRN configuration sketch also appears further below.

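The following is a minimal usage sketch, not an official snippet from this card: it loads the upstream Qwen/Qwen2.5-Coder-32B-Instruct checkpoint (not this merged MOE repo) through the standard Hugging Face transformers API, and the prompt is purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Upstream Qwen2.5-Coder-32B-Instruct checkpoint; swap in the repo you actually want to run.
model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

# device_map="auto" spreads the 32.5B parameters across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat prompt with the model's chat template.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate and decode only the newly produced tokens.
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
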
For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/) and [GitHub](https://github.com/QwenLM/Qwen2.5-Coder).
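
The "#processing-long-texts" section referenced above is not included in this excerpt. For context, the upstream Qwen2.5 documentation describes extending the default 32,768-token window toward 128K via YaRN rope scaling; the snippet below is a sketch of that approach, with scaling values taken from the upstream card and not verified for this merged model.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"  # upstream checkpoint, not this merged repo

# YaRN rope scaling as described in the upstream Qwen2.5 docs (assumed values):
# a factor of 4.0 over the original 32,768 positions gives roughly 131,072 tokens.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, torch_dtype="auto", device_map="auto"
)
```

The upstream docs also advise enabling this scaling only when long inputs are actually needed, since static YaRN is applied to all inputs and can affect quality on shorter texts.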

---

---

---

Model Settings / info:

---

Max context: 32k.
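
As one hedged illustration of what this limit means in practice (not a prescribed setup for this card): if you run a GGUF quant of this model through llama-cpp-python, the context window can be capped at 32k when the model is created. The file name below is a placeholder.

```python
from llama_cpp import Llama

# Placeholder path: point this at whatever GGUF quant of this model you actually have.
llm = Llama(model_path="./qwen2.5-moe-coder.gguf", n_ctx=32768)  # cap context at the card's 32k limit

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```
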
Super special thanks to Qwen and Open-R1 for making such fantastic models.