DavidAU committed
Commit 012f2a8 · verified · 1 Parent(s): 44fa154

Update README.md

Files changed (1):
  1. README.md +40 -0
README.md CHANGED
@@ -44,6 +44,46 @@ The two best Coders in one that are stronger than the sum of their parts.

  Both models code together.

+ Info about each model below, followed by settings/info on using this MOE model.
+
+ ---
+
+ # Qwen2.5-Coder-32B-Instruct
+ <a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">
+     <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+
+ ## Introduction
+
+ Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements over CodeQwen1.5:
+
+ - Significant improvements in **code generation**, **code reasoning**, and **code fixing**. Building on the strong Qwen2.5, we scale the training tokens up to 5.5 trillion, including source code, text-code grounding, synthetic data, and more. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
+ - A more comprehensive foundation for real-world applications such as **Code Agents**: it not only enhances coding capabilities but also maintains strengths in mathematics and general competencies.
+ - **Long-context support** up to 128K tokens.
+
+ **This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features:
+ - Type: Causal Language Model
+ - Training Stage: Pretraining & Post-training
+ - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
+ - Number of Parameters: 32.5B
+ - Number of Parameters (Non-Embedding): 31.0B
+ - Number of Layers: 64
+ - Number of Attention Heads (GQA): 40 for Q and 8 for KV
+ - Context Length: Full 131,072 tokens
+ - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts (a rope_scaling sketch follows below).
+
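The upstream model card pairs the feature list above with a standard Hugging Face transformers chat quickstart. The following is a minimal sketch along those lines; it loads the upstream `Qwen/Qwen2.5-Coder-32B-Instruct` checkpoint (not this MOE merge), and the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Upstream instruct checkpoint; this MOE merge itself is distributed as GGUF.
model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick the dtype recorded in the checkpoint
    device_map="auto",    # spread layers across available devices
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a quick sort algorithm in Python."},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
# generate() returns prompt + completion; strip the prompt tokens before decoding.
completion = generated[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
```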
+ For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/) and [GitHub](https://github.com/QwenLM/Qwen2.5-Coder).
+
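The full 131,072-token context mentioned above relies on YaRN-style RoPE scaling on top of the native 32,768-token window, per the upstream documentation linked from the feature list. Below is a minimal sketch of applying that override through `AutoConfig`; the `rope_scaling` values follow the published upstream Qwen2.5 recipe and should be checked against the linked section before use.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

# Start from the stock config, then enable YaRN RoPE scaling so inputs beyond
# the native 32,768-token window can be handled (factor 4.0 -> ~128K tokens).
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```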
+ ---
+
+ Model Settings / info:
+
+ ---
+
  Max context: 32k.

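One way to respect that 32k cap when running a GGUF quant of this model locally is to fix the context window at load time. A minimal llama-cpp-python sketch follows; the quant filename and sampler values are placeholders, not settings taken from this repo.

```python
from llama_cpp import Llama

# Load a GGUF quant with the context window capped at the model's 32k limit.
llm = Llama(
    model_path="model-file.Q4_K_M.gguf",  # placeholder filename for a local quant
    n_ctx=32768,       # max context: 32k
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    max_tokens=512,
    temperature=0.7,     # placeholder sampler values
    repeat_penalty=1.1,
)
print(out["choices"][0]["message"]["content"])
```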
  Super special thanks to Qwen and Open-R1 for making such fantastic models.