DavidAU committed
Commit 012f2a8 · verified · 1 Parent(s): 44fa154

Update README.md

Files changed (1):
  1. README.md +40 -0
README.md CHANGED
@@ -44,6 +44,46 @@ The two best Coders in one that are stronger than the sum of their parts.

  Both models code together.

+ Info about each model below, followed by settings/info on using this MOE model.
+
+ ---
+
+ # Qwen2.5-Coder-32B-Instruct
+ <a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">
+     <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+
+ ## Introduction
+
+ Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements over CodeQwen1.5:
+
+ - Significant improvements in **code generation**, **code reasoning**, and **code fixing**. Building on the strong Qwen2.5, we scale the training tokens up to 5.5 trillion, including source code, text-code grounding, synthetic data, and more. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
+ - A more comprehensive foundation for real-world applications such as **Code Agents**: it not only enhances coding capabilities but also maintains strengths in mathematics and general competencies.
+ - **Long-context support** up to 128K tokens.
+
+ **This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features:
+ - Type: Causal Language Model
+ - Training Stage: Pretraining & Post-training
+ - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
+ - Number of Parameters: 32.5B
+ - Number of Parameters (Non-Embedding): 31.0B
+ - Number of Layers: 64
+ - Number of Attention Heads (GQA): 40 for Q and 8 for KV
+ - Context Length: Full 131,072 tokens
+ - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts (a rope_scaling sketch follows below).
+
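The upstream model card pairs the feature list above with a standard Hugging Face transformers chat quickstart. The following is a minimal sketch along those lines; it loads the upstream `Qwen/Qwen2.5-Coder-32B-Instruct` checkpoint (not this MOE merge), and the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Upstream instruct checkpoint; this MOE merge itself is distributed as GGUF.
model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick the dtype recorded in the checkpoint
    device_map="auto",    # spread layers across available devices
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a quick sort algorithm in Python."},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
# generate() returns prompt + completion; strip the prompt tokens before decoding.
completion = generated[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
```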
+ For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/) and [GitHub](https://github.com/QwenLM/Qwen2.5-Coder).
+
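The full 131,072-token context mentioned above relies on YaRN-style RoPE scaling on top of the native 32,768-token window, per the upstream documentation linked from the feature list. Below is a minimal sketch of applying that override through `AutoConfig`; the `rope_scaling` values follow the published upstream Qwen2.5 recipe and should be checked against the linked section before use.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

# Start from the stock config, then enable YaRN RoPE scaling so inputs beyond
# the native 32,768-token window can be handled (factor 4.0 -> ~128K tokens).
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```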
+ ---
+
+ Model Settings / info:
+
+ ---
+
  Max context: 32k.

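One way to respect that 32k cap when running a GGUF quant of this model locally is to fix the context window at load time. A minimal llama-cpp-python sketch follows; the quant filename and sampler values are placeholders, not settings taken from this repo.

```python
from llama_cpp import Llama

# Load a GGUF quant with the context window capped at the model's 32k limit.
llm = Llama(
    model_path="model-file.Q4_K_M.gguf",  # placeholder filename for a local quant
    n_ctx=32768,       # max context: 32k
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    max_tokens=512,
    temperature=0.7,     # placeholder sampler values
    repeat_penalty=1.1,
)
print(out["choices"][0]["message"]["content"])
```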
  Super special thanks to Qwen and Open-R1 for making such fantastic models.