|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
base_model: |
|
- Qwen/Qwen2.5-Coder-32B-Instruct |
|
- open-r1/OlympicCoder-32B |
|
pipeline_tag: text-generation |
|
tags: |
|
- merge |
|
- programming |
|
- code generation |
|
- code |
|
- codeqwen |
|
- moe |
|
- coding |
|
- coder |
|
- qwen2 |
|
- chat |
|
- qwen |
|
- qwen-coder |
|
- mixture of experts |
|
- qwen2moe |
|
- 2X32B Shared. |
|
- shared expert |
|
library_name: transformers |
|
--- |
|
|
|
[uploading...] |
|
|
|
<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2> |
|
|
|
This repo contains the full precision source model, in safetensors format, for generating GGUF, GPTQ, EXL2, AWQ, HQQ and other quantized formats. The source model can also be used directly.
|
|
|
The monster coder in MOE (Mixture of Experts) 2x32B (with shared expert) configuration. |
|
|
|
Two of the best coders combined into one model that is stronger than the sum of its parts.
|
|
|
Both models code together. |
|
|
|
Information about each source model is below, followed by settings and usage notes for this MOE model.
|
|
|
--- |
|
|
|
# Qwen2.5-Coder-32B-Instruct |
|
|
|
## Introduction |
|
|
|
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
|
|
|
- Significant improvements in **code generation**, **code reasoning** and **code fixing**. Based on the strong Qwen2.5, we scaled the training tokens up to 5.5 trillion, including source code, text-code grounding, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.

- A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies.
|
- **Long-context Support** up to 128K tokens. |
|
|
|
**This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features: |
|
- Type: Causal Language Models |
|
- Training Stage: Pretraining & Post-training |
|
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias |
|
- Number of Parameters: 32.5B |
|
- Number of Parameters (Non-Embedding): 31.0B
|
- Number of Layers: 64 |
|
- Number of Attention Heads (GQA): 40 for Q and 8 for KV |
|
- Context Length: Full 131,072 tokens |
|
- Please refer to the [original Qwen2.5-Coder-32B-Instruct model card](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
|
|
|
For more details, please refer to the Qwen2.5-Coder [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/) and [GitHub](https://github.com/QwenLM/Qwen2.5-Coder).
|
|
|
See also:
|
|
|
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct |
|
|
|
--- |
|
|
|
# Model Card for OlympicCoder-32B |
|
|
|
OlympicCoder-32B is a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.
|
|
|
* Repository: https://github.com/huggingface/open-r1 |
|
* Blog post: https://huggingface.co/blog/open-r1/update-3 |
|
|
|
## Model description |
|
|
|
- **Model type:** A 32B parameter model fine-tuned on a decontaminated version of the codeforces dataset. |
|
- **Language(s) (NLP):** Primarily English |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
|
|
|
## Evaluation |
|
|
|
We compare the performance of OlympicCoder models on two main benchmarks for competitive coding: |
|
|
|
* **[IOI'2024:](https://github.com/huggingface/ioi)** 6 very challenging problems from the 2024 International Olympiad in Informatics. Models are allowed up to 50 submissions per problem. |
|
* **[LiveCodeBench:](https://livecodebench.github.io)** Python programming problems sourced from platforms like CodeForces and LeetCode. We use the `v4_v5` subset of [`livecodebench/code_generation_lite`](https://huggingface.co/datasets/livecodebench/code_generation_lite), which corresponds to 268 problems. We use `lighteval` to evaluate models on LiveCodeBench using the sampling parameters described [here](https://github.com/huggingface/open-r1?tab=readme-ov-file#livecodebench).
|
|
|
> [!NOTE] |
|
> The OlympicCoder models were post-trained exclusively on C++ solutions generated by DeepSeek-R1. As a result, the performance on LiveCodeBench should be considered partially _out-of-domain_, since this benchmark expects models to output solutions in Python.
|
|
|
|
|
For more info on this model, including benchmarks, see:
|
|
|
https://huggingface.co/open-r1/OlympicCoder-32B |
|
|
|
--- |
|
|
|
<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2>
|
|
|
Model Settings / info: |
|
|
|
--- |
|
|
|
Max context: 32k. |
|
|
|
Super special thanks to Qwen and Open-R1 for making such fantastic models. |
|
|
|
<B>Suggested Settings (see the example below): </B>

- Temp: .5 to .7 (or lower)

- topk: 20, topp: .8, minp: .05 (topp and minp can also be .95 and .05)

- rep pen: 1.1 (can be lower; lower may generate better code, specifically 1.02, 1.03 and 1.05)

- Jinja template (embedded) or ChatML template.

- A system prompt is not required (tests were run with a blank system prompt).
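
Below is a minimal sketch of loading this MOE model with Hugging Face `transformers` and generating with the settings above. The repo ID, prompt, dtype and device choices are placeholders for illustration, not part of this card; adjust them to your setup.

```python
# Minimal sketch: load the model and generate with the suggested settings.
# MODEL_ID below is a placeholder - substitute the actual repo path for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,         # temp .5 to .7 (or lower)
    top_k=20,                # topk: 20
    top_p=0.8,               # topp: .8
    min_p=0.05,              # minp: .05
    repetition_penalty=1.05, # rep pen: 1.1 or lower (1.02 - 1.05)
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```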
|
|
|
<B>System Prompt:</B> |
|
|
|
If you want the model to code in specific ways or in specific languages, I suggest creating a system prompt with these instructions.
|
|
|
This will cut down prompt size and focus the model. |
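
As an illustration only, here is a hypothetical system prompt that pins the model to one language and output style, applied through the ChatML chat template (reusing the `tokenizer` from the sketch above):

```python
# Hypothetical system prompt - adjust the wording to the languages/style you want.
messages = [
    {
        "role": "system",
        "content": "You are an expert Python developer. Respond only with complete, "
                   "runnable Python 3 code and brief comments.",
    },
    {"role": "user", "content": "Implement an LRU cache class."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```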
|
|
|
<B>Activated Experts:</B> |
|
|
|
The model default is set to 2 activated experts. It will also run with one expert activated.
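
A sketch of what running with a single activated expert could look like when loading with `transformers`, assuming a Qwen2-MoE style config that exposes `num_experts_per_tok` (the repo ID is the same placeholder as above):

```python
# Sketch: override the number of activated experts before loading.
# Assumes a Qwen2-MoE style config with `num_experts_per_tok` (default 2 for this model).
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(MODEL_ID)
config.num_experts_per_tok = 1  # 2 = both experts (default), 1 = single expert
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, config=config, torch_dtype="auto", device_map="auto"
)
```

Many llama.cpp based front ends also expose the number of active experts as a loader setting.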
|
|
|
<B>Generation:</B> |
|
|
|
Due to the model config, I suggest a minimum of 2 generations if both experts are activated (default), or 2-4 generations if one expert is activated.
|
|
|
This will give you a large selection of varied code to choose from. |
|
|
|
I also suggest lowering the rep pen from 1.1 and getting at least 2 generations at each lower setting.
|
|
|
These generation suggestions can produce stronger, more compact code, and in some cases faster code too.
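
A rough sketch of that workflow, reusing `model`, `tokenizer` and `inputs` from the first example: loop over a few rep pen values, keep at least 2 generations per value, and compare the candidates by hand.

```python
# Collect several candidate generations, including runs at lower rep pen values.
candidates = []
for rep_pen in (1.1, 1.05, 1.02):
    for _ in range(2):  # at least 2 generations per setting
        out = model.generate(
            inputs,
            max_new_tokens=1024,
            do_sample=True,
            temperature=0.6,
            top_k=20,
            top_p=0.8,
            min_p=0.05,
            repetition_penalty=rep_pen,
        )
        candidates.append(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```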
|
|
|
--- |
|
|
|
For more information, other Qwen/Mistral coders, and additional settings, see:
|
|
|
[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ] |
|
|
|
[model card pending updates] |
|
|
|
For settings, parameters and other details also see: |
|
|
|
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct |
|
|
|
and/or |
|
|
|
https://huggingface.co/open-r1/OlympicCoder-32B |
|
|
|
More to come... |