---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-32B-Instruct
- open-r1/OlympicCoder-32B
pipeline_tag: text-generation
tags:
- merge
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- mixture of experts
- qwen2moe
- 2X32B Shared.
- shared expert
library_name: transformers
---

[uploading...]

<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2>

This repo contains the full precision source code, in "safetensors" format, to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.
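
To use the source directly, here is a minimal loading sketch with transformers (the repo id below is assumed from this card's title; adjust dtype/device mapping to your hardware):

```python
# Minimal loading sketch (assumed repo id; adjust to the actual repo path).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # spread the 87B MOE across available GPUs/CPU
)
```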

The monster coder, in a MOE (Mixture of Experts) 2x32B (with shared expert) configuration.

Two of the best coders combined into one model that is stronger than the sum of its parts.

Both models code together.

Info about each model below, followed by settings/info on using this MOE model.

---

# Qwen2.5-Coder-32B-Instruct

## Introduction

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14 and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

- Significant improvements in **code generation**, **code reasoning** and **code fixing**. Building on the strong Qwen2.5, the training tokens were scaled up to 5.5 trillion, including source code, text-code grounding, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
- A more comprehensive foundation for real-world applications such as **Code Agents**. It not only enhances coding capabilities but also maintains strengths in mathematics and general competencies.
- **Long-context support** up to 128K tokens.

**This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 32.5B
- Number of Parameters (Non-Embedding): 31.0B
- Number of Layers: 64
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: Full 131,072 tokens
- For detailed instructions on deploying Qwen2.5 for long texts, see the original Qwen2.5-Coder-32B-Instruct model card (linked below); a minimal sketch follows this list.
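
As a rough sketch of that long-text deployment (the YaRN rope-scaling values below are taken from the upstream Qwen2.5-Coder-32B-Instruct card; note that the merged MOE in this repo is suggested at 32k context, so this applies mainly to the base model):

```python
# Sketch of enabling YaRN rope scaling for long inputs, per the upstream Qwen2.5-Coder card.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"

config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "factor": 4.0,                                # 4 x 32768 = 131072 tokens
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```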

For more details, please refer to the Qwen2.5-Coder [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/) and [GitHub](https://github.com/QwenLM/Qwen2.5-Coder),

and see also:

https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

---

# Model Card for OlympicCoder-32B

OlympicCoder-32B is a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.

* Repository: https://github.com/huggingface/open-r1
* Blog post: https://huggingface.co/blog/open-r1/update-3

## Model description

- **Model type:** A 32B parameter model fine-tuned on a decontaminated version of the Codeforces dataset.
- **Language(s) (NLP):** Primarily English
- **License:** apache-2.0
- **Finetuned from model:** [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)

## Evaluation

We compare the performance of OlympicCoder models on two main benchmarks for competitive coding:

* **[IOI'2024:](https://github.com/huggingface/ioi)** 6 very challenging problems from the 2024 International Olympiad in Informatics. Models are allowed up to 50 submissions per problem.
* **[LiveCodeBench:](https://livecodebench.github.io)** Python programming problems sourced from platforms like CodeForces and LeetCode. We use the `v4_v5` subset of [`livecodebench/code_generation_lite`](https://huggingface.co/datasets/livecodebench/code_generation_lite), which corresponds to 268 problems. We use `lighteval` to evaluate models on LiveCodeBench using the sampling parameters described [here](https://github.com/huggingface/open-r1?tab=readme-ov-file#livecodebench).

> [!NOTE]
> The OlympicCoder models were post-trained exclusively on C++ solutions generated by DeepSeek-R1. As a result, the performance on LiveCodeBench should be considered partially _out-of-domain_, since that benchmark expects models to output solutions in Python.


For more info on this model, including benchmarks, see:

https://huggingface.co/open-r1/OlympicCoder-32B

---

<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2>

Model Settings / info:

---

Max context: 32k.

Super special thanks to Qwen and Open-R1 for making such fantastic models.

<B>Suggested Settings:</B>
- Temp .5 to .7 (or lower)
- topk: 20, topp: .8, minp: .05 (topp/minp can also be .95 and .05)
- rep pen: 1.1 (can be lower; lower may generate better code; specifically 1.02, 1.03 and 1.05)
- Jinja template (embedded) or CHATML template.
- A system prompt is not required (tests were run with a blank system prompt). A generation sketch with these settings follows the list.
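
A minimal generation sketch using these settings, reusing `model` and `tokenizer` from the loading sketch near the top of this card (`min_p` needs a recent transformers release):

```python
# Sketch of generating with the suggested sampling settings.
messages = [
    {"role": "user", "content": "Write a Python function that merges overlapping intervals."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,          # 0.5 to 0.7 (or lower)
    top_k=20,
    top_p=0.8,
    min_p=0.05,               # requires a recent transformers version
    repetition_penalty=1.1,   # lower values (1.02-1.05) may generate better code
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```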

<B>System Prompt:</B>

If you want the model to code in specific ways, or in specific languages, I suggest creating a system prompt with these instructions.

This will cut down prompt size and focus the model; an example follows below.
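
For example, a hypothetical system prompt that pins the language and output style (the wording below is illustrative only):

```python
# Hypothetical system prompt; adjust the wording to your own needs.
messages = [
    {
        "role": "system",
        "content": "You are an expert C++ programmer. Always return complete, compilable code "
                   "with brief comments, and prefer the C++17 standard library over external dependencies.",
    },
    {"role": "user", "content": "Implement an LRU cache with O(1) get and put."},
]
```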

<B>Activated Experts:</B>

The model default is 2 activated experts. It will also run with only one expert activated (a sketch of changing this follows below).
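
To change the number of activated experts in transformers, a minimal sketch (assuming the standard `qwen2_moe` config key `num_experts_per_tok`; the repo id is assumed from this card's title):

```python
# Sketch: run with one activated expert instead of the default two (assumed config key and repo id).
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2"

config = AutoConfig.from_pretrained(model_id)
config.num_experts_per_tok = 1   # model default is 2
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```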

<B>Generation:</B>

Due to the model config, I suggest a minimum of 2 generations if both experts are activated (the default), or 2-4 generations if only one expert is activated.

This will give you a large selection of varied code to choose from.

I also suggest lowering rep pen from 1.1 and getting at least 2 generations at each of the lower setting(s); a short sketch of this follows below.

These generation suggestions can produce stronger, more compact code - and in some cases faster code too.
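
A short sketch of that sweep, reusing `model`, `tokenizer` and `inputs` from the generation sketch above:

```python
# Sketch: collect several candidates per repetition-penalty setting and pick the best by hand.
candidates = []
for rep_pen in (1.02, 1.03, 1.05):
    for sample in range(2):  # at least 2 generations per setting
        output = model.generate(
            inputs,
            max_new_tokens=2048,
            do_sample=True,
            temperature=0.6,
            top_k=20,
            top_p=0.8,
            repetition_penalty=rep_pen,
        )
        text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
        candidates.append((rep_pen, sample + 1, text))

for rep_pen, sample, text in candidates:
    print(f"--- rep_pen={rep_pen}, sample {sample} ---\n{text}\n")
```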

---

For more information / other Qwen/Mistral coders / additional settings, see:

[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ]

[model card pending updates]

For settings, parameters and other details, also see:

https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

and/or

https://huggingface.co/open-r1/OlympicCoder-32B

More to come...