|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
base_model: |
|
- Qwen/Qwen2.5-Coder-32B-Instruct |
|
- open-r1/OlympicCoder-32B |
|
pipeline_tag: text-generation |
|
tags: |
|
- merge |
|
- programming |
|
- code generation |
|
- code |
|
- codeqwen |
|
- moe |
|
- coding |
|
- coder |
|
- qwen2 |
|
- chat |
|
- qwen |
|
- qwen-coder |
|
- mixture of experts |
|
- qwen2moe |
|
- 2X32B Shared. |
|
- shared expert |
|
library_name: transformers |
|
--- |
|
|
|
[uploading...] |
|
|
|
<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2> |
|
|
|
This repo contains the full precision source model, in safetensors format, for generating GGUF, GPTQ, EXL2, AWQ, HQQ and other quantized formats. The source model can also be used directly.
|
|
|
The monster coder in MOE (Mixture of Experts) 2x32B (with shared expert) configuration. |
|
|
|
Two of the best coders combined into one model that is stronger than the sum of its parts.
|
|
|
Both models code together. |
|
|
|
Information about each source model is below, followed by settings and usage notes for this MOE model.
|
|
|
--- |
|
|
|
# Qwen2.5-Coder-32B-Instruct |
|
|
|
## Introduction |
|
|
|
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
|
|
|
- Significant improvements in **code generation**, **code reasoning** and **code fixing**. Based on the strong Qwen2.5, we scaled the training tokens up to 5.5 trillion, including source code, text-code grounding, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.

- A more comprehensive foundation for real-world applications such as **Code Agents**, not only enhancing coding capabilities but also maintaining strengths in mathematics and general competencies.
|
- **Long-context Support** up to 128K tokens. |
|
|
|
**This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features: |
|
- Type: Causal Language Models |
|
- Training Stage: Pretraining & Post-training |
|
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias |
|
- Number of Parameters: 32.5B |
|
- Number of Parameters (Non-Embedding): 31.0B
|
- Number of Layers: 64 |
|
- Number of Attention Heads (GQA): 40 for Q and 8 for KV |
|
- Context Length: Full 131,072 tokens |
|
- Please refer to the [original Qwen2.5-Coder-32B-Instruct model card](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
|
|
|
For more details, please refer to the Qwen2.5-Coder [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/) and [GitHub](https://github.com/QwenLM/Qwen2.5-Coder).
|
|
|
See also:
|
|
|
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct |
|
|
|
--- |
|
|
|
# Model Card for OlympicCoder-32B |
|
|
|
OlympicCoder-32B is a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.
|
|
|
* Repository: https://github.com/huggingface/open-r1 |
|
* Blog post: https://huggingface.co/blog/open-r1/update-3 |
|
|
|
## Model description |
|
|
|
- **Model type:** A 32B parameter model fine-tuned on a decontaminated version of the codeforces dataset. |
|
- **Language(s) (NLP):** Primarily English |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
|
|
|
## Evaluation |
|
|
|
We compare the performance of OlympicCoder models on two main benchmarks for competitive coding: |
|
|
|
* **[IOI'2024:](https://github.com/huggingface/ioi)** 6 very challenging problems from the 2024 International Olympiad in Informatics. Models are allowed up to 50 submissions per problem. |
|
* **[LiveCodeBench:](https://livecodebench.github.io)** Python programming problems sourced from platforms like CodeForces and LeetCode. We use the `v4_v5` subset of [`livecodebench/code_generation_lite`](https://huggingface.co/datasets/livecodebench/code_generation_lite), which corresponds to 268 problems. We use `lighteval` to evaluate models on LiveCodeBench using the sampling parameters described [here](https://github.com/huggingface/open-r1?tab=readme-ov-file#livecodebench).
|
|
|
> [!NOTE] |
|
> The OlympicCoder models were post-trained exclusively on C++ solutions generated by DeepSeek-R1. As a result, the performance on LiveCodeBench should be considered partially _out-of-domain_, since this benchmark expects models to output solutions in Python.
|
|
|
|
|
For more info on this model, including benchmarks, see:
|
|
|
https://huggingface.co/open-r1/OlympicCoder-32B |
|
|
|
--- |
|
|
|
<h2>Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2</h2>
|
|
|
Model Settings / info: |
|
|
|
--- |
|
|
|
Max context: 32k. |
|
|
|
Super special thanks to Qwen and Open-R1 for making such fantastic models. |
|
|
|
<B>Suggested Settings (see the example below): </B>

- Temp: .5 to .7 (or lower)

- topk: 20, topp: .8, minp: .05 (topp and minp can also be .95 and .05)

- rep pen: 1.1 (can be lower; lower may generate better code, specifically 1.02, 1.03 and 1.05)

- Jinja template (embedded) or ChatML template.

- A system prompt is not required (tests were run with a blank system prompt).
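
Below is a minimal sketch of loading this MOE model with Hugging Face `transformers` and generating with the settings above. The repo ID, prompt, dtype and device choices are placeholders for illustration, not part of this card; adjust them to your setup.

```python
# Minimal sketch: load the model and generate with the suggested settings.
# MODEL_ID below is a placeholder - substitute the actual repo path for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.2"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,         # temp .5 to .7 (or lower)
    top_k=20,                # topk: 20
    top_p=0.8,               # topp: .8
    min_p=0.05,              # minp: .05
    repetition_penalty=1.05, # rep pen: 1.1 or lower (1.02 - 1.05)
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```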
|
|
|
<B>System Prompt:</B> |
|
|
|
If you want the model to code in specific ways or in specific languages, I suggest creating a system prompt with these instructions.
|
|
|
This will cut down prompt size and focus the model. |
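
As an illustration only, here is a hypothetical system prompt that pins the model to one language and output style, applied through the ChatML chat template (reusing the `tokenizer` from the sketch above):

```python
# Hypothetical system prompt - adjust the wording to the languages/style you want.
messages = [
    {
        "role": "system",
        "content": "You are an expert Python developer. Respond only with complete, "
                   "runnable Python 3 code and brief comments.",
    },
    {"role": "user", "content": "Implement an LRU cache class."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```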
|
|
|
<B>Activated Experts:</B> |
|
|
|
The model default is set to 2 activated experts. It will also run with one expert activated.
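
A sketch of what running with a single activated expert could look like when loading with `transformers`, assuming a Qwen2-MoE style config that exposes `num_experts_per_tok` (the repo ID is the same placeholder as above):

```python
# Sketch: override the number of activated experts before loading.
# Assumes a Qwen2-MoE style config with `num_experts_per_tok` (default 2 for this model).
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(MODEL_ID)
config.num_experts_per_tok = 1  # 2 = both experts (default), 1 = single expert
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, config=config, torch_dtype="auto", device_map="auto"
)
```

Many llama.cpp based front ends also expose the number of active experts as a loader setting.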
|
|
|
<B>Generation:</B> |
|
|
|
Due to the model config, I suggest a minimum of 2 generations if both experts are activated (default), or 2-4 generations if one expert is activated.
|
|
|
This will give you a large selection of varied code to choose from. |
|
|
|
I also suggest lowering the rep pen from 1.1 and getting at least 2 generations at each lower setting.
|
|
|
These generation suggestions can produce stronger, more compact code, and in some cases faster code too.
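
A rough sketch of that workflow, reusing `model`, `tokenizer` and `inputs` from the first example: loop over a few rep pen values, keep at least 2 generations per value, and compare the candidates by hand.

```python
# Collect several candidate generations, including runs at lower rep pen values.
candidates = []
for rep_pen in (1.1, 1.05, 1.02):
    for _ in range(2):  # at least 2 generations per setting
        out = model.generate(
            inputs,
            max_new_tokens=1024,
            do_sample=True,
            temperature=0.6,
            top_k=20,
            top_p=0.8,
            min_p=0.05,
            repetition_penalty=rep_pen,
        )
        candidates.append(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```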
|
|
|
--- |
|
|
|
For more information, other Qwen/Mistral coders, and additional settings, see:
|
|
|
[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ] |
|
|
|
[model card pending updates] |
|
|
|
For settings, parameters and other details also see: |
|
|
|
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct |
|
|
|
and/or |
|
|
|
https://huggingface.co/open-r1/OlympicCoder-32B |
|
|
|
More to come... |