---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- code
- python
- maincoder
- code-generation
- reinforcement-learning
- mcpo
pipeline_tag: text-generation
base_model: Maincode/Maincoder-1B
---
<img src="https://huggingface.co/datasets/Maincode/assets/resolve/e51154e034201be1a5dad0e9c8de31d8b9f17643/maincoder_logo.png" alt="Maincoder logo" width="1250">
[**Maincoder-1B**](https://maincode.com/maincoder/) is a code-focused language model optimized for code generation and completion tasks. The model achieves strong performance on coding benchmarks while maintaining a compact size suitable for local deployment.
# Key Features
- **Code Generation**: Optimized for Python code completion and generation tasks.
- **Compact Size**: 1 billion parameters, lightweight enough to run on consumer hardware.
- **Deep Architecture**: Modern transformer architecture with RoPE embeddings, grouped-query attention, QK normalization, and a high depth-to-width ratio.
- **Advanced Data Mixing**: Pre-trained and mid-trained on custom data mixes developed for high-performance coding.
- **MCPO Algorithm**: Fine-tuned with MCPO, a specialised reinforcement-learning policy-optimisation algorithm that improves training stability and accelerates convergence.
- **SOTA Performance**: State-of-the-art performance on Python coding benchmarks HumanEval, HumanEval+ and MBPP+.
# Benchmark Results
<img src="https://huggingface.co/datasets/Maincode/assets/resolve/main/performance_h.png" alt="Benchmark Performance Across Baseline LLMs" width="1050">
| Model | HumanEval | HumanEval+ | MBPP+ | MMLU | GSM8K |
|---|---:|---:|---:|---:|---:|
| [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) | **0.7622** | **0.7256** | **0.7090** | 0.3054 | 0.2976 |
| [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) | 0.5610 | 0.5305 | 0.6217 | 0.2705 | 0.0413 |
| [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) | 0.5366 | 0.5000 | 0.6799 | **0.5928** | 0.5505 |
| [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) | 0.4634 | 0.4451 | 0.6561 | 0.4984 | 0.4944 |
| [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 0.4024 | 0.3780 | 0.5582 | 0.5571 | **0.6865** |
# Model Overview
Maincoder uses a modern transformer decoder architecture with the following components (an illustrative code sketch follows the configuration table below):
- **Rotary Position Embeddings**: Base theta of 1,000,000.
- **RMSNorm**: Pre-normalization for stable training.
- **Grouped Query Attention**: 4:1 ratio of query to key-value heads.
- **QK Normalization**: RMSNorm applied to attention queries and keys.
- **SwiGLU MLP**: Gated linear units with SiLU activation.
| Attribute | Value |
|-----------|-------|
| Parameters | 1B |
| Hidden Size | 1536 |
| Layers | 32 |
| Attention Heads | 16 (4 KV heads) |
| Head Dimension | 96 |
| Vocabulary Size | 151,936 |
| Context Length | 2,048 |
| Precision | bfloat16 |
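
The listing below is a minimal, self-contained sketch of the attention and MLP blocks described above, wired up with the dimensions from the table. It is illustrative only: the layer names, the intermediate MLP size, and the omission of RoPE are assumptions on our part; the model's actual implementation ships with the checkpoint via `trust_remote_code`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN, N_HEADS, N_KV_HEADS, HEAD_DIM = 1536, 16, 4, 96  # values from the table above


class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, learned scale only."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


class GQAWithQKNorm(nn.Module):
    """Grouped-query attention: 16 query heads share 4 KV heads, with RMSNorm on Q and K.
    RoPE (base theta 1,000,000) is omitted here for brevity."""
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(HIDDEN, N_HEADS * HEAD_DIM, bias=False)
        self.k_proj = nn.Linear(HIDDEN, N_KV_HEADS * HEAD_DIM, bias=False)
        self.v_proj = nn.Linear(HIDDEN, N_KV_HEADS * HEAD_DIM, bias=False)
        self.o_proj = nn.Linear(N_HEADS * HEAD_DIM, HIDDEN, bias=False)
        self.q_norm = RMSNorm(HEAD_DIM)  # QK normalization, applied per head
        self.k_norm = RMSNorm(HEAD_DIM)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_norm(self.q_proj(x).view(B, T, N_HEADS, HEAD_DIM)).transpose(1, 2)
        k = self.k_norm(self.k_proj(x).view(B, T, N_KV_HEADS, HEAD_DIM)).transpose(1, 2)
        v = self.v_proj(x).view(B, T, N_KV_HEADS, HEAD_DIM).transpose(1, 2)
        # Repeat each KV head 4x so the 4 KV heads serve all 16 query heads.
        k = k.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)
        v = v.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, N_HEADS * HEAD_DIM))


class SwiGLU(nn.Module):
    """Gated MLP: down_proj(SiLU(gate_proj(x)) * up_proj(x)). The intermediate
    size (4096 here) is a placeholder, not the model's actual value."""
    def __init__(self, intermediate: int = 4096):
        super().__init__()
        self.gate_proj = nn.Linear(HIDDEN, intermediate, bias=False)
        self.up_proj = nn.Linear(HIDDEN, intermediate, bias=False)
        self.down_proj = nn.Linear(intermediate, HIDDEN, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


x = torch.randn(1, 8, HIDDEN)
print(GQAWithQKNorm()(x).shape, SwiGLU()(x).shape)  # both (1, 8, 1536)
```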
# Usage
### Installation
```bash
pip install transformers torch
```
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"Maincode/Maincoder-1B",
torch_dtype="auto",
device_map="auto",
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
"Maincode/Maincoder-1B",
trust_remote_code=True,
)
# Code completion example
prompt = '''def fibonacci(n: int) -> int:
"""Return the n-th Fibonacci number."""
'''
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=256,
temperature=0.2,
do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Code Completion
```python
# Function completion
prompt = '''def quicksort(arr: list) -> list:
"""Sort a list using the quicksort algorithm."""
'''
# Class completion
prompt = '''class BinarySearchTree:
"""A binary search tree implementation."""
def __init__(self):
'''
# Algorithm implementation
prompt = '''def dijkstra(graph: dict, start: str, end: str) -> tuple:
"""Find the shortest path using Dijkstra's algorithm.
Args:
graph: Adjacency list representation of the graph
start: Starting node
end: Target node
Returns:
Tuple of (distance, path)
"""
'''
```
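
A minimal helper for running any of these prompts, assuming `model` and `tokenizer` are already loaded as in the Quick Start above. The `complete` function and its sampling settings are illustrative, not part of the model's API.

```python
def complete(prompt: str, max_new_tokens: int = 256) -> str:
    """Return the prompt plus the model's generated continuation."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.2,
        do_sample=True,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


print(complete(prompt))  # e.g. the Dijkstra prompt defined above
```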
# Additional Notes
## Reproducibility
<details>
<summary>Model evaluations were run on 8 AMD MI355X GPUs via the <a href="https://github.com/EleutherAI/lm-evaluation-harness">EleutherAI lm-evaluation-harness</a>.</summary>
```bash
docker run --rm -it \
--device=/dev/kfd --device=/dev/dri --group-add=video \
--ipc=host --security-opt seccomp=unconfined \
-v $(pwd):/workspace -w /workspace \
-e HF_TOKEN \
-e PYTHONHASHSEED=0 \
-e TORCH_DETERMINISTIC=1 \
-e ROCBLAS_ATOMICS_MODE="0" \
-e MIOPEN_FIND_MODE="1" \
-e CUBLAS_WORKSPACE_CONFIG=":4096:8" \
-e HF_ALLOW_CODE_EVAL="1" \
rocm/pytorch:rocm7.1.1_ubuntu24.04_py3.12_pytorch_release_2.9.1 \
bash -c 'pip install "lm_eval[hf]" && \
accelerate launch -m lm_eval \
--model hf --model_args "pretrained=Maincode/Maincoder-1B,trust_remote_code=True,dtype=float32" \
--tasks humaneval,humaneval_plus,mbpp_plus,mmlu,gsm8k \
--device cuda:0 --batch_size 32 --seed 42 \
--confirm_run_unsafe_code'
```
</details>
## Limitations
- Context length is limited to 2,048 tokens; a truncation sketch follows this list
- Primarily optimized for Python; performance may vary on other languages
- May generate code with bugs or security issues; always review generated output
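
Because the context window is 2,048 tokens, very long prompts need to be trimmed before generation. A minimal sketch, assuming the `model` and `tokenizer` from the Quick Start; keeping only the tail of the prompt and reserving a 256-token generation budget are illustrative choices, not recommendations from the model authors.

```python
MAX_CONTEXT = 2048      # model context window (see table above)
MAX_NEW_TOKENS = 256    # illustrative generation budget

# A deliberately overlong prompt, purely for demonstration.
long_prompt = "\n".join(f"def helper_{i}(x):\n    return x + {i}\n" for i in range(500))

ids = tokenizer(long_prompt, return_tensors="pt").input_ids
ids = ids[:, -(MAX_CONTEXT - MAX_NEW_TOKENS):]  # keep only the most recent tokens
outputs = model.generate(ids.to(model.device), max_new_tokens=MAX_NEW_TOKENS)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```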
<div style="margin-left:14px; border-left:4px solid #3b82f6; background:rgba(59,130,246,0.08); padding:8px 10px; border-radius:8px; font-size:0.92em; margin:10px 0;">
<strong>Disclaimer</strong>: This model has <strong>not</strong> undergone any alignment or safety tuning (e.g., RLHF/RLAIF, DPO, or safety fine-tuning). Outputs may be unsafe or biased. Please use appropriate safeguards and evaluate carefully for your use case.
</div>
## License
This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
## Citation
```bibtex
@misc{maincoder2025,
title = {Maincoder-1B: A High-Performance 1B Parameter Coding Model},
author = {Maincode Team},
year = {2025},
organization = {Maincode},
howpublished = {\url{https://huggingface.co/Maincode/Maincoder-1B}}
}
```
## Contact
For questions, issues, or collaboration inquiries, please visit [Maincode](https://maincode.com).