---
library_name: rkllm
pipeline_tag: text-generation
license: other
license_name: qwen-research
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct/blob/main/LICENSE
base_model:
- Qwen/Qwen2.5-Coder-3B-Instruct
tags:
- text-generation-inference
- rkllm
- rk3588
- rockchip
- edge-ai
- qwen2
- code
- chat
---
# Qwen2.5-Coder-3B-Instruct — RKLLM build for RK3588 boards
### Built with Qwen
**Author:** @jamescallander
**Source model:** [Qwen/Qwen2.5-Coder-3B-Instruct · Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)
**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**
> This repository hosts a **conversion** of `Qwen2.5-Coder-3B-Instruct` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).
#### Conversion details
- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Modifications: quantization (w8a8_g128), export to .rkllm format for RK3588 SBCs.
- Tokenizer: not required at runtime (UI handles prompt I/O)
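For reference, conversions like this are typically scripted with the toolkit's Python API. The sketch below is illustrative rather than the exact script used for this build: the `rkllm.api` calls follow the examples shipped with rknn-llm, and argument names may differ between toolkit versions.
```python
# Illustrative conversion sketch (not the exact script used for this build).
# Requires the RKLLM-Toolkit (v1.2.1 was used here); check the examples
# shipped with rknn-llm for the exact arguments your version expects.
from rkllm.api import RKLLM

llm = RKLLM()

# Load the upstream Hugging Face checkpoint
ret = llm.load_huggingface(model="Qwen/Qwen2.5-Coder-3B-Instruct")
assert ret == 0, "model load failed"

# Quantize to w8a8_g128 and target the RK3588 NPU
ret = llm.build(
    do_quantization=True,
    quantized_dtype="w8a8_g128",
    target_platform="rk3588",
)
assert ret == 0, "build failed"

# Export the single-file .rkllm artifact
ret = llm.export_rkllm("./Qwen2.5-Coder-3B-Instruct_w8a8_g128_rk3588.rkllm")
assert ret == 0, "export failed"
```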
## ⚠️ Code generation disclaimer
🛑 **This model may produce incorrect, insecure, or non-optimal code.**
- It is intended for **research, educational, and prototyping purposes only**.
- Always **review, test, and validate** any generated code before using it in production.
- The model does not guarantee compliance with security best practices or coding standards.
- You are responsible for ensuring outputs meet your project’s requirements and legal obligations.
## Intended use
- On-device deployment of a **coding-focused instruction model** for software development assistance on SBCs.
- Qwen2.5-Coder-3B-Instruct is tuned for **code generation, explanation, and debugging tasks**, making it suitable for private edge inference.
## Limitations
- Requires approximately 4 GB of free memory.
- Quantized build (`w8a8_g128`) may show small quality differences vs. full-precision upstream.
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
## Quick start (RK3588)
### 1) Install runtime
The RKNN-LLM toolkit and instructions can be found on your development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip).
Download and install the required packages as per the toolkit's instructions.
### 2) Simple Flask server deployment
The simplest way to deploy the converted `.rkllm` model is to use the example script provided with the toolkit in `rknn-llm/examples/rkllm_server_demo`:
```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <MODEL_PATH>/Qwen2.5-Coder-3B-Instruct_w8a8_g128_rk3588.rkllm \
    --target_platform rk3588
```
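On first launch the server loads the `.rkllm` model into memory before it begins serving, which can take a while. A small readiness check like the one below (a hypothetical helper in plain Python, not part of the toolkit) waits until the port accepts connections; replace `<SERVER_IP_ADDRESS>` with your board's address.
```python
import socket
import time

def wait_for_server(host: str, port: int = 8080, timeout: float = 180.0) -> bool:
    """Poll the server port until the Flask demo accepts connections."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(2)  # model may still be loading; retry
    return False

if wait_for_server("<SERVER_IP_ADDRESS>"):
    print("RKLLM server is accepting connections")
```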
### 3) Sending a request
A basic message request has the following format:
```json
{
  "model": "Qwen2.5-Coder-3B",
  "messages": [
    {"role": "user", "content": "<YOUR_PROMPT_HERE>"}
  ],
  "stream": false
}
```
Example request using `curl`:
```bash
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
-H 'Content-Type: application/json' \
-d '{"model":"Qwen2.5-Coder-3B","messages":[{"role":"user","content":"Explain in one sentence what a static method is."}],"stream":false}'
```
The response is formatted as follows:
```json
{
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "logprobs": null,
    "message": {
      "content": "<MODEL_REPLY_HERE>",
      "role": "assistant"
    }
  }],
  "created": null,
  "id": "rkllm_chat",
  "object": "rkllm_chat",
  "usage": {
    "completion_tokens": null,
    "prompt_tokens": null,
    "total_tokens": null
  }
}
```
Example response:
```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"A static method belongs to the class itself rather than any instance of the class and can be called without creating an object of the class.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
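The same request can also be issued programmatically. The following minimal client uses Python's `requests` library (an assumption of this sketch, not something the toolkit ships) and mirrors the curl example above:
```python
import requests

# Replace <SERVER_IP_ADDRESS> with your board's address
url = "http://<SERVER_IP_ADDRESS>:8080/rkllm_chat"
payload = {
    "model": "Qwen2.5-Coder-3B",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "stream": False,
}

resp = requests.post(url, json=payload, timeout=300)
resp.raise_for_status()

# Extract the assistant's reply from the response structure shown above
print(resp.json()["choices"][0]["message"]["content"])
```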
### 4) UI compatibility
This server exposes an **OpenAI-compatible Chat Completions API**.
You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui)).
- Configure your client with the API base: `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint: `/rkllm_chat`
- Make sure the `model` field matches the converted model’s name, for example:
```json
{
  "model": "Qwen2.5-Coder-3B-Instruct",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}
```
# License
This conversion follows the license of the source model: [LICENSE · Qwen/Qwen2.5-Coder-3B-Instruct at main](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct/blob/main/LICENSE)