|
---
library_name: rkllm
pipeline_tag: text-generation
license: other
license_name: qwen-research
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct/blob/main/LICENSE
base_model:
- Qwen/Qwen2.5-Coder-3B-Instruct
tags:
- text-generation-inference
- rkllm
- rk3588
- rockchip
- edge-ai
- qwen2
- code
- chat
---
|
# Qwen2.5-Coder-3B-Instruct — RKLLM build for RK3588 boards |
|
|
|
### Built with Qwen |
|
|
|
**Author:** @jamescallander |
|
**Source model:** [Qwen/Qwen2.5-Coder-3B-Instruct · Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct) |
|
|
|
**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime** |
|
|
|
> This repository hosts a **conversion** of `Qwen2.5-Coder-3B-Instruct` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). Conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).
|
|
|
#### Conversion details |
|
|
|
- RKLLM-Toolkit version: v1.2.1 |
|
- NPU driver: v0.9.8 |
|
- Python: 3.12 |
|
- Quantization: `w8a8_g128` |
|
- Output: single-file `.rkllm` artifact |
|
- Modifications: quantization (`w8a8_g128`) and export to the `.rkllm` format for RK3588 SBCs; no other changes were made (a conversion sketch follows this list)
|
- Tokenizer: not required at runtime (UI handles prompt I/O) |
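
For reference, here is a minimal sketch of the conversion flow using the RKLLM-Toolkit Python API. Treat it as illustrative rather than a drop-in script: the paths are placeholders, and argument names can differ between toolkit releases.

```python
# Illustrative conversion sketch (RKLLM-Toolkit v1.2.x); paths are
# placeholders and exact arguments may vary between toolkit releases.
from rkllm.api import RKLLM

llm = RKLLM()

# Load the upstream Hugging Face checkpoint.
ret = llm.load_huggingface(model="<PATH_TO>/Qwen2.5-Coder-3B-Instruct")
assert ret == 0, "model load failed"

# Quantize to w8a8_g128 and target the RK3588 NPU.
ret = llm.build(do_quantization=True,
                quantized_dtype="w8a8_g128",
                target_platform="rk3588")
assert ret == 0, "build failed"

# Write the single-file .rkllm artifact.
ret = llm.export_rkllm("<MODEL_PATH>/Qwen2.5-Coder-3B-Instruct_w8a8_g128_rk3588.rkllm")
assert ret == 0, "export failed"
```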
|
|
|
## ⚠️ Code generation disclaimer |
|
|
|
🛑 **This model may produce incorrect, insecure, or non-optimal code.** |
|
|
|
- It is intended for **research, educational, and prototyping purposes only**. |
|
|
|
- Always **review, test, and validate** any generated code before using it in production. |
|
- The model does not guarantee compliance with security best practices or coding standards. |
|
- You are responsible for ensuring outputs meet your project’s requirements and legal obligations. |
|
|
|
## Intended use |
|
|
|
- On-device deployment of a **coding-focused instruction model** for software development assistance on SBCs. |
|
- Qwen2.5-Coder-3B-Instruct is tuned for **code generation, explanation, and debugging tasks**, making it suitable for private edge inference. |
|
|
|
## Limitations |
|
|
|
- Requires roughly 4 GB of free memory
|
- Quantized build (`w8a8_g128`) may show small quality differences vs. full-precision upstream. |
|
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions. |
|
|
|
## Quick start (RK3588) |
|
|
|
### 1) Install runtime |
|
|
|
The RKNN-LLM toolkit and installation instructions can be found on your development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip).
|
|
|
Download and install the required packages as per the toolkit's instructions. |
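
This build was converted against NPU driver v0.9.8. One quick way to check the driver version on the board is to read the rknpu debugfs node; note that this path is an assumption based on common RK3588 BSP kernels and usually requires root to read.

```python
# Print the rknpu driver version (debugfs path assumed from common
# RK3588 BSP kernels; typically requires root to read).
from pathlib import Path

print(Path("/sys/kernel/debug/rknpu/version").read_text().strip())
```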
|
|
|
### 2) Simple Flask server deployment |
|
|
|
The simplest way to deploy the converted `.rkllm` model is with the example Flask server script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:
|
|
|
```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <MODEL_PATH>/Qwen2.5-Coder-3B-Instruct_w8a8_g128_rk3588.rkllm \
    --target_platform rk3588
```
|
|
|
### 3) Sending a request |
|
|
|
A basic request has the following format:
|
|
|
```json
{
  "model": "Qwen2.5-Coder-3B-Instruct",
  "messages": [
    {"role": "user", "content": "<YOUR_PROMPT_HERE>"}
  ],
  "stream": false
}
```
|
|
|
Example request using `curl`: |
|
|
|
```bash
curl -s -X POST http://<SERVER_IP_ADDRESS>:8080/rkllm_chat \
  -H 'Content-Type: application/json' \
  -d '{"model":"Qwen2.5-Coder-3B-Instruct","messages":[{"role":"user","content":"Explain in one sentence what a static method is."}],"stream":false}'
```
|
|
|
The response is formatted as follows:
|
|
|
```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "<MODEL_REPLY_HERE>",
        "role": "assistant"
      }
    }
  ],
  "created": null,
  "id": "rkllm_chat",
  "object": "rkllm_chat",
  "usage": {
    "completion_tokens": null,
    "prompt_tokens": null,
    "total_tokens": null
  }
}
```
|
|
|
Example response: |
|
|
|
```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"A static method belongs to the class itself rather than any instance of the class and can be called without creating an object of the class.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
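
The same round trip can be done from Python; below is a minimal sketch using the third-party `requests` package. The server address is a placeholder, and the payload and response shapes follow the formats shown above.

```python
# Send a chat request to the Flask demo server and print the reply.
# Server address is a placeholder; payload/response shapes follow the
# formats documented above.
import requests

payload = {
    "model": "Qwen2.5-Coder-3B-Instruct",
    "messages": [
        {"role": "user",
         "content": "Explain in one sentence what a static method is."}
    ],
    "stream": False,
}

resp = requests.post("http://<SERVER_IP_ADDRESS>:8080/rkllm_chat",
                     json=payload, timeout=120)
resp.raise_for_status()

# The assistant's reply lives at choices[0].message.content.
print(resp.json()["choices"][0]["message"]["content"])
```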
|
|
|
### 4) UI compatibility |
|
|
|
This server exposes an **OpenAI-compatible Chat Completions API**. |
|
|
|
You can connect it to any OpenAI-compatible client or UI (for example: [Open WebUI](https://github.com/open-webui/open-webui)).
|
- Configure your client with the API base: `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint: `/rkllm_chat` |
|
- Make sure the `model` field matches the converted model’s name, for example: |
|
|
|
```json
{
  "model": "Qwen2.5-Coder-3B-Instruct",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}
```
|
|
|
## License
|
|
|
This conversion follows the license of the source model: [LICENSE · Qwen/Qwen2.5-Coder-3B-Instruct at main](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct/blob/main/LICENSE).