jamescallander committed on
Commit
da9454d
·
verified ·
1 Parent(s): 1bc5c04

Update README.md

Files changed (1)
  1. README.md +126 -5
README.md CHANGED

---
library_name: rkllm
pipeline_tag: text-generation
license: other
license_name: qwen-research
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct/blob/main/LICENSE
base_model:
- Qwen/Qwen2.5-Coder-3B-Instruct
tags:
- text-generation-inference
- rkllm
- rk3588
- rockchip
- edge-ai
- qwen2
- code
- chat
---

# Qwen2.5-Coder-3B-Instruct — RKLLM build for RK3588 boards

**Author:** @jamescallander
**Source model:** [Qwen/Qwen2.5-Coder-3B-Instruct · Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)
**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**

> This repository hosts a **conversion** of `Qwen2.5-Coder-3B-Instruct` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed with the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).

#### Conversion details

- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Tokenizer: not required at runtime (UI handles prompt I/O)

## ⚠️ Code generation disclaimer

🛑 **This model may produce incorrect, insecure, or non-optimal code.**

- It is intended for **research, educational, and prototyping purposes only**.
- Always **review, test, and validate** any generated code before using it in production.
- The model does not guarantee compliance with security best practices or coding standards.
- You are responsible for ensuring outputs meet your project's requirements and legal obligations.

## Intended use

- On-device deployment of a **coding-focused instruction model** for software development assistance on SBCs.
- Qwen2.5-Coder-3B-Instruct is tuned for **code generation, explanation, and debugging tasks**, making it suitable for private edge inference.

## Limitations

- Requires roughly 4 GB of free memory.
- The quantized build (`w8a8_g128`) may show small quality differences vs. the full-precision upstream model.
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
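
The memory requirement can be verified directly on the board before loading the model; a minimal sketch, assuming the board runs Linux (it reads the kernel's `/proc/meminfo`):

```python
import os

def mem_available_gb(path: str = "/proc/meminfo") -> float:
    """Return the kernel's MemAvailable figure in GiB (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 ** 2)  # value is in kB
    raise RuntimeError(f"MemAvailable not found in {path}")

if os.path.exists("/proc/meminfo"):
    print(f"{mem_available_gb():.1f} GiB available")
```

If the figure is well under 4 GiB, close other workloads or add swap before loading the model.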

## Quick start (RK3588)

### 1) Install runtime

The RKNN-LLM toolkit and installation instructions can be found on the development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip).

Download and install the required packages as per the toolkit's instructions.

### 2) Simple Flask server deployment

The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:

```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <MODEL_PATH>/Qwen2.5-Coder-3B-Instruct_w8a8_g128_rk3588.rkllm \
    --target_platform rk3588
```
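
Once the server has started, you can confirm it is listening before sending any requests; a small sketch using only Python's standard library (localhost and port 8080 are assumptions matching the demo's defaults used in the `curl` example):

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the demo server on the local machine
print(is_port_open("127.0.0.1", 8080))
```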

### 3) Sending a request

A basic format for a message request is:

```json
{
    "model": "Qwen2.5-Coder-3B",
    "messages": [{
        "role": "user",
        "content": "<YOUR_PROMPT_HERE>"}],
    "stream": false
}
```

Example request using `curl`:

```bash
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"Qwen2.5-Coder-3B","messages":[{"role":"user","content":"Explain in one sentence what a static method is."}],"stream":false}'
```
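
The same request can be sent from Python; a minimal sketch using only the standard library, mirroring the endpoint and payload of the `curl` example (the server address is a placeholder):

```python
import json
from urllib import request

def build_chat_request(prompt: str, model: str = "Qwen2.5-Coder-3B") -> dict:
    """Assemble a payload in the request format shown above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def send_chat_request(server: str, prompt: str) -> dict:
    """POST a chat request to the demo server and return the parsed JSON reply."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = request.Request(
        f"http://{server}:8080/rkllm_chat",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (requires a running server):
# reply = send_chat_request("<SERVER_IP_ADDRESS>", "Explain what a static method is.")
```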

The response is formatted in the following way:

```json
{
    "choices": [{
        "finish_reason": "stop",
        "index": 0,
        "logprobs": null,
        "message": {
            "content": "<MODEL_REPLY_HERE>",
            "role": "assistant"}}],
    "created": null,
    "id": "rkllm_chat",
    "object": "rkllm_chat",
    "usage": {
        "completion_tokens": null,
        "prompt_tokens": null,
        "total_tokens": null}
}
```

Example response:

```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"A static method belongs to the class itself rather than any instance of the class and can be called without creating an object of the class.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
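
When scripting against the server, the assistant's text can be pulled out of that structure directly; a small helper whose parsing path follows the response format above:

```python
import json

def extract_reply(response_body: str) -> str:
    """Return the assistant message text from a /rkllm_chat response body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]

# The example response from above, used as test input
example = (
    '{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,'
    '"message":{"content":"A static method belongs to the class itself rather '
    'than any instance of the class and can be called without creating an '
    'object of the class.","role":"assistant"}}],"created":null,'
    '"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,'
    '"prompt_tokens":null,"total_tokens":null}}'
)
print(extract_reply(example))
```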

## License

This conversion follows the license of the source model: [LICENSE · Qwen/Qwen2.5-Coder-3B-Instruct at main](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct/blob/main/LICENSE)