---
library_name: rkllm
pipeline_tag: text-generation
license: other
license_name: qwen-research
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct/blob/main/LICENSE
base_model:
- Qwen/Qwen2.5-Coder-3B-Instruct
tags:
- text-generation-inference
- rkllm
- rk3588
- rockchip
- edge-ai
- qwen2
- code
- chat
---
# Qwen2.5-Coder-3B-Instruct — RKLLM build for RK3588 boards

**Author:** @jamescallander
**Source model:** [Qwen/Qwen2.5-Coder-3B-Instruct · Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)
**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**

> This repository hosts a **conversion** of `Qwen2.5-Coder-3B-Instruct` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed with the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).

#### Conversion details

- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Tokenizer: not required at runtime (UI handles prompt I/O)

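For reference, the conversion follows the standard RKLLM-Toolkit flow: load the Hugging Face checkpoint, quantize, and export a `.rkllm` file. The snippet below is only a minimal sketch of that flow, not the exact script used for this build; it assumes the `rkllm.api` Python interface of RKLLM-Toolkit v1.2.x, argument names may differ between toolkit releases, and `<LOCAL_HF_MODEL_DIR>` is a placeholder for a local copy of the source model.

```python
# Illustrative conversion sketch (run on a PC with rkllm-toolkit installed).
# Argument names follow the rkllm.api examples and may vary by toolkit version.
from rkllm.api import RKLLM

llm = RKLLM()

# Load a local copy of the Hugging Face checkpoint
ret = llm.load_huggingface(model="<LOCAL_HF_MODEL_DIR>")
assert ret == 0, "model load failed"

# Quantize for the RK3588 NPU (w8a8, group size 128)
ret = llm.build(do_quantization=True,
                quantized_dtype="w8a8_g128",
                target_platform="rk3588")
assert ret == 0, "build failed"

# Export the single-file .rkllm artifact
ret = llm.export_rkllm("./Qwen2.5-Coder-3B-Instruct_w8a8_g128_rk3588.rkllm")
assert ret == 0, "export failed"
```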
## ⚠️ Code generation disclaimer

🛑 **This model may produce incorrect, insecure, or non-optimal code.**

- It is intended for **research, educational, and prototyping purposes only**.
- Always **review, test, and validate** any generated code before using it in production.
- The model does not guarantee compliance with security best practices or coding standards.
- You are responsible for ensuring outputs meet your project’s requirements and legal obligations.

## Intended use

- On-device deployment of a **coding-focused instruction model** for software development assistance on SBCs.
- Qwen2.5-Coder-3B-Instruct is tuned for **code generation, explanation, and debugging tasks**, making it suitable for private edge inference.

## Limitations

- Requires 4 GB of free memory.
- Quantized build (`w8a8_g128`) may show small quality differences vs. full-precision upstream.
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions.

## Quick start (RK3588)

### 1) Install runtime

The RKNN-LLM toolkit and installation instructions can be found on your development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip).

Download and install the required packages as per the toolkit's instructions.

### 2) Simple Flask server deployment

The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:

```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <MODEL_PATH>/Qwen2.5-Coder-3B-Instruct_w8a8_g128_rk3588.rkllm \
    --target_platform rk3588
```

### 3) Sending a request

The basic format for a message request is:

```json
{
  "model": "Qwen2.5-Coder-3B",
  "messages": [
    {
      "role": "user",
      "content": "<YOUR_PROMPT_HERE>"
    }
  ],
  "stream": false
}
```

Example request using `curl`:

```bash
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"Qwen2.5-Coder-3B","messages":[{"role":"user","content":"Explain in one sentence what a static method is."}],"stream":false}'
```

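The same request can also be sent programmatically. The snippet below is a minimal client sketch using Python's `requests` library; the host placeholder, port 8080, and `/rkllm_chat` endpoint are taken from the `curl` example above.

```python
# Minimal client sketch for the Flask demo server started in step 2.
# Replace <SERVER_IP_ADDRESS> with the address of your RK3588 board.
import requests

payload = {
    "model": "Qwen2.5-Coder-3B",
    "messages": [
        {"role": "user", "content": "Explain in one sentence what a static method is."}
    ],
    "stream": False,
}

resp = requests.post(
    "http://<SERVER_IP_ADDRESS>:8080/rkllm_chat",
    json=payload,
    timeout=300,  # NPU generation can take a while for long prompts
)
resp.raise_for_status()

# The reply text is in choices[0].message.content (see the response format below)
print(resp.json()["choices"][0]["message"]["content"])
```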
The response is formatted in the following way:

```json
{
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "logprobs": null,
    "message": {
      "content": "<MODEL_REPLY_HERE>",
      "role": "assistant"
    }
  }],
  "created": null,
  "id": "rkllm_chat",
  "object": "rkllm_chat",
  "usage": {
    "completion_tokens": null,
    "prompt_tokens": null,
    "total_tokens": null
  }
}
```

Example response:

```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"A static method belongs to the class itself rather than any instance of the class and can be called without creating an object of the class.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```

# License

This conversion follows the license of the source model: [LICENSE · Qwen/Qwen2.5-Coder-3B-Instruct at main](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct/blob/main/LICENSE)