---
library_name: rkllm
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model:
- openbmb/MiniCPM4-0.5B
tags:
- rkllm
- rk3588
- rockchip
- edge-ai
- llm
- MiniCPM4
- text-generation-inference
---
# MiniCPM4-0.5B — RKLLM build for RK3588 boards
**Author:** @jamescallander
**Source model:** [openbmb/MiniCPM4-0.5B · Hugging Face](https://huggingface.co/openbmb/MiniCPM4-0.5B)
**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**
> This repository hosts a **conversion** of `MiniCPM4-0.5B` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). Conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).
#### Conversion details
- **RKLLM-Toolkit version:** v1.2.1
- **NPU driver:** v0.9.8
- **Python:** 3.12
- **Quantization:** `w8a8_g128`
- **Output:** single-file `.rkllm` artifact
- **Modifications:** quantization (w8a8_g128), export to `.rkllm` format for RK3588 SBCs
- **Tokenizer:** not required at runtime (UI handles prompt I/O)
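For reference, the conversion followed the RKLLM-Toolkit's standard export flow. A minimal sketch of such a conversion script (the API names follow the toolkit's example export scripts for v1.2.1; options such as `optimization_level` or calibration datasets are omitted and may be needed in practice):

```python
from rkllm.api import RKLLM

# Load the source Hugging Face model (assumes a local checkout of openbmb/MiniCPM4-0.5B).
llm = RKLLM()
ret = llm.load_huggingface(model="openbmb/MiniCPM4-0.5B")
assert ret == 0, "model load failed"

# Quantize to w8a8_g128 and target the RK3588 NPU.
ret = llm.build(
    do_quantization=True,
    quantized_dtype="w8a8_g128",
    target_platform="rk3588",
)
assert ret == 0, "build failed"

# Export the single-file .rkllm artifact.
ret = llm.export_rkllm("./MiniCPM4-0.5B_w8a8_g128_rk3588.rkllm")
assert ret == 0, "export failed"
```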
## Intended use
- On-device lightweight inference on RK3588 SBCs.
- MiniCPM4-0.5B is a **compact general-purpose model** designed for efficiency, testing, and resource-constrained scenarios. Ideal for experimentation where low memory usage and fast response matter more than deep reasoning.
## Limitations
- Requires roughly 700 MB of free memory.
- As a 0.5B parameter model, it has **limited reasoning ability** compared to larger LLMs (e.g., 7B/8B).
- Tested on a Radxa Rock 5B+ and an Orange Pi 5 Plus; other devices may require different driver/toolkit versions.
- Quantization (`w8a8_g128`) may further reduce output fidelity.
- Best suited for **basic Q&A, toy chat, or edge demos** rather than production-level tasks.
## Quick start (RK3588)
### 1) Install runtime
The RKNN-LLM toolkit and instructions can be found on your development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip).
Download and install the required packages as per the toolkit's instructions.
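Once installed, you can sanity-check that the runtime library is on the loader path. A minimal check, assuming the toolkit installed `librkllmrt.so` into a standard library directory:

```python
import ctypes

# Raises OSError if the RKLLM runtime library cannot be found or loaded.
ctypes.CDLL("librkllmrt.so")
print("RKLLM runtime library loaded successfully")
```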
### 2) Simple Flask server deployment
The simplest way to deploy the converted `.rkllm` model is with the example Flask server script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:
```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
  --rkllm_model_path <MODEL_PATH>/MiniCPM4-0.5B_w8a8_g128_rk3588.rkllm \
  --target_platform rk3588
```
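Once the script is running, the server listens on the port used in the examples below (8080). A quick reachability check from another machine on the network (the IP address is a placeholder; substitute your board's address):

```python
import socket

SERVER_IP = "192.168.1.50"  # placeholder: replace with your board's IP

# Opens a TCP connection and closes it; raises OSError if the server is unreachable.
with socket.create_connection((SERVER_IP, 8080), timeout=5):
    print("rkllm server is reachable on port 8080")
```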
### 3) Sending a request
A basic request message has the following format:
```json
{
  "model": "MiniCPM4-0.5B",
  "messages": [{
    "role": "user",
    "content": "<YOUR_PROMPT_HERE>"
  }],
  "stream": false
}
```
Example request using `curl`:
```bash
curl -s -X POST http://<SERVER_IP_ADDRESS>:8080/rkllm_chat \
  -H 'Content-Type: application/json' \
  -d '{"model":"MiniCPM4-0.5B","messages":[{"role":"user","content":"Explain who Napoleon Bonaparte is in two or three sentences."}],"stream":false}'
```
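The same request can be sent from Python, for example with the `requests` library (a minimal sketch; the server address is a placeholder):

```python
import requests

URL = "http://192.168.1.50:8080/rkllm_chat"  # placeholder: replace with your server's address

payload = {
    "model": "MiniCPM4-0.5B",
    "messages": [
        {"role": "user", "content": "Explain who Napoleon Bonaparte is in two or three sentences."}
    ],
    "stream": False,
}

resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
# The reply text lives at choices[0].message.content (see the response format below).
print(resp.json()["choices"][0]["message"]["content"])
```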
The response is formatted in the following way:
```json
{
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "logprobs": null,
    "message": {
      "content": "<MODEL_REPLY_HERE>",
      "role": "assistant"
    }
  }],
  "created": null,
  "id": "rkllm_chat",
  "object": "rkllm_chat",
  "usage": {
    "completion_tokens": null,
    "prompt_tokens": null,
    "total_tokens": null
  }
}
```
Example response:
```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Napoleon Bonaparte was a French military leader and statesman who rose to prominence during the French Revolution. He played a pivotal role in shaping modern Europe through his military campaigns, administrative reforms, and the establishment of new political institutions.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
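Setting `"stream": true` makes the demo server return the reply incrementally instead of as a single JSON object. A sketch for consuming such a response (this assumes the server emits one JSON chunk per line in the same `choices` shape as above; the exact chunk format may vary between toolkit versions):

```python
import json
import requests

URL = "http://192.168.1.50:8080/rkllm_chat"  # placeholder address

payload = {
    "model": "MiniCPM4-0.5B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

with requests.post(URL, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # assumption: newline-delimited JSON chunks
        print(chunk["choices"][0]["message"]["content"], end="", flush=True)
print()
```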
### 4) UI compatibility
This server exposes an **OpenAI-compatible Chat Completions API**.
You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui)).
- Configure your client with the API base `http://<SERVER_IP_ADDRESS>:8080` and the endpoint `/rkllm_chat`. Note that the demo server's route differs from the standard `/v1/chat/completions` path, so your client must allow a custom endpoint path.
- Make sure the `model` field matches the converted model’s name, for example:
```json
{
"model": "MiniCPM4-0.5B",
"messages": [{"role":"user","content":"Hello!"}],
"stream": false
}
```
## License
This conversion follows the license of the source model: [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)
- Attribution: **Built with MiniCPM4 (OpenBMB)**
- Required notice: see [`NOTICE`](NOTICE)
- Modifications: quantization (w8a8_g128), export to `.rkllm` format for RK3588 SBCs |