---
library_name: rkllm
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model:
- openbmb/MiniCPM4-0.5B
tags:
- rkllm
- rk3588
- rockchip
- edge-ai
- llm
- MiniCPM4
- text-generation-inference
---
# MiniCPM4-0.5B — RKLLM build for RK3588 boards

**Author:** @jamescallander  
**Source model:** [openbmb/MiniCPM4-0.5B · Hugging Face](https://huggingface.co/openbmb/MiniCPM4-0.5B)

**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**

> This repository hosts a **conversion** of `MiniCPM4-0.5B` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). Conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).

## Conversion details

- **RKLLM-Toolkit version:** v1.2.1
- **NPU driver:** v0.9.8
- **Python:** 3.12
- **Quantization:** `w8a8_g128`
- **Output:** single-file `.rkllm` artifact
- **Modifications:** quantization (w8a8_g128), export to `.rkllm` format for RK3588 SBCs
- **Tokenizer:** not required at runtime (UI handles prompt I/O)

## Intended use

- On-device lightweight inference on RK3588 SBCs.
- MiniCPM4-0.5B is a **compact general-purpose model** designed for efficiency, testing, and resource-constrained scenarios. It is best suited to experimentation where low memory usage and fast responses matter more than deep reasoning.

## Limitations

- Requires 700 MB of free memory.
- As a 0.5B parameter model, it has **limited reasoning ability** compared to larger LLMs (e.g., 7B/8B).
- Tested on a Radxa Rock 5B+ and an Orange Pi 5 Plus; other devices may require different drivers/toolkit versions.
- Quantization (`w8a8_g128`) may further reduce output fidelity.
- Best suited for **basic Q&A, toy chat, or edge demos** rather than production-level tasks.
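
Before loading the model, it can help to confirm that the board has the 700 MB of free memory listed above. The following is a minimal sketch, assuming a Linux board that exposes `/proc/meminfo`; the helper names (`mem_available_mb`, `has_enough_memory`, `check_board`) are hypothetical and not part of the RKNN-LLM toolkit.

```python
import re

# Figure taken from the limitation above; treated here as MiB (an assumption).
REQUIRED_MB = 700


def mem_available_mb(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo-style text, in MiB."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found in meminfo text")
    return int(match.group(1)) / 1024


def has_enough_memory(meminfo_text: str, required_mb: float = REQUIRED_MB) -> bool:
    """True if the reported available memory meets the requirement."""
    return mem_available_mb(meminfo_text) >= required_mb


def check_board(path: str = "/proc/meminfo") -> bool:
    """Read the live meminfo file on the board and apply the check."""
    with open(path, encoding="utf-8") as f:
        return has_enough_memory(f.read())
```

Calling `check_board()` on the device returns `False` when less memory is available than the model needs, which is a cheap test to run before starting the server.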

## Quick start (RK3588)

### 1) Install runtime

The RKNN-LLM toolkit and instructions can be found on the website of your development board's manufacturer or on [airockchip's GitHub page](https://github.com/airockchip).

Download and install the required packages as per the toolkit's instructions.

### 2) Simple Flask server deployment

The simplest way to deploy the converted `.rkllm` model is to use the example script provided in the toolkit, in this directory: `rknn-llm/examples/rkllm_server_demo`

```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
  --rkllm_model_path <MODEL_PATH>/MiniCPM4-0.5B_w8a8_g128_rk3588.rkllm \
  --target_platform rk3588
```

### 3) Sending a request

A basic message request has the following format:

```json
{
    "model":"MiniCPM4-0.5B",
    "messages":[{
        "role":"user",
        "content":"<YOUR_PROMPT_HERE>"}],
    "stream":false
}
```

Example request using `curl`:

```bash
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"MiniCPM4-0.5B","messages":[{"role":"user","content":"Explain who Napoleon Bonaparte is in two or three sentences."}],"stream":false}'
```
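
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server is reachable on port 8080 at the `/rkllm_chat` endpoint as in the `curl` example; the function names `build_chat_request` and `send_chat` are hypothetical helpers, not part of the toolkit.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "MiniCPM4-0.5B", stream: bool = False) -> dict:
    """Build the request body in the format shown above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def send_chat(server: str, prompt: str, timeout: float = 60.0) -> dict:
    """POST a prompt to http://<server>:8080/rkllm_chat and return the decoded JSON."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"http://{server}:8080/rkllm_chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send_chat("<SERVER_IP_ADDRESS>", "Hello!")` would return the parsed response dictionary. The timeout value is an arbitrary choice for illustration.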

The response is formatted as follows:

```json
{
    "choices":[{
        "finish_reason":"stop",
        "index":0,
        "logprobs":null,
        "message":{
            "content":"<MODEL_REPLY_HERE>",
            "role":"assistant"}}],
    "created":null,
    "id":"rkllm_chat",
    "object":"rkllm_chat",
    "usage":{
        "completion_tokens":null,
        "prompt_tokens":null,
        "total_tokens":null}
}
```

Example response:

```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Napoleon Bonaparte was a French military leader and statesman who rose to prominence during the French Revolution. He played a pivotal role in shaping modern Europe through his military campaigns, administrative reforms, and the establishment of new political institutions.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
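
To pull the reply text out of a response like the one above, a small helper is usually enough. The sketch below assumes the response has already been decoded from JSON into a dictionary; `extract_reply` is a hypothetical name chosen for illustration.

```python
import json


def extract_reply(response: dict) -> str:
    """Return the assistant's text from the first entry in `choices`."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# A trimmed-down response in the same shape as the example above.
raw = (
    '{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,'
    '"message":{"content":"Hello from the NPU.","role":"assistant"}}],'
    '"created":null,"id":"rkllm_chat","object":"rkllm_chat",'
    '"usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}'
)
reply = extract_reply(json.loads(raw))
```

Note that the `usage` fields are `null` in the example response, so client code should not rely on them for token accounting.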

### 4) UI compatibility

This server exposes an **OpenAI-compatible Chat Completions API**.

You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui)):
- Configure your client with the API base `http://<SERVER_IP_ADDRESS>:8080` and the endpoint `/rkllm_chat`.
- Make sure the `model` field matches the converted model’s name, for example:

```json
{
 "model": "MiniCPM4-0.5B",
 "messages": [{"role":"user","content":"Hello!"}],
 "stream": false
}
```

## License

This conversion follows the license of the source model: [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)

- Attribution: **Built with MiniCPM4 (OpenBMB)**
- Required notice: see [`NOTICE`](NOTICE)
- Modifications: quantization (w8a8_g128), export to `.rkllm` format for RK3588 SBCs