Mizukiluke commited on
Commit
93da68b
·
verified ·
1 Parent(s): f1e7675

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +78 -0
README.md ADDED
@@ -0,0 +1,78 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ language:
4
+ - en
5
+ base_model:
6
+ - Qwen/Qwen2.5-VL-32B-Instruct
7
+ ---
8
+
9
+ # GUI-Owl
10
+
11
+ <div align="center">
12
+ <img src=https://youke1.picui.cn/s1/2025/08/18/68a2f82fef3d4.png width="40%"/>
13
+ </div>
14
+
15
+ GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. Furthermore, it can be instantiated as various specialized agents within the Mobile-Agent-V3 multi-agent framework to accomplish more complex tasks.
16
+
17
+ * **Paper**:
18
+ * **GitHub Repository**: https://github.com/X-PLUG/MobileAgent
19
+ * **Online Demo**: Comming soon
20
+
21
+ ## Performance
22
+
23
+ ### ScreenSpot-V2, ScreenSpot-Pro and OSWorld-G
24
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_v2.jpg?raw=true" width="80%"/>
25
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_pro.jpg?raw=true" width="80%"/>
26
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/osworld_g.jpg?raw=true" width="80%"/>
27
+
28
+ ### MMBench-GUI L1, L2 and Android Control
29
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/mmbench_gui_l1.jpg?raw=true" width="80%"/>
30
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/mmbench_gui_l2.jpg?raw=true" width="80%"/>
31
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/android_control.jpg?raw=true" width="40%"/>
32
+
33
+ ### Android World and OSWorld-Verified
34
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/online.jpg?raw=true" width="40%"/>
35
+
36
+ ## Usage
37
+
38
+ Please refer to our cookbook.
39
+
40
+ ## Deploy
41
+
42
+ We recommand deploy GUI-Owl-32B through vllm
43
+
44
+ This script has been validated on an A100 with 96 GB of VRAM. If you serve GUI-Owl-32B on an H20-3e, you can set MP_SIZE=1 for faster inference speed.
45
+ ```bash
46
+ PIXEL_ARGS='{"min_pixels":3136,"max_pixels":10035200}'
47
+ IMAGE_LIMIT_ARGS='image=2'
48
+ MP_SIZE=2
49
+ MM_KWARGS=(
50
+ --mm-processor-kwargs $PIXEL_ARGS
51
+ --limit-mm-per-prompt $IMAGE_LIMIT_ARGS
52
+ )
53
+
54
+ vllm serve $CKPT \
55
+ --max-model-len 32768 ${MM_KWARGS[@]} \
56
+ --tensor-parallel-size $MP_SIZE \
57
+ --allowed-local-media-path '/' \
58
+ --port 4243
59
+ ```
60
+
61
+ If you want GUI-Owl to recieve more than two images, you could increase `IMAGE_LIMIT_ARGS` and reduce `max_pixels`.
62
+
63
+ For example:
64
+ ```bash
65
+ PIXEL_ARGS='{"min_pixels":3136,"max_pixels":3211264}'
66
+ IMAGE_LIMIT_ARGS='image=5'
67
+ MP_SIZE=2
68
+ MM_KWARGS=(
69
+ --mm-processor-kwargs $PIXEL_ARGS
70
+ --limit-mm-per-prompt $IMAGE_LIMIT_ARGS
71
+ )
72
+
73
+ vllm serve $CKPT \
74
+ --max-model-len 32768 ${MM_KWARGS[@]} \
75
+ --tensor-parallel-size $MP_SIZE \
76
+ --allowed-local-media-path '/' \
77
+ --port 4243
78
+ ```