---
license: apache-2.0
---

<div align="center">
<img src="./figures/logo.jpg" width="150" align="center" />
</div>

<div align="center">
<h3>HERMES: A Unified Self-Driving World Model for Simultaneous <br>3D Scene Understanding and Generation</h3>

[Xin Zhou](https://lmd0311.github.io/)<sup>1\*</sup>, [Dingkang Liang](https://dk-liang.github.io/)<sup>1\*†</sup>, Sifan Tu<sup>1</sup>, [Xiwu Chen](https://scholar.google.com/citations?user=PVMQa-IAAAAJ&hl=en)<sup>3</sup>, [Yikang Ding](https://scholar.google.com/citations?user=gdP9StQAAAAJ&hl=en)<sup>2†</sup>, Dingyuan Zhang<sup>1</sup>, Feiyang Tan<sup>3</sup>,<br> [Hengshuang Zhao](https://scholar.google.com/citations?user=4uE10I0AAAAJ&hl=en)<sup>4</sup>, [Xiang Bai](https://scholar.google.com/citations?user=UeltiQ4AAAAJ&hl=en)<sup>1</sup>

<sup>1</sup> Huazhong University of Science & Technology, <sup>2</sup> MEGVII Technology, <br><sup>3</sup> Mach Drive, <sup>4</sup> The University of Hong Kong

(\*) Equal contribution. (†) Project leader.

[![arXiv](https://img.shields.io/badge/Arxiv-2501.14729-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.14729)
[![Project](https://img.shields.io/badge/Homepage-project-orange.svg?logo=googlehome)](https://lmd0311.github.io/HERMES/)
[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)

Check our *awesome list* for the latest World Models! [![Awesome World Model](https://img.shields.io/badge/GitHub-awesome_world_model-blue?logo=github)](https://github.com/LMD0311/Awesome-World-Model)
![Stars](https://img.shields.io/github/stars/LMD0311/Awesome-World-Model)

</div>

## 📣 News
- **[2025.07.14]** Code, pretrained weights, and the processed data are now open-sourced.
- **[2025.06.26]** HERMES is accepted to **ICCV 2025**! 🥳
- **[2025.01.24]** Release the demo. Check it out and give it a star 🌟!
- **[2025.01.24]** Release the [paper](https://arxiv.org/abs/2501.14729).

<div align="center">
<img src="./figures/intro.png" width="888" align="center" />
</div>

## Abstract

Driving World Models (DWMs) have become essential for autonomous driving by enabling future scene prediction. However, existing DWMs are limited to scene generation and fail to incorporate scene understanding, which involves interpreting and reasoning about the driving environment. In this paper, we present a unified Driving World Model named **HERMES**. Through a unified framework, we seamlessly integrate scene understanding and future scene evolution (generation) in driving scenarios. Specifically, **HERMES** leverages a Bird's-Eye View (BEV) representation to consolidate multi-view spatial information while preserving geometric relationships and interactions. Additionally, we introduce world queries, which incorporate world knowledge into BEV features via causal attention in the Large Language Model (LLM), enabling contextual enrichment for both understanding and generation tasks. We conduct comprehensive studies on the nuScenes and OmniDrive-nuScenes datasets to validate the effectiveness of our method. **HERMES** achieves state-of-the-art performance, reducing generation error by 32.4% and improving understanding metrics such as CIDEr by 8.0%.
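The world-query idea described in the abstract can be illustrated with a minimal, framework-agnostic sketch: learnable queries are appended after flattened BEV tokens, and a causal mask lets them attend to the whole scene. All names, sizes, and the single-head attention below are illustrative assumptions, not the released implementation.

```python
import numpy as np

def causal_attention(x, mask):
    # Single-head scaled dot-product self-attention with a boolean mask
    # (True = may attend). Projections are omitted for brevity.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

# Hypothetical sizes: BEV tokens followed by learnable world queries.
n_bev, n_world, d_model = 8, 4, 16
rng = np.random.default_rng(0)
bev_tokens = rng.normal(size=(n_bev, d_model))       # flattened BEV features
world_queries = rng.normal(size=(n_world, d_model))  # learnable queries

seq = np.concatenate([bev_tokens, world_queries], axis=0)
n = seq.shape[0]
causal_mask = np.tril(np.ones((n, n), dtype=bool))   # token i attends to tokens <= i

out = causal_attention(seq, causal_mask)
# Because the world queries sit last in the sequence, causal attention lets
# each of them see every BEV token; the enriched queries can then condition
# both understanding and future-scene generation.
enriched_world = out[n_bev:]
print(enriched_world.shape)  # (4, 16)
```

The only design point this sketch captures is ordering: placing the queries after the scene tokens is what allows a strictly causal LLM to enrich them with full scene context.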
## Overview

<div align="center">
<img src="./figures/pipeline.jpg" width="888" align="center" />
</div>

## Demo

<div align="center">
<img src="./figures/scene1.gif" width="999" align="center" />
<center> Example 1 </center> <br>
</div>

<div align="center">
<img src="./figures/scene2.gif" width="999" align="center" />
<center> Example 2 </center> <br>
</div>

<div align="center">
<img src="./figures/scene3.gif" width="999" align="center" />
<center> Example 3 </center> <br>
</div>

## Main Results

<div align="center">
<img src="./figures/main_results.png" width="888" align="center" />
</div>

## Acknowledgement

This project is based on BEVFormer v2 ([paper](https://arxiv.org/abs/2211.10439), [code](https://github.com/fundamentalvision/BEVFormer)), InternVL ([paper](https://arxiv.org/abs/2404.16821), [code](https://github.com/OpenGVLab/InternVL)), UniPAD ([paper](https://arxiv.org/abs/2310.08370), [code](https://github.com/Nightmare-n/UniPAD)), OmniDrive ([paper](https://arxiv.org/abs/2405.01533), [code](https://github.com/NVlabs/OmniDrive)), and DriveMonkey ([paper](https://arxiv.org/abs/2505.08725), [code](https://github.com/zc-zhao/DriveMonkey)). Thanks for their wonderful work.

## Citation

If you find this repository useful in your research, please consider giving a star ⭐ and a citation.
```bibtex
@inproceedings{zhou2025hermes,
  title={HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation},
  author={Zhou, Xin and Liang, Dingkang and Tu, Sifan and Chen, Xiwu and Ding, Yikang and Zhang, Dingyuan and Tan, Feiyang and Zhao, Hengshuang and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```