---
base_model:
- stabilityai/stable-diffusion-2-1
datasets:
- manycore-research/SpatialGen-Testset
license: creativeml-openrail-m
pipeline_tag: image-to-3d
---
# SpatialGen: Layout-guided 3D Indoor Scene Generation
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<picture>
<source srcset="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/myrWYVNd4m-DuxV39VQZ0.png" media="(prefers-color-scheme: dark)">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/QQvDtmokH4ZjwH0wppqFC.png" width="60%" alt="SpatialGen"/>
</picture>
</div>
<hr style="margin-top: 0; margin-bottom: 8px;">
<div align="center" style="margin-top: 0; padding-top: 0; line-height: 1;">
<a href="https://manycore-research.github.io/SpatialGen" target="_blank" style="margin: 2px;"><img alt="Project"
src="https://img.shields.io/badge/🌐%20Project-SpatialGen-ffc107?color=42a5f5&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://arxiv.org/abs/2509.14981" target="_blank" style="margin: 2px;"><img alt="arXiv"
src="https://img.shields.io/badge/arXiv-SpatialGen-b31b1b?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://github.com/manycore-research/SpatialGen" target="_blank" style="margin: 2px;"><img alt="GitHub"
src="https://img.shields.io/badge/GitHub-SpatialGen-24292e?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://huggingface.co/manycore-research/SpatialGen-1.0" target="_blank" style="margin: 2px;"><img alt="Hugging Face"
src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialGen-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
</div>
<div align="center">
| Image-to-Scene Results | Text-to-Scene Results |
| :--------------------------------------: | :----------------------------------------: |
<p>TL;DR: Given a 3D semantic layout, SpatialGen can generate a 3D indoor scene conditioned on either a reference image (left) or a textual description (right) using a multi-view, multi-modal diffusion model.</p>
</div>
## ✨ News
- [Sep, 2025] We released the SpatialGen paper!
- [Aug, 2025] Initial release of SpatialGen-1.0!
## 📋 Release Plan
- [x] Provide inference code of SpatialGen.
- [ ] Provide training instructions for SpatialGen.
- [ ] Release SpatialGen dataset.
## SpatialGen Models
<div align="center">
| **Model** | **Download** |
| :-----------------------: | -------------------------------------------------------------------------------------|
| SpatialGen-1.0 | [🤗 HuggingFace](https://huggingface.co/manycore-research/SpatialGen-1.0) |
| FLUX.1-Layout-ControlNet | [🤗 HuggingFace](https://huggingface.co/manycore-research/FLUX.1-Layout-ControlNet) |
| FLUX.1-Wireframe-dev-lora | [🤗 HuggingFace](https://huggingface.co/manycore-research/FLUX.1-Wireframe-dev-lora) |
</div>
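The checkpoints can also be fetched programmatically. Below is a minimal sketch using the `huggingface_hub` Python package; the `local_dir` path is only an illustrative choice, not a location the project prescribes.
```python
# Sketch: download the SpatialGen-1.0 weights with huggingface_hub
# (local_dir is an arbitrary example path, not required by the project).
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="manycore-research/SpatialGen-1.0",
    local_dir="checkpoints/SpatialGen-1.0",  # example location
)
print(f"SpatialGen-1.0 weights downloaded to {ckpt_dir}")
```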
## Usage
### 🔧 Installation
Tested with the following environment:
* Python 3.10
* PyTorch 2.3.1
* CUDA Version 12.1
```bash
# clone the repository
git clone https://github.com/manycore-research/SpatialGen.git
cd SpatialGen
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Optional: fix the FLUX inference bug (https://github.com/vllm-project/vllm/issues/4392)
pip install nvidia-cublas-cu12==12.4.5.8
```
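After installation, a quick sanity check of the environment can help. The following is a minimal sketch that only prints the versions listed above; it assumes PyTorch was installed via `requirements.txt`.
```python
# Sketch: verify the tested environment (Python 3.10, PyTorch 2.3.1, CUDA 12.1)
import sys

import torch

print("Python:", sys.version.split()[0])            # expected 3.10.x
print("PyTorch:", torch.__version__)                # expected 2.3.1
print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime:", torch.version.cuda)          # expected 12.1
```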
### 📊 Dataset
We provide [SpatialGen-Testset](https://huggingface.co/datasets/manycore-research/SpatialGen-Testset), a set of 48 rooms labeled with 3D layouts, together with 4.8K rendered images (48 rooms × 100 views, including RGB, normal, depth, and semantic maps) for multi-view diffusion (MVD) inference.
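To pull the test set locally, one option is `snapshot_download` from `huggingface_hub`; the target directory below is only an example.
```python
# Sketch: download the SpatialGen-Testset dataset repository
# (local_dir is an arbitrary example path).
from huggingface_hub import snapshot_download

data_dir = snapshot_download(
    repo_id="manycore-research/SpatialGen-Testset",
    repo_type="dataset",
    local_dir="data/SpatialGen-Testset",  # example location
)
print(f"Test set downloaded to {data_dir}")
```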
### Inference
```bash
# Single image-to-3D Scene
bash scripts/infer_spatialgen_i2s.sh
# Text-to-image-to-3D scene:
# captions/spatialgen_testset_captions.jsonl provides text prompts in different styles for each room;
# choose a scene_id and prompt pair to run the text-to-scene experiment (see the snippet below)
bash scripts/infer_spatialgen_t2s.sh
```
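To pick a `scene_id`/prompt pair for the text-to-scene script, the captions file can be inspected first. The sketch below assumes each JSONL record carries a scene identifier and a prompt string; the field names `scene_id` and `prompt` are assumptions and may differ from the actual schema.
```python
# Sketch: list candidate scene_id / prompt pairs from the captions file.
# NOTE: the keys "scene_id" and "prompt" are assumptions about the JSONL
# schema; adjust them to match the actual fields in the file.
import json

with open("captions/spatialgen_testset_captions.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        scene_id = record.get("scene_id", "<unknown scene_id>")
        prompt = record.get("prompt", "<unknown prompt>")
        print(f"{scene_id}: {prompt}")
```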
## License
[SpatialGen-1.0](https://huggingface.co/manycore-research/SpatialGen-1.0) is derived from [Stable-Diffusion-v2.1](https://github.com/Stability-AI/stablediffusion), which is licensed under the [CreativeML Open RAIL++-M License](https://github.com/Stability-AI/stablediffusion/blob/main/LICENSE-MODEL). [FLUX.1-Layout-ControlNet](https://huggingface.co/manycore-research/FLUX.1-Layout-ControlNet) is licensed under the [FLUX.1-dev Non-Commercial License](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev).
## Acknowledgements
We would like to thank the following projects that made this work possible:
[DiffSplat](https://github.com/chenguolin/DiffSplat) | [SD 2.1](https://github.com/Stability-AI/stablediffusion) | [TAESD](https://github.com/madebyollin/taesd) | [FLUX](https://github.com/black-forest-labs/flux/) | [SpatialLM](https://github.com/manycore-research/SpatialLM)
## Citation
```bibtex
@article{SpatialGen,
title = {SpatialGen: Layout-guided 3D Indoor Scene Generation},
author = {Fang, Chuan and Li, Heng and Liang, Yixu and Zheng, Jia and Mao, Yongsen and Liu, Yuan and Tang, Rui and Zhou, Zihan and Tan, Ping},
journal = {arXiv preprint},
year = {2025},
eprint = {2509.14981},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
```