---
base_model:
- stabilityai/stable-diffusion-2-1
datasets:
- manycore-research/SpatialGen-Testset
license: creativeml-openrail-m
pipeline_tag: image-to-3d
---

# SpatialGen: Layout-guided 3D Indoor Scene Generation

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
  <picture>
    <source srcset="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/myrWYVNd4m-DuxV39VQZ0.png" media="(prefers-color-scheme: dark)">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/QQvDtmokH4ZjwH0wppqFC.png" width="60%" alt="SpatialGen"/>
  </picture>
</div>
<hr style="margin-top: 0; margin-bottom: 8px;">
<div align="center" style="margin-top: 0; padding-top: 0; line-height: 1;">
    <a href="https://manycore-research.github.io/SpatialGen" target="_blank" style="margin: 2px;"><img alt="Project"
    src="https://img.shields.io/badge/🌐%20Project-SpatialGen-ffc107?color=42a5f5&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
    <a href="https://arxiv.org/abs/2509.14981" target="_blank" style="margin: 2px;"><img alt="arXiv"
    src="https://img.shields.io/badge/arXiv-SpatialGen-b31b1b?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
    <a href="https://github.com/manycore-research/SpatialGen" target="_blank" style="margin: 2px;"><img alt="GitHub"
    src="https://img.shields.io/badge/GitHub-SpatialGen-24292e?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
    <a href="https://huggingface.co/manycore-research/SpatialGen-1.0" target="_blank" style="margin: 2px;"><img alt="Hugging Face"
    src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialGen-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
</div>

<div align="center">

| Image-to-Scene Results                   | Text-to-Scene Results                      |
| :--------------------------------------: | :----------------------------------------: |
| ![Img2Scene](https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/ksN5t8QEu3Iv6KhpsYsk6.png) | ![Text2Scene](https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/waCRa3kp01KAsKgmqS1bb.png) |

<p>TL;DR: Given a 3D semantic layout, SpatialGen can generate a 3D indoor scene conditioned on either a reference image (left) or a textual description (right) using a multi-view, multi-modal diffusion model.</p>
</div>

## ✨ News

- [Sep, 2025] Released the SpatialGen paper on arXiv!
- [Aug, 2025] Initial release of SpatialGen-1.0!

## 📋 Release Plan

- [x] Provide inference code for SpatialGen.
- [ ] Provide training instructions for SpatialGen.
- [ ] Release SpatialGen dataset.

## SpatialGen Models

<div align="center">

| **Model**                 | **Download**                                                                         |
| :-----------------------: | -------------------------------------------------------------------------------------|
| SpatialGen-1.0            | [🤗 HuggingFace](https://huggingface.co/manycore-research/SpatialGen-1.0)            |
| FLUX.1-Layout-ControlNet  | [🤗 HuggingFace](https://huggingface.co/manycore-research/FLUX.1-Layout-ControlNet)  |
| FLUX.1-Wireframe-dev-lora | [🤗 HuggingFace](https://huggingface.co/manycore-research/FLUX.1-Wireframe-dev-lora) |

</div>

## Usage

### 🔧 Installation

Tested with the following environment:
* Python 3.10
* PyTorch 2.3.1
* CUDA Version 12.1

```bash
# clone the repository
git clone https://github.com/manycore-research/SpatialGen.git
cd SpatialGen

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
# Optional: work around the FLUX inference bug
# (https://github.com/vllm-project/vllm/issues/4392)
pip install nvidia-cublas-cu12==12.4.5.8
```
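After installing, a quick version check can catch environment mismatches early. The snippet below is an illustrative sketch, not part of the repository; the `matches` helper is a hypothetical convenience for comparing leading version components against the tested setup above.

```python
# Optional sanity check that your environment matches the tested setup
# (Python 3.10, PyTorch 2.3.1, CUDA 12.1). Illustrative only; not part
# of the SpatialGen repository.
import sys

def matches(version: str, expected: str) -> bool:
    """True if `version` agrees with `expected` on its leading components."""
    want = expected.split(".")
    return version.split(".")[: len(want)] == want

print("Python 3.10:", matches(f"{sys.version_info.major}.{sys.version_info.minor}", "3.10"))

try:
    import torch  # available after `pip install -r requirements.txt`
    print("PyTorch 2.3:", matches(torch.__version__, "2.3"))
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed yet")
```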

### 📊 Dataset

We provide [SpatialGen-Testset](https://huggingface.co/datasets/manycore-research/SpatialGen-Testset), which contains 48 rooms, each labeled with a 3D layout and accompanied by 4.8K rendered images in total (48 rooms × 100 views, covering RGB, normal, depth, and semantic maps), for MVD inference.
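If you prefer fetching the test set programmatically, the standard `huggingface_hub` client can download the full dataset snapshot. A minimal sketch, assuming `huggingface_hub` is installed; the `download_testset` name and the local directory are illustrative choices, not part of the repository.

```python
# Sketch: fetch the SpatialGen test set with the Hugging Face Hub client.
# The function name and target directory are illustrative assumptions.
TESTSET_REPO = "manycore-research/SpatialGen-Testset"

def download_testset(local_dir: str = "data/SpatialGen-Testset") -> str:
    """Download the full dataset snapshot and return its local path."""
    from huggingface_hub import snapshot_download  # requires network access
    return snapshot_download(
        repo_id=TESTSET_REPO,
        repo_type="dataset",  # this repo is a dataset, not a model
        local_dir=local_dir,
    )
```

Calling `download_testset()` mirrors what `huggingface-cli download --repo-type dataset` would do on the command line.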

### Inference

```bash
# Single-image-to-3D scene
bash scripts/infer_spatialgen_i2s.sh

# Text-to-image-to-3D scene:
# captions/spatialgen_testset_captions.jsonl provides text prompts of
# different styles for each room; choose a scene_id and prompt pair to
# run the text-to-scene experiment.
bash scripts/infer_spatialgen_t2s.sh
```
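To choose a `scene_id` and prompt pair for the text-to-scene script, the captions JSONL can be grouped per scene. A small sketch, assuming each record carries `scene_id` and `prompt` fields as described above; `prompts_by_scene` is a hypothetical helper, and the released file's exact schema may differ.

```python
# Group caption prompts by scene so one (scene_id, prompt) pair can be
# chosen for the text-to-scene script. Field names follow the README;
# verify them against the released captions file.
import json
from collections import defaultdict

def prompts_by_scene(jsonl_text: str) -> dict:
    """Parse JSONL content and map each scene_id to its prompt strings."""
    grouped = defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        grouped[record["scene_id"]].append(record["prompt"])
    return dict(grouped)
```

For example, `prompts_by_scene(open("captions/spatialgen_testset_captions.jsonl").read())` would list every prompt style available per room.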

## License

[SpatialGen-1.0](https://huggingface.co/manycore-research/SpatialGen-1.0) is derived from [Stable-Diffusion-v2.1](https://github.com/Stability-AI/stablediffusion), which is licensed under the [CreativeML Open RAIL++-M License](https://github.com/Stability-AI/stablediffusion/blob/main/LICENSE-MODEL). [FLUX.1-Layout-ControlNet](https://huggingface.co/manycore-research/FLUX.1-Layout-ControlNet) is licensed under the [FLUX.1-dev Non-Commercial License](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev).

## Acknowledgements

We would like to thank the following projects that made this work possible:

[DiffSplat](https://github.com/chenguolin/DiffSplat) | [SD 2.1](https://github.com/Stability-AI/stablediffusion) | [TAESD](https://github.com/madebyollin/taesd) | [FLUX](https://github.com/black-forest-labs/flux/) | [SpatialLM](https://github.com/manycore-research/SpatialLM)

## Citation

```bibtex
@article{SpatialGen,
  title         = {SpatialGen: Layout-guided 3D Indoor Scene Generation},
  author        = {Fang, Chuan and Li, Heng and Liang, Yixu and Zheng, Jia and Mao, Yongsen and Liu, Yuan and Tang, Rui and Zhou, Zihan and Tan, Ping},
  journal       = {arXiv preprint},
  year          = {2025},
  eprint        = {2509.14981},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}
```