metadata
base_model:
  - stabilityai/stable-diffusion-2-1
datasets:
  - manycore-research/SpatialGen-Testset
license: creativeml-openrail-m
pipeline_tag: image-to-3d
library_name: diffusers

SpatialGen: Layout-guided 3D Indoor Scene Generation

Figure: Image-to-Scene results (left) and Text-to-Scene results (right).

TL;DR: Given a 3D semantic layout, SpatialGen can generate a 3D indoor scene conditioned on either a reference image (left) or a textual description (right) using a multi-view, multi-modal diffusion model.

✨ News

  • [Aug, 2025] Initial release of SpatialGen-1.0!
  • [Sep, 2025] We released the SpatialGen paper!

📋 Release Plan

  • Provide inference code for SpatialGen.
  • Provide training instructions for SpatialGen.
  • Release SpatialGen dataset.

SpatialGen Models

| Model | Download |
| --- | --- |
| SpatialGen-1.0 | 🤗 HuggingFace |
| FLUX.1-Layout-ControlNet | 🤗 HuggingFace |

Usage

🔧 Installation

Tested with the following environment:

  • Python 3.10
  • PyTorch 2.3.1
  • CUDA Version 12.1

# clone the repository
git clone https://github.com/manycore-research/SpatialGen.git
cd SpatialGen

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
# Optional: fix the [flux inference bug](https://github.com/vllm-project/vllm/issues/4392)
pip install nvidia-cublas-cu12==12.4.5.8
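
After installation, it can help to compare the active environment against the versions listed as tested above. The following is a minimal sketch, not part of the SpatialGen repository: the helper names `collect_versions` and `report` are hypothetical, and a version mismatch does not necessarily mean the code will fail.

```python
import importlib
import sys

# Versions this README reports as tested; a reference point, not a hard requirement.
TESTED = {"python": "3.10", "torch": "2.3.1", "cuda": "12.1"}


def collect_versions():
    """Return the versions found in the current environment (None if unavailable)."""
    found = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return found  # PyTorch is not installed in this environment
    # Drop local build suffixes such as "+cu121" before comparing.
    found["torch"] = torch.__version__.split("+")[0]
    found["cuda"] = torch.version.cuda  # None on CPU-only builds
    return found


def report(found, tested=TESTED):
    """Build one human-readable line per component."""
    lines = []
    for name, want in tested.items():
        have = found.get(name)
        status = "ok" if have == want else "differs"
        lines.append(f"{name}: found {have}, tested with {want} ({status})")
    return lines


if __name__ == "__main__":
    print("\n".join(report(collect_versions())))
```

Run it inside the activated `.venv`; a line marked `differs` only flags a deviation from the tested setup.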

📊 Dataset

We provide SpatialGen-Testset, which contains 48 rooms labeled with 3D layouts and 4.8K rendered views (48 rooms x 100 views each, including RGB, normal, depth, and semantic maps) for multi-view diffusion (MVD) inference.
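
One way to fetch the test set is through `huggingface_hub`. This is a sketch under assumptions: the repository id comes from the model card metadata, while the `local_dir` default and the function names are hypothetical. The directory that the inference scripts expect is not stated here, so check the scripts before choosing a path.

```python
from pathlib import Path

# Repository id as listed in the model card metadata.
DATASET_REPO = "manycore-research/SpatialGen-Testset"

NUM_ROOMS = 48
VIEWS_PER_ROOM = 100


def expected_view_count(rooms=NUM_ROOMS, views=VIEWS_PER_ROOM):
    """Number of rendered views implied by the description (48 x 100)."""
    return rooms * views


def download_testset(local_dir="data/SpatialGen-Testset"):
    """Download the test set and return the local path.

    The default local_dir is a placeholder, not a path required by the scripts.
    """
    # Imported lazily so the rest of the module works without the package.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=DATASET_REPO,
        repo_type="dataset",  # the test set is hosted as a dataset repository
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_testset()` requires network access and the `huggingface_hub` package; `expected_view_count()` simply restates the 48 x 100 arithmetic.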

Inference

# Single image-to-3D Scene
bash scripts/infer_spatialgen_i2s.sh

# Text-to-image-to-3D Scene
# captions/spatialgen_testset_captions.jsonl provides text prompts in different styles for each room;
# choose a scene_id and prompt pair to run the text2scene experiment
bash scripts/infer_spatialgen_t2s.sh
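
To pick a scene and prompt for the text-to-scene run, the captions file can be inspected with a few lines of Python. This is a sketch that assumes one JSON object per line with `scene_id` and `prompt` keys; those key names are guesses based on the comment above, so print a record first and adjust them if the file is structured differently.

```python
import json
from pathlib import Path


def load_captions(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def prompts_for_scene(records, scene_id, id_key="scene_id", prompt_key="prompt"):
    """Return every prompt recorded for one scene.

    The default key names are assumptions; check them against the actual file.
    """
    return [r[prompt_key] for r in records if r.get(id_key) == scene_id]
```

For example, `records = load_captions("captions/spatialgen_testset_captions.jsonl")` followed by `print(records[0])` shows the actual field names before any filtering.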

License

SpatialGen-1.0 is derived from Stable-Diffusion-v2.1, which is licensed under the CreativeML Open RAIL++-M License. FLUX.1-Layout-ControlNet is licensed under the FLUX.1-dev Non-Commercial License.

Acknowledgements

We would like to thank the following projects that made this work possible:

DiffSplat | SD 2.1 | TAESD | FLUX | SpatialLM

Citation

@article{wu2024spatialgen,
  title={SpatialGen: Layout-guided 3D Indoor Scene Generation},
  author={Zhenqing Wu and Zhenxiong Tan and Guolin Chen and Wenbo Zhao and Xingyi Yang and Xiaofeng Wang and Jianmin Li and Bo Dai and Dahua Lin and Xinchao Wang},
  journal={arXiv preprint arXiv:2509.14981},
  year={2025}
}