---
pipeline_tag: text-to-image
inference: false
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
  - tensorrt
  - sd3.5-large
  - text-to-image
  - depth
  - canny
  - blur
  - controlnet
  - onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  What do you intend to use the model for?:
    type: select
    options:
      - Research
      - Personal use
      - Creative Professional
      - Startup
      - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
  - en
---

# Stable Diffusion 3.5 Large ControlNet TensorRT

## Introduction

This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Large ControlNets**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model.

Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.

The following control types are available:

- Canny - Use a Canny edge map to guide the structure of the generated image. This is especially useful for illustrations, but works with all styles. (A minimal preparation sketch follows the Model Description below.)
- Depth - Use a depth map, generated by DepthFM, to guide generation. Example use cases include generating architectural renderings or texturing 3D assets.
- Blur - Use a blurred input image to perform extremely high-fidelity upscaling. A common workflow is to tile an input image, apply the ControlNet to each tile, and merge the tiles to produce a higher-resolution image.

## Model Details

### Model Description

This repository holds the ONNX export of the Depth, Canny, and Blur ControlNet models in BF16 precision.
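For users preparing their own conditioning images, the sketch below shows one common way to produce a Canny control image. This is illustrative only: it assumes OpenCV (`opencv-python`) is installed, and the file names and thresholds are placeholders rather than values used by the demo.

```python
import cv2

# Load the source image and convert to grayscale for edge detection.
# "input.png" is a placeholder file name.
image = cv2.imread("input.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Extract a Canny edge map; the thresholds (100, 200) are illustrative
# starting points and usually need tuning per image.
edges = cv2.Canny(gray, 100, 200)
cv2.imwrite("canny_control.png", edges)
```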
## Performance using TensorRT 10.13

#### Depth ControlNet: Timings for 40 steps at 1024x1024

| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 40 | VAE Decoder | Total |
|-------------|-----------|-------------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 74.97 ms | 11.87 ms | 4.90 ms | 8.82 ms | 18839.01 ms | 117.38 ms | 19097.19 ms |

#### Canny ControlNet: Timings for 60 steps at 1024x1024

| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total |
|-------------|-----------|-------------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 78.50 ms | 12.29 ms | 5.08 ms | 8.65 ms | 28057.08 ms | 106.49 ms | 28306.20 ms |

#### Blur ControlNet: Timings for 60 steps at 1024x1024

| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total |
|-------------|-----------|-------------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 74.48 ms | 11.71 ms | 4.86 ms | 8.80 ms | 28604.26 ms | 113.24 ms | 28859.06 ms |

## Usage Example

1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) to launch a TensorRT NGC container.

```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```

2. Install the required libraries:

```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```

3. Generate a HuggingFace user access token. To download the Stable Diffusion 3.5 model checkpoints, please request access on the [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), [Stable Diffusion 3.5 Large Depth ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-depth), [Stable Diffusion 3.5 Large Canny ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-canny), and [Stable Diffusion 3.5 Large Blur ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-blur) pages. You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See the [instructions](https://huggingface.co/docs/hub/security-tokens). You can optionally verify the token with the sketch after this step.

```bash
export HF_TOKEN=
```
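Optionally, confirm the token is picked up before starting a long engine build. A minimal sketch, assuming `huggingface_hub` is available in the container (it is typically pulled in by the demo requirements; otherwise `pip install huggingface_hub`):

```python
from huggingface_hub import whoami

# whoami() reads the HF_TOKEN environment variable exported above and
# raises an error if the token is missing or invalid.
user = whoami()
print(f"Authenticated as: {user['name']}")
```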
4. Perform TensorRT-optimized inference:

- **Stable Diffusion 3.5 Large Depth ControlNet in BF16 precision**

```shell
python3 demo_controlnet_sd35.py \
  "a photo of a man" \
  --version=3.5-large \
  --bf16 \
  --controlnet-type depth \
  --download-onnx-models \
  --denoising-steps=40 \
  --guidance-scale 4.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```

- **Stable Diffusion 3.5 Large Canny ControlNet in BF16 precision**

```shell
python3 demo_controlnet_sd35.py \
  "A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms" \
  --version=3.5-large \
  --bf16 \
  --controlnet-type canny \
  --download-onnx-models \
  --denoising-steps=60 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```

- **Stable Diffusion 3.5 Large Blur ControlNet in BF16 precision**

```shell
python3 demo_controlnet_sd35.py \
  "generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater" \
  --version=3.5-large \
  --bf16 \
  --controlnet-type blur \
  --download-onnx-models \
  --denoising-steps=60 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```
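The tiled upscaling workflow described for the Blur ControlNet in the Introduction can be outlined as follows. This is a minimal sketch, not part of the demo: `upscale_tile` is a hypothetical stand-in for a per-tile Blur ControlNet generation call, and the 1024-pixel tile size and lack of tile overlap are simplifying assumptions (production pipelines typically overlap and blend tiles to hide seams).

```python
from PIL import Image

def upscale_tile(tile: Image.Image, scale: int) -> Image.Image:
    # HYPOTHETICAL stand-in for a per-tile Blur ControlNet call.
    # A plain Lanczos resize keeps this sketch runnable end to end.
    return tile.resize((tile.width * scale, tile.height * scale), Image.LANCZOS)

def tiled_upscale(image: Image.Image, tile_size: int = 1024, scale: int = 2) -> Image.Image:
    """Split the image into tiles, upscale each tile, and paste the results
    into a canvas `scale` times larger. No overlap/blending in this sketch."""
    out = Image.new("RGB", (image.width * scale, image.height * scale))
    for top in range(0, image.height, tile_size):
        for left in range(0, image.width, tile_size):
            box = (left, top,
                   min(left + tile_size, image.width),
                   min(top + tile_size, image.height))
            out.paste(upscale_tile(image.crop(box), scale), (left * scale, top * scale))
    return out

if __name__ == "__main__":
    # "input.png" and "upscaled.png" are placeholder file names.
    src = Image.open("input.png").convert("RGB")
    tiled_upscale(src).save("upscaled.png")
```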