---
pipeline_tag: text-to-image
inference: false
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
  - tensorrt
  - sd3.5-large
  - text-to-image
  - depth
  - canny
  - blur
  - controlnet
  - onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  What do you intend to use the model for?:
    type: select
    options:
      - Research
      - Personal use
      - Creative Professional
      - Startup
      - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
  - en
---

# Stable Diffusion 3.5 Large ControlNet TensorRT

## Introduction

This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Large ControlNets**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model.

Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.

The following control types are available:

- Canny - Use a Canny edge map to guide the structure of the generated image. This is especially useful for illustrations, but works with all styles. (A minimal preparation sketch follows the Model Description below.)
- Depth - Use a depth map, generated by DepthFM, to guide generation. Example use cases include generating architectural renderings or texturing 3D assets.
- Blur - Use a blurred input image to perform extremely high-fidelity upscaling. A common workflow is to tile an input image, apply the ControlNet to each tile, and merge the tiles to produce a higher-resolution image.

## Model Details

### Model Description

This repository holds the ONNX export of the Depth, Canny, and Blur ControlNet models in BF16 precision.
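For users preparing their own conditioning images, the sketch below shows one common way to produce a Canny control image. This is illustrative only: it assumes OpenCV (`opencv-python`) is installed, and the file names and thresholds are placeholders rather than values used by the demo.

```python
import cv2

# Load the source image and convert to grayscale for edge detection.
# "input.png" is a placeholder file name.
image = cv2.imread("input.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Extract a Canny edge map; the thresholds (100, 200) are illustrative
# starting points and usually need tuning per image.
edges = cv2.Canny(gray, 100, 200)
cv2.imwrite("canny_control.png", edges)
```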
## Performance using TensorRT 10.13

#### Depth ControlNet: Timings for 40 steps at 1024x1024

| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 40 | VAE Decoder | Total |
|-------------|-----------|-------------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 74.97 ms | 11.87 ms | 4.90 ms | 8.82 ms | 18839.01 ms | 117.38 ms | 19097.19 ms |

#### Canny ControlNet: Timings for 60 steps at 1024x1024

| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total |
|-------------|-----------|-------------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 78.50 ms | 12.29 ms | 5.08 ms | 8.65 ms | 28057.08 ms | 106.49 ms | 28306.20 ms |

#### Blur ControlNet: Timings for 60 steps at 1024x1024

| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total |
|-------------|-----------|-------------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 74.48 ms | 11.71 ms | 4.86 ms | 8.80 ms | 28604.26 ms | 113.24 ms | 28859.06 ms |

## Usage Example

1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) to launch a TensorRT NGC container.

```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```

2. Install the required libraries:

```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```

3. Generate a HuggingFace user access token. To download the Stable Diffusion 3.5 model checkpoints, please request access on the [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), [Stable Diffusion 3.5 Large Depth ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-depth), [Stable Diffusion 3.5 Large Canny ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-canny), and [Stable Diffusion 3.5 Large Blur ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-blur) pages. You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See the [instructions](https://huggingface.co/docs/hub/security-tokens). You can optionally verify the token with the sketch after this step.

```bash
export HF_TOKEN=
```
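Optionally, confirm the token is picked up before starting a long engine build. A minimal sketch, assuming `huggingface_hub` is available in the container (it is typically pulled in by the demo requirements; otherwise `pip install huggingface_hub`):

```python
from huggingface_hub import whoami

# whoami() reads the HF_TOKEN environment variable exported above and
# raises an error if the token is missing or invalid.
user = whoami()
print(f"Authenticated as: {user['name']}")
```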
4. Perform TensorRT-optimized inference:

- **Stable Diffusion 3.5 Large Depth ControlNet in BF16 precision**

```shell
python3 demo_controlnet_sd35.py \
  "a photo of a man" \
  --version=3.5-large \
  --bf16 \
  --controlnet-type depth \
  --download-onnx-models \
  --denoising-steps=40 \
  --guidance-scale 4.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```

- **Stable Diffusion 3.5 Large Canny ControlNet in BF16 precision**

```shell
python3 demo_controlnet_sd35.py \
  "A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms" \
  --version=3.5-large \
  --bf16 \
  --controlnet-type canny \
  --download-onnx-models \
  --denoising-steps=60 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```

- **Stable Diffusion 3.5 Large Blur ControlNet in BF16 precision**

```shell
python3 demo_controlnet_sd35.py \
  "generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater" \
  --version=3.5-large \
  --bf16 \
  --controlnet-type blur \
  --download-onnx-models \
  --denoising-steps=60 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```
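The tiled upscaling workflow described for the Blur ControlNet in the Introduction can be outlined as follows. This is a minimal sketch, not part of the demo: `upscale_tile` is a hypothetical stand-in for a per-tile Blur ControlNet generation call, and the 1024-pixel tile size and lack of tile overlap are simplifying assumptions (production pipelines typically overlap and blend tiles to hide seams).

```python
from PIL import Image

def upscale_tile(tile: Image.Image, scale: int) -> Image.Image:
    # HYPOTHETICAL stand-in for a per-tile Blur ControlNet call.
    # A plain Lanczos resize keeps this sketch runnable end to end.
    return tile.resize((tile.width * scale, tile.height * scale), Image.LANCZOS)

def tiled_upscale(image: Image.Image, tile_size: int = 1024, scale: int = 2) -> Image.Image:
    """Split the image into tiles, upscale each tile, and paste the results
    into a canvas `scale` times larger. No overlap/blending in this sketch."""
    out = Image.new("RGB", (image.width * scale, image.height * scale))
    for top in range(0, image.height, tile_size):
        for left in range(0, image.width, tile_size):
            box = (left, top,
                   min(left + tile_size, image.width),
                   min(top + tile_size, image.height))
            out.paste(upscale_tile(image.crop(box), scale), (left * scale, top * scale))
    return out

if __name__ == "__main__":
    # "input.png" and "upscaled.png" are placeholder file names.
    src = Image.open("input.png").convert("RGB")
    tiled_upscale(src).save("upscaled.png")
```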