---
library_name: HunyuanImage-2.1
license: other
license_name: tencent-hunyuan-community
license_link: https://github.com/Tencent-Hunyuan/HunyuanImage-2.1/blob/master/LICENSE
language:
- en
- zh
tags:
- text-to-image
- comfyui
- diffusers
pipeline_tag: text-to-image
extra_gated_eu_disallowed: true
---

# **HunyuanImage-2.1 FP8 e4m3fn**

An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation

---

## **Performance on RTX 5090**

> When using **HunyuanImage-2.1** with the **quantized encoder** + **quantized base model**,
> VRAM usage on an **NVIDIA RTX 5090** typically ranges between **26 GB and 30 GB**, with an average
> inference time of about 16 seconds. Both figures vary with resolution, batch size, and prompt complexity.

> **There are reports that it also works on GPUs with 16 GB of VRAM.**

⚠ **Important Note:** The **refiner** is not yet implemented and is **not ready for use in ComfyUI**.
However, the **distilled model now works in ComfyUI** with recommended settings of **8 steps / 1.5-2.5 CFG**.

---
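
The recommended distilled-model settings above (8 steps, CFG between 1.5 and 2.5) can be captured in a small helper when scripting generations. This is only a sketch: `SamplerSettings` and `distilled_settings` are hypothetical names, not part of ComfyUI or Diffusers, and the default CFG of 2.0 is an assumed midpoint of the recommended range.

```python
from dataclasses import dataclass

# Values taken from the note above: 8 steps, CFG between 1.5 and 2.5.
DISTILLED_STEPS = 8
DISTILLED_CFG_RANGE = (1.5, 2.5)


@dataclass(frozen=True)
class SamplerSettings:
    """Hypothetical container for the two sampler values discussed above."""
    steps: int
    cfg: float


def distilled_settings(cfg: float = 2.0) -> SamplerSettings:
    """Return distilled-model settings, rejecting CFG outside the recommended range.

    The default of 2.0 is an assumption (midpoint of 1.5-2.5), not a documented value.
    """
    low, high = DISTILLED_CFG_RANGE
    if not low <= cfg <= high:
        raise ValueError(f"CFG {cfg} is outside the recommended range {low}-{high}")
    return SamplerSettings(steps=DISTILLED_STEPS, cfg=cfg)
```

The returned values would then be passed to whatever sampler node or pipeline call is in use. The helper only guards against drifting outside the recommended range.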

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/drMNYMjvB01RvgZKS6kX6.jpeg)
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/uxhsoLKjzJu24eCZh_RQ8.jpeg)

---

## **Download Quantized Model (FP8 e4m3fn)**

[**Download hunyuanimage2.1_fp8_e4m3fn.safetensors**](https://huggingface.co/drbaph/HunyuanImage-2.1_fp8/blob/main/hunyuanimage2.1_fp8_e4m3fn.safetensors)

---

### **Workflow Notes**

- **Model:** HunyuanImage-2.1
- **Mode:** Quantized Encoder + Quantized Base Model
- **VRAM Usage:** ~26–30 GB on RTX 5090
- **Resolution Tested:** 2K (2048×2048)
- **Frameworks:** ComfyUI & Diffusers
- **Optimisations:** Works with Patch Sage Attention + Lazycache / TeaCache ✅
- **Distilled Model:** ✅ Now works in ComfyUI with **8 steps / 1.5-2.5 CFG**
- **Refiner:** ❌ Still not implemented, **not available in ComfyUI**
- **License:** [tencent-hunyuan-community](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1/blob/master/LICENSE)

---
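
The checkpoint linked in the download section above can also be fetched from a script instead of the browser. The sketch below uses `hf_hub_download` from the `huggingface_hub` package; the repository ID and filename come from the download link, while `download_checkpoint` and the example target directory are assumptions for illustration.

```python
from pathlib import Path

# Repository and filename as they appear in the download link above.
REPO_ID = "drbaph/HunyuanImage-2.1_fp8"
FILENAME = "hunyuanimage2.1_fp8_e4m3fn.safetensors"


def download_checkpoint(target_dir: str) -> Path:
    """Download the FP8 checkpoint into target_dir and return its local path.

    Hypothetical helper: requires `pip install huggingface_hub` and network access.
    """
    # Imported lazily so the constants above can be used without the package installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=FILENAME,
        local_dir=target_dir,
    )
    return Path(local_path)
```

A call such as `download_checkpoint("ComfyUI/models/diffusion_models")` would place the file in a ComfyUI model folder. That path is an assumed layout, so adjust it to match your installation.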

🚀 **Optimized for High-Resolution, Memory-Efficient Text-to-Image Generation**