---
base_model:
- black-forest-labs/FLUX.1-dev
language:
- en
library_name: diffusers
license: apache-2.0
pipeline_tag: text-to-image
tags:
- image-generation
- subject-personalization
- style-transfer
- Diffusion-Transformer
---

<p align="center">
  <img src="assets/uso.webp" width="100"/>
</p>

<h3 align="center">
Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
</h3>

Paper: [USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning](https://huggingface.co/papers/2508.18966)

<p align="center">
  <a href="https://github.com/bytedance/USO"><img alt="GitHub stars" src="https://img.shields.io/github/stars/bytedance/USO"></a>
  <a href="https://bytedance.github.io/USO/"><img alt="Project page" src="https://img.shields.io/badge/Project%20Page-USO-blue"></a>
  <a href="https://arxiv.org/abs/2508.18966"><img alt="Tech report" src="https://img.shields.io/badge/Tech%20Report-USO-b31b1b.svg"></a>
  <a href="https://huggingface.co/bytedance-research/USO"><img alt="Hugging Face model" src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a>
</p>

## Abstract

Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter insists on subject consistency, resulting in an apparent antagonism. We argue that both objectives can be unified under a single framework because they ultimately concern the disentanglement and re-composition of content and style, a long-standing theme in style-driven research. To this end, we present USO, a Unified Style-Subject Optimized customization model. First, we construct a large-scale triplet dataset consisting of content images, style images, and their corresponding stylized content images. Second, we introduce a disentangled learning scheme that simultaneously aligns style features and disentangles content from style through two complementary objectives, style-alignment training and content-style disentanglement training. Third, we incorporate a style reward-learning paradigm, denoted SRL, to further enhance the model's performance. Finally, we release USO-Bench, the first benchmark that jointly evaluates style similarity and subject fidelity across multiple metrics. Extensive experiments demonstrate that USO achieves state-of-the-art performance among open-source models along both dimensions of subject consistency and style similarity. Code and model: https://github.com/bytedance/USO

## ⚡️ Quick Start

### 🔧 Requirements and Installation

Install the requirements:

```bash
## Create a virtual environment with Python >= 3.10 and <= 3.12, e.g.
python -m venv uso_env
source uso_env/bin/activate

## or
conda create -n uso_env python=3.10 -y
conda activate uso_env

## Then install the requirements
pip install -r requirements.txt  # legacy installation command
```
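
Before installing, you can optionally confirm that your interpreter falls in the 3.10–3.12 range stated above. This check is a convenience sketch, not part of the official setup:

```python
import sys

# The setup above targets Python >= 3.10 and <= 3.12.
major_minor = sys.version_info[:2]
assert (3, 10) <= major_minor <= (3, 12), (
    f"Expected Python 3.10-3.12, found {sys.version.split()[0]}"
)
print("Python version OK:", sys.version.split()[0])
```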

Then download the checkpoints in one of the following ways:

- **If you already have some of the checkpoints:**

  ```bash
  # 1. Download the official USO checkpoints
  pip install huggingface_hub
  huggingface-cli download bytedance-research/USO --local-dir <YOUR_SAVE_DIR> --local-dir-use-symlinks False

  # 2. Set the environment variables for the FLUX.1 base model
  export AE="YOUR_AE_PATH"
  export FLUX_DEV="YOUR_FLUX_DEV_PATH"
  export T5="YOUR_T5_PATH"
  export CLIP="YOUR_CLIP_PATH"
  # or: export HF_HOME="YOUR_HF_HOME"

  # 3. Set the environment variables for USO
  export LORA="<YOUR_SAVE_DIR>/uso_flux_v1.0/dit_lora.safetensors"
  export PROJECTION_MODEL="<YOUR_SAVE_DIR>/uso_flux_v1.0/projector.safetensors"
  ```

- **Or run the inference scripts directly:** the checkpoints will be downloaded automatically by the `hf_hub_download` function in the code (see the sketch below).
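
If you would rather fetch the USO weights from your own script, here is a minimal sketch using `huggingface_hub`. The repo ID and file names are taken from the commands above; the exact download logic inside the inference code may differ:

```python
from huggingface_hub import hf_hub_download

# Fetch the USO LoRA and projector weights from the official repo.
# The file names mirror the paths used in the export commands above.
lora_path = hf_hub_download(
    repo_id="bytedance-research/USO",
    filename="uso_flux_v1.0/dit_lora.safetensors",
)
projector_path = hf_hub_download(
    repo_id="bytedance-research/USO",
    filename="uso_flux_v1.0/projector.safetensors",
)

print("LORA:", lora_path)
print("PROJECTION_MODEL:", projector_path)
```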

### ✍️ Inference

Start from the examples below to explore and spark your creativity. ✨

```bash
# The first image is a content reference; the rest are style references.

# Subject-driven generation
python inference.py --prompt "The man in flower shops carefully match bouquets, conveying beautiful emotions and blessings with flowers." --image_paths "assets/gradio_examples/identity1.jpg" --width 1024 --height 1024

# Style-driven generation (keep the first image path empty)
python inference.py --prompt "A cat sleeping on a chair." --image_paths "" "assets/gradio_examples/style1.webp" --width 1024 --height 1024

# Subject + style (IP-style) generation
python inference.py --prompt "The woman gave an impassioned speech on the podium." --image_paths "assets/gradio_examples/identity2.webp" "assets/gradio_examples/style2.webp" --width 1024 --height 1024

# Multi-style generation (keep the first image path empty)
python inference.py --prompt "A handsome man." --image_paths "" "assets/gradio_examples/style3.webp" "assets/gradio_examples/style4.webp" --width 1024 --height 1024
```
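
For scripted batches, a small wrapper around the CLI may be convenient. The sketch below assumes only the flags shown above (`--prompt`, `--image_paths`, `--width`, `--height`); `run_uso` is a hypothetical helper, not part of the repository:

```python
import subprocess

def run_uso(prompt: str, image_paths: list[str],
            width: int = 1024, height: int = 1024) -> None:
    """Hypothetical wrapper around inference.py.

    Follows the convention above: the first entry of image_paths is the
    content reference ("" when there is none); any further entries are
    style references.
    """
    cmd = [
        "python", "inference.py",
        "--prompt", prompt,
        "--image_paths", *image_paths,
        "--width", str(width),
        "--height", str(height),
    ]
    subprocess.run(cmd, check=True)

# Style-driven generation: empty content slot plus one style reference.
run_uso("A cat sleeping on a chair.", ["", "assets/gradio_examples/style1.webp"])
```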

## 📄 Disclaimer

<p>
We open-source this project for academic research. The vast majority of images
used in this project are either generated or come from open-source datasets. If you have any concerns,
please contact us, and we will promptly remove any inappropriate content.
This project is released under the Apache 2.0 License. If you apply USO to other base models,
please ensure that you comply with their original licensing terms.
<br><br>This research aims to advance the field of generative AI. Users are free to
create images using this tool, provided they comply with local laws and exercise
responsible usage. The developers are not liable for any misuse of the tool by users.</p>

## Citation

We would also appreciate it if you could give a star ⭐ to our [GitHub repository](https://github.com/bytedance/USO). Thanks a lot!

If you find this project useful for your research, please consider citing our paper:

```bibtex
@article{wu2025uso,
  title={USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning},
  author={Shaojin Wu and Mengqi Huang and Yufeng Cheng and Wenxu Wu and Jiahe Tian and Yiming Luo and Fei Ding and Qian He},
  year={2025},
  eprint={2508.18966},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
}
```