xwwshen committed
Commit 24802fb · 1 Parent(s): 405ba40

Upload README.md

Files changed (1)
  1. README.md +73 -0
README.md ADDED
@@ -0,0 +1,73 @@
+ <div align="center" style="font-family: charter;">
+ <h1 align="center">Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference</h1>
+ <div align="center">
+ <a href='https://arxiv.org/abs/2509.06942'><img src='https://img.shields.io/badge/ArXiv-red?logo=arxiv'></a> &nbsp;
+ <a href='https://huggingface.co/tencent/SRPO/'><img src='https://img.shields.io/badge/_Code-SRPO-181717?color=121717&logo=github&logoColor=white'></a> &nbsp;
+ <a href=''><img src='https://img.shields.io/badge/%F0%9F%92%BB_Project-SRPO-blue'></a> &nbsp;
+ </div>
+ <div align="center">
+ Xiangwei Shen<sup>1,2*</sup>,
+ <a href="https://scholar.google.com/citations?user=Lnr1FQEAAAAJ&hl=zh-CN" target="_blank"><b>Zhimin Li</b></a><sup>1*</sup>,
+ <a href="https://scholar.google.com.hk/citations?user=Fz3X5FwAAAAJ" target="_blank"><b>Zhantao Yang</b></a><sup>1</sup>,
+ <a href="https://shiyi-zh0408.github.io/" target="_blank"><b>Shiyi Zhang</b></a><sup>3</sup>,
+ Yingfang Zhang<sup>1</sup>,
+ Donghao Li<sup>1</sup>,
+ <br>
+ <a href="https://scholar.google.com/citations?user=VXQV5xwAAAAJ&hl=en" target="_blank"><b>Chunyu Wang</b></a><sup>1</sup>,
+ <a href="https://openreview.net/profile?id=%7EQinglin_Lu2" target="_blank"><b>Qinglin Lu</b></a><sup>1</sup>,
+ <a href="https://andytang15.github.io" target="_blank"><b>Yansong Tang</b></a><sup>3,✝</sup>
+ </div>
+ <div align="center">
+ <sup>1</sup>Hunyuan, Tencent
+ <br>
+ <sup>2</sup>School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen
+ <br>
+ <sup>3</sup>Shenzhen International Graduate School, Tsinghua University
+ <br>
+ <sup>*</sup>Equal contribution
+ <sup>✝</sup>Corresponding author
+ </div>
+ </div>
+
+
+ ## Abstract
+ Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable rewards. However, these methods face two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive and restricts optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models to achieve the desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any timestep via interpolation, leveraging the fact that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX.1-dev model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
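The interpolation property that the abstract relies on can be illustrated with a minimal sketch. It assumes the rectified-flow convention `x_t = (1 - t) * x_0 + t * noise`, which is an assumption here because the README does not state the exact schedule, and the helper names `interpolate` and `recover_clean` are hypothetical. The sketch covers only the closed-form recovery step and omits the model's denoising pass: because the noise is fixed in advance, the clean image can be recovered from any timestep `t < 1`.

```python
import random


def interpolate(x0, noise, t):
    """Noisy state at time t: a linear blend of the clean image and the noise."""
    return [(1.0 - t) * a + t * n for a, n in zip(x0, noise)]


def recover_clean(xt, noise, t):
    """Invert the blend using the known noise prior (requires 0 <= t < 1)."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t must be in [0, 1)")
    return [(x - t * n) / (1.0 - t) for x, n in zip(xt, noise)]


rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]   # stand-in for image pixels
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]   # predefined noise prior

xt = interpolate(x0, noise, t=0.9)                # heavily noised state
x0_hat = recover_clean(xt, noise, t=0.9)          # closed-form recovery
max_err = max(abs(a - b) for a, b in zip(x0, x0_hat))
```

Here `max_err` stays at floating-point rounding level even at `t = 0.9`, which is the sense in which a known noise prior makes early, very noisy timesteps usable for reward scoring.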
+
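One way to picture a text-conditioned relative reward is sketched below. It is an illustration under stated assumptions, not the training code: the reward is taken to be a dot product between an image embedding and a text embedding (as in CLIP-style scorers), the embeddings are toy vectors, and `relative_reward` is a hypothetical name. Scoring the same image against a positively and a negatively augmented prompt and taking the difference gives a signal that can be steered online by changing the augmentation words.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def relative_reward(image_emb, pos_text_emb, neg_text_emb):
    """Reward under the positive prompt minus reward under the negative prompt."""
    return dot(image_emb, pos_text_emb) - dot(image_emb, neg_text_emb)


# Toy 3-d embeddings, made up for illustration only.
image_emb = [0.6, 0.8, 0.0]
pos_text_emb = [0.6, 0.8, 0.0]   # prompt augmented with a desired attribute
neg_text_emb = [0.0, 0.6, 0.8]   # prompt augmented with an undesired attribute

score = relative_reward(image_emb, pos_text_emb, neg_text_emb)
```

Swapping the positive and negative embeddings flips the sign of `score`, so the same scorer can push toward or away from an attribute without being retrained.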
+ ## Quick Start
+ ### Checkpoints
+ The `diffusion_pytorch_model.safetensors` file is the online version of SRPO, based on [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) and trained on the HPD dataset with [HPSv2](https://github.com/tgxs002/HPSv2).
+
+ #### Inference
+ Replace FLUX's `diffusion_pytorch_model.safetensors` weights with the SRPO checkpoint, for example by loading them into the pipeline's transformer:
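+ ```python
+ import torch
+ from diffusers import FluxPipeline
+ from safetensors.torch import load_file
+
+ # Placeholder values (not from the original README); adjust as needed.
+ prompt = "your prompt"
+ infer_step = 50
+ generator = torch.Generator(device="cpu").manual_seed(0)
+
+ # Load the base FLUX pipeline from a local directory.
+ pipe = FluxPipeline.from_pretrained(
+     "your dir",
+     torch_dtype=torch.bfloat16,
+     use_safetensors=True,
+ ).to("cuda")
+
+ # Overwrite the transformer weights with the SRPO checkpoint.
+ state_dict = load_file("yourpath")
+ pipe.transformer.load_state_dict(state_dict)
+
+ image = pipe(
+     prompt,
+     guidance_scale=3.5,
+     height=1024,
+     width=1024,
+     num_inference_steps=infer_step,
+     max_sequence_length=512,
+     generator=generator,
+ ).images[0]
+ ```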
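Before overwriting the transformer weights, it can help to confirm that the checkpoint's parameter names line up with the model's. The following is a small sketch with a hypothetical helper `diff_keys`. It works on plain collections of names, so in practice it would presumably be called with `pipe.transformer.state_dict().keys()` and `state_dict.keys()`. The key names in the usage lines are made up for illustration.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names, each as a sorted list."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # in model, not in checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)   # in checkpoint, not in model
    return missing, unexpected


# Illustrative names only; real FLUX parameter names differ.
missing, unexpected = diff_keys(
    ["proj.weight", "proj.bias", "norm.weight"],
    ["proj.weight", "proj.bias", "extra.weight"],
)
```

Two empty lists mean the names match. The check covers names only, so a shape mismatch would still surface when `load_state_dict` runs.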
+ ### License
+ SRPO is licensed under the License Terms of SRPO. See `./License.txt` for more details.
+ ## Citation
+ If you use SRPO for your research, please cite our paper:
+
+ ```bibtex
+ @misc{shen2025directlyaligningdiffusiontrajectory,
+     title={Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference},
+     author={Xiangwei Shen and Zhimin Li and Zhantao Yang and Shiyi Zhang and Yingfang Zhang and Donghao Li and Chunyu Wang and Qinglin Lu and Yansong Tang},
+     year={2025},
+     eprint={2509.06942},
+     archivePrefix={arXiv},
+     primaryClass={cs.AI},
+     url={https://arxiv.org/abs/2509.06942},
+ }
+ ```