Fine-tune Stable Diffusion with DDPO via TRL
AI Impact Summary
This documentation details Denoising Diffusion Policy Optimization (DDPO), a technique for fine-tuning Stable Diffusion models with reinforcement learning from human feedback (RLHF). Specifically, it outlines a workflow using DDPO within the trl library to align model outputs with human aesthetic preferences, using a reward model trained on the AVA dataset. This approach offers a computationally efficient alternative to reward-weighted regression (RWR), addressing RWR's limitations around approximation errors and complex objective functions.
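At the core of DDPO, each sampled image receives a scalar reward from the reward model, and rewards are normalized per prompt into advantages before the policy-gradient update. The sketch below illustrates that normalization step only; the function name and the placeholder reward values are hypothetical, not part of the trl API or a real aesthetic model:

```python
from statistics import mean, pstdev
from collections import defaultdict

def normalize_rewards_per_prompt(prompts, rewards):
    """Convert raw reward scores into per-prompt z-scores (advantages),
    mirroring the per-prompt normalization DDPO applies before its update."""
    groups = defaultdict(list)
    for prompt, reward in zip(prompts, rewards):
        groups[prompt].append(reward)
    # Per-prompt mean and std; fall back to 1.0 when all rewards are equal.
    stats = {p: (mean(rs), pstdev(rs) or 1.0) for p, rs in groups.items()}
    return [(r - stats[p][0]) / stats[p][1] for p, r in zip(prompts, rewards)]

# Placeholder rewards for two prompts, two samples each.
prompts = ["a cat", "a cat", "a dog", "a dog"]
rewards = [5.0, 7.0, 4.0, 4.0]
advantages = normalize_rewards_per_prompt(prompts, rewards)
```

Normalizing within each prompt group keeps the update signal comparable across prompts with very different baseline reward levels, which is one reason DDPO is more stable than applying raw rewards directly.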
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info