Fine-tune Stable Diffusion with DDPO via TRL
AI Impact Summary
This documentation details Denoising Diffusion Policy Optimization (DDPO), a technique for fine-tuning Stable Diffusion models with reinforcement learning from human feedback (RLHF). Specifically, it outlines a workflow using DDPO within the trl library to align model outputs with human aesthetic preferences, using a reward model trained on the AVA dataset. This approach offers a computationally efficient alternative to reward-weighted regression (RWR), addressing RWR's limitations around approximation errors and complex objective functions.
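At the core of DDPO, each sampled image receives a scalar reward from the reward model, and rewards are normalized per prompt into advantages before the policy-gradient update. The sketch below illustrates that normalization step only; the function name and the placeholder reward values are hypothetical, not part of the trl API or a real aesthetic model:

```python
from statistics import mean, pstdev
from collections import defaultdict

def normalize_rewards_per_prompt(prompts, rewards):
    """Convert raw reward scores into per-prompt z-scores (advantages),
    mirroring the per-prompt normalization DDPO applies before its update."""
    groups = defaultdict(list)
    for prompt, reward in zip(prompts, rewards):
        groups[prompt].append(reward)
    # Per-prompt mean and std; fall back to 1.0 when all rewards are equal.
    stats = {p: (mean(rs), pstdev(rs) or 1.0) for p, rs in groups.items()}
    return [(r - stats[p][0]) / stats[p][1] for p, r in zip(prompts, rewards)]

# Placeholder rewards for two prompts, two samples each.
prompts = ["a cat", "a cat", "a dog", "a dog"]
rewards = [5.0, 7.0, 4.0, 4.0]
advantages = normalize_rewards_per_prompt(prompts, rewards)
```

Normalizing within each prompt group keeps the update signal comparable across prompts with very different baseline reward levels, which is one reason DDPO is more stable than applying raw rewards directly.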
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info