Together AI introduces DPO fine-tuning for language model alignment
Action Required
Developers can now align language models with human preferences more effectively, improving AI assistant performance and user satisfaction.
AI Impact Summary
Together AI has introduced Direct Preference Optimization (DPO) fine-tuning on its Fine-Tuning Platform, allowing developers to align language models with human preferences. This capability enables the creation of more helpful and tailored AI assistants by training models directly on preference data: pairs of preferred and non-preferred responses to the same prompt. This marks a significant shift from traditional RLHF methods, offering a simpler, more efficient, and potentially more effective approach to model alignment because it removes the need for a separate reward model and reinforcement learning loop.
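To make the idea concrete, here is a minimal sketch of the DPO loss for a single preference pair, written in plain Python. This is an illustrative implementation of the standard DPO objective, not Together AI's internal code; the function name, argument names, and the default `beta` value are assumptions for the example.

```python
import math

def dpo_loss(pi_logp_chosen, pi_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (preferred, non-preferred) response pair.

    Inputs are the summed token log-probabilities of the chosen
    (preferred) and rejected (non-preferred) responses under the
    trainable policy (pi_*) and a frozen reference model (ref_*).
    beta scales how strongly the policy may deviate from the reference.
    """
    # Log-ratios measure how much more each response is favored by
    # the policy than by the reference model.
    chosen_logratio = pi_logp_chosen - ref_logp_chosen
    rejected_logratio = pi_logp_rejected - ref_logp_rejected
    # The loss is the negative log-sigmoid of the scaled margin:
    # it shrinks as the policy favors the chosen response more.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy equals the reference, the margin is 0 and the
# loss is -log(0.5) ≈ 0.6931.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

Minimizing this loss over a dataset of preference pairs pushes the policy to assign relatively higher probability to preferred responses while the reference model anchors it, which is what lets DPO skip the reward-model and RL stages of classic RLHF.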
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium