Paper – Zephyr

Zephyr is a 7B LLM that uses distilled Direct Preference Optimization (dDPO) on AI Feedback (AIF) preference data to achieve strong intent alignment in chat-based language modeling without requiring human annotation.

Method

The approach follows stages similar to InstructGPT.

Distilled Supervised Fine-Tuning (dSFT)

Starting with a raw LLM, it first needs …
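The dDPO step mentioned above applies the standard DPO loss to AI-ranked response pairs instead of human-ranked ones. A minimal sketch of that loss on scalar log-probabilities (the function name, argument names, and numbers here are illustrative, not from the paper):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or rejected
    response under the trained policy or the frozen reference (dSFT) model.
    beta controls how far the policy may drift from the reference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): small when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is what lets dDPO train directly on preference data without a separate reward model.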