Simple Preference Optimization (SimPO), by Yu Meng, Mengzhou Xia, and Danqi Chen, is a preference optimization algorithm that the authors present as simpler and more effective than DPO, and that does not use a reference model. The ...
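
To make the "no reference model" point concrete, here is a minimal sketch of a SimPO-style loss for a single preference pair. It assumes the objective from the SimPO paper: the implicit reward is the length-normalized log-likelihood of a response under the policy, scaled by `beta`, and the chosen response must beat the rejected one by a target margin `gamma`. The function name, argument names, and default values are illustrative placeholders, not the authors' code or recommended settings.

```python
import math


def simpo_pair_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
                    beta=2.0, gamma=1.0):
    """SimPO-style loss for one (chosen, rejected) pair.

    logp_* : summed token log-probabilities of each response under the policy
    len_*  : number of tokens in each response
    beta, gamma : hypothetical defaults, chosen only for illustration
    """
    # Length-normalized rewards; only the policy's own log-probs are used,
    # so no reference model is involved.
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected

    # The chosen reward should exceed the rejected reward by at least gamma.
    z = reward_chosen - reward_rejected - gamma

    # -log(sigmoid(z)) == softplus(-z), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

By contrast, DPO's implicit reward is a log-ratio between the policy and a frozen reference model, so each training step also needs reference log-probabilities. In this sketch the two summed log-probabilities and the two lengths are enough.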