Alignment - a Ambroser53 Collection
Ambroser53's Collections: Alignment (updated Oct 23, 2024)
Understanding the performance gap between online and offline alignment algorithms Paper
• 2405.08448
• Published May 14, 2024 • 18
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment Paper
• 2405.19332
• Published May 29, 2024 • 22
Offline Regularised Reinforcement Learning for Large Language Models Alignment Paper
• 2405.19107
• Published May 29, 2024 • 15
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback Paper
• 2406.00888
• Published Jun 2, 2024 • 33
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms Paper
• 2406.02900
• Published Jun 5, 2024 • 13
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM Paper
• 2406.12168
• Published Jun 18, 2024 • 7
Deep Bayesian Active Learning for Preference Modeling in Large Language Models Paper
• 2406.10023
• Published Jun 14, 2024 • 2
Bootstrapping Language Models with DPO Implicit Rewards Paper
• 2406.09760
• Published Jun 14, 2024 • 41
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle Paper
• 2407.13833
• Published Jul 18, 2024 • 12
Baichuan Alignment Technical Report Paper
• 2410.14940
• Published Oct 19, 2024 • 51