alignment & safety
RLHF
Reward modeling, PPO, KL penalty, and collecting human preference data at scale.