RLHF

Reward modeling, PPO with a KL penalty, and collecting human preference data at scale.