Predicting When RL Training Breaks Chain-of-Thought Monitorability

(lesswrong.com)

1 point | by gmays 9 hours ago

No comments yet.