Sophia: A Scalable Second-Order Optimizer for Language Model Pre-Training

(arxiv.org)

3 points | by Anon84 11 hours ago