Generalized On-Policy Distillation with Reward Extrapolation

(arxiv.org)

2 points | by fzliu 10 hours ago

No comments yet.