Cornell University

2 West Loop Road, New York, NY 10044

https://lmss.tech.cornell.edu/

Learning Machines Seminar Series

What: LMSS: Jonathan Berant (Tel Aviv University)
When: Thursday, Nov 21, 12:00 p.m. to 1:00 p.m.
Where: Bloomberg 81, Bloomberg Center, Cornell Tech

The series is organized by Associate Professor Yoav Artzi and sponsored by Bloomberg.


Pizza will be served at 11:40 a.m.

"Towards Robust Language Model Post-training"

Post-training (or alignment) is the process of training an already pretrained language model (LM) to align with human preferences. Post-training has attracted enormous research attention recently, but there is little agreement on how to robustly post-train LMs. In this talk, I will cover two recent lines of work that aim to make LM post-training more robust. (a) In online alignment, where we sample from the LM during training and maximize a KL-penalized expected reward, the biggest issue is reward hacking, where the LM maximizes the reward assigned by the proxy reward model while generating bad outputs. I will describe recent work on using reward ensembles to estimate reward uncertainty, and show that leveraging this information can mitigate, but does not eliminate, reward hacking. (b) In offline alignment, where the LM is trained from off-policy samples, common techniques often degenerate, largely because human-annotated preference data is inherently limited. We propose an approach that combines reward model distillation with pessimism over an ensemble of reward models to circumvent this degeneration. Time permitting, I will briefly touch on post-training LMs with transformed reward models so that they are better suited to settings where best-of-N or worst-of-N sampling is used at test time.
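For context, the online alignment objective mentioned above is commonly written as a KL-penalized expected reward. The notation below is the standard formulation (reward model r_phi, reference policy pi_ref, KL coefficient beta, ensemble members r_{phi_k}), not necessarily the exact objectives used in the speaker's papers:

\max_\theta \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)} \big[ r_\phi(x, y) \big] \;-\; \beta \, \mathrm{KL}\!\left( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)

One common way to be pessimistic over a reward ensemble is to score each output with a conservative aggregate, e.g. r_{\mathrm{pess}}(x, y) = \min_k r_{\phi_k}(x, y), so an output is rewarded only if every ensemble member rates it highly; whether this matches the specific constructions in the talk is left to the presentation.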

BIO

Jonathan Berant is an associate professor at the School of Computer Science at Tel Aviv University and currently also a visiting faculty researcher at Google DeepMind. Jonathan earned a Ph.D. in Computer Science at Tel Aviv University, was a post-doctoral fellow at Stanford University, and subsequently a post-doctoral fellow at Google Research, Mountain View. Jonathan has worked in many applied areas of natural language understanding, including semantic parsing and question answering, and has recently focused on post-training. Jonathan has received several awards and fellowships, including the Rothschild Fellowship and the ACL 2011 best student paper, EMNLP 2014 best paper, NAACL 2019 best resource paper, and ICLR 2024 best paper awards. Jonathan has won the Kadar prize for outstanding research and is currently an ERC grantee.
