Research Highlight: Zhiyuan Li

Professor Zhiyuan Li joined TTIC’s faculty as an Assistant Professor in the fall of 2023. Professor Li’s main research interests are in the theoretical foundation of machine learning, especially deep learning theory. He is currently focusing on topics including non-convex optimization of neural networks, generalization of overparameterized models, implicit bias of optimization algorithms, and large language models.

This April, Professor Li and Professor Sanjeev Arora (Princeton University) were jointly named recipients of a Superalignment Fast Grant from OpenAI to further investigate the “weak-to-strong generalization” problem. The process was highly selective, with only 50 of the 2,700 applicants receiving funding.

A fundamental challenge for aligning future superhuman AI systems (superalignment) is that humans will need to supervise AI systems that surpass their own abilities. One of the most widely used alignment techniques is reinforcement learning from human feedback (RLHF), which relies on humans to supervise models, for example by evaluating whether a model followed instructions or generated the correct output, according to Professor Li. However, this method may not scale to superhuman models, whose behavior humans can no longer reliably evaluate.

“Traditional machine learning focuses on the setting where humans supervise models that are weaker than humans,” Professor Li said. “However, AI systems moving forward will surpass the intelligence levels of the humans who are supervising them, meaning that humans have become ‘weak supervisors.’ The question we are trying to answer is: how can humans steer and trust AI systems that are more intelligent than them?”

Superintelligence (AI that is vastly smarter than humans) could be developed within the next decade, making it important to know how to steer and control superhuman AI systems. Future AI systems will be capable of complex behaviors that will make it hard for humans to reliably supervise them.

“Our research demonstrates that we can use small models to supervise large models—when we supervise GPT-4 with a GPT-2-level model using this method on [natural language processing] (NLP) tasks, the resulting model typically performs somewhere between GPT-3 and GPT-3.5,” Professor Li said. “We are researching whether small models that would serve as a ‘weak supervisor’ can supervise larger models. Will the strong model generalize according to the weak supervisor’s intent?”
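
For readers curious what this setup looks like in practice, the following is a minimal, hypothetical Python sketch of the weak-to-strong training recipe described above, with small scikit-learn classifiers standing in for the GPT-2- and GPT-4-scale language models. The dataset, model choices, and variable names are illustrative assumptions, not the actual experimental code.

    # Hypothetical sketch of weak-to-strong supervision, not the actual experiments.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic classification data standing in for an NLP benchmark.
    X, y = make_classification(n_samples=4000, n_features=40, n_informative=10, random_state=0)
    X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=0.25, random_state=0)
    X_strong, X_test, y_strong, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    # 1) Train the weak supervisor on ground-truth labels (the "human-level" signal).
    weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)

    # 2) The weak model labels fresh data; the strong model never sees ground truth.
    weak_labels = weak.predict(X_strong)

    # 3) Train the stronger model only on the weak supervisor's (imperfect) labels.
    strong = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=500, random_state=0)
    strong.fit(X_strong, weak_labels)

    # 4) Weak-to-strong generalization: does the strong student beat its weak teacher?
    print("weak supervisor accuracy:", weak.score(X_test, y_test))
    print("strong student accuracy: ", strong.score(X_test, y_test))

The question posed in the quote corresponds to the final comparison: whether the strong model, trained only on the weak supervisor's labels, recovers performance beyond what those noisy labels alone would suggest.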

Professor Li received his Ph.D. in computer science from Princeton University in 2022 and served as a postdoctoral fellow in the Computer Science Department at Stanford University from 2022 to 2023 before joining TTIC’s faculty. He has served as an Area Chair for the Conference on Neural Information Processing Systems (NeurIPS) and is a recipient of a Microsoft Research Ph.D. Fellowship.