David Krueger

I am an Assistant Professor at the University of Montreal and a member of Mila. My research group KASL focuses on Deep Learning, AI Alignment, AI safety, and AI policy. I’m broadly interested in work (including in areas outside of Machine Learning, e.g. AI governance) that could reduce the risk of human extinction (“x-risk”) resulting from out-of-control AI systems. Particular interests include:

Reward modeling and reward hacking
Aligning large language models (LLMs) and foundation models
Scientifically understanding how learning and generalization work in deep learning
Preventing and mitigating socially harmful development, deployment, and use of AI systems
Elaborating and evaluating speculative concerns about more advanced future AI systems

I was previously an Assistant Professor at the University of Cambridge and a member of the Computational and Biological Learning lab (CBL) and Machine Learning Group (MLG).

News:

Alan Chan has successfully defended his PhD thesis and will be joining GovAI.
I will be co-supervising Jan Wehner's PhD at CISPA.
We have 4 accepted papers at NeurIPS 2024.
Our paper “Thinker: Learning to Plan and Act” was accepted at NeurIPS 2023.
Our paper “Characterizing Manipulation from AI Systems” was accepted at EAAMO 2023.
Our paper “Harms from Increasingly Agentic Algorithmic Systems” was accepted at FAccT 2023.
Two papers accepted at ICLR 2023: “Broken Neural Scaling Laws” and “Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics” (Spotlight!).
Our paper “Defining and Characterizing Reward Hacking” was accepted at NeurIPS 2022.
Our paper “Objective Robustness in Deep Reinforcement Learning” was accepted at ICML 2022.
Usman Anwar, Stephen Chung, and Bruno Mlodozeniec have joined the group as PhD students.
We’ve received a $1m grant from the Open Philanthropy Project to study Reward Model Hacking.

PhD Students:

Neel Alex

alexneel@gmail.com I’m a second-year PhD student supervised by David Krueger. Previously, I was a research intern at UC Berkeley’s Center for Human-Compatible AI for two years, and I’ve also done internships at small companies such as Ought and DeepScale. I’m broadly interested in AI alignment and the long-term future of AI. My work so far has largely been in benchmarking and problem definition: how do we specify problems in a way that gets AIs to do useful things for us? That work has spanned several domains, from traditional sequential RL environments to large language models. Recently, I’ve been working more in sequential learning environments, trying to understand how learning might work without explicitly given reward functions. Apart from technical AI, I’m also personally interested (though I lack expertise) in AI governance and solving global coordination problems.

Usman Anwar

usmananwar391@gmail.com I am a PhD student at the University of Cambridge. I am supervised by David Krueger and funded by the Open Phil AI Fellowship and the Vitalik Buterin Fellowship on Existential AI Safety. My research interests span Reinforcement Learning, Deep Learning and Cooperative AI. My long-term goal in AI research is to develop useful, versatile and human-aligned AI systems that can learn from humans and each other. My research focuses on identifying the factors that make it difficult to develop human-aligned AI systems and on developing techniques to work around these factors. In particular, I am interested in exploring ways in which rich human preferences and desires could be adaptively communicated to AI agents, especially in complex scenarios such as multi-agent planning and time-varying preferences. The ultimate goal is both to broaden the scope of tasks that AI agents can undertake and to make those agents more aligned and trustworthy. For publications and other details, please visit https://uzman-anwar.github.io.

Ethan Caballero

I'm interested in finding all the downstream evaluations that matter and finding that which scales optimally with respect to all those downstream evaluations.

Dmitrii Krasheninnikov

dmkr0001@gmail.com I am a PhD student working on AI safety with David Krueger. I’m interested in designing AI systems that do what we want them to do, and am hoping to ensure that AI’s long-term impact on humanity is positive. I earned my master’s degree in AI from the University of Amsterdam, and had the opportunity to work with UC Berkeley’s Center for Human-Compatible AI during and after my studies. I also spent about a year working on deep reinforcement learning and robotics at Sony AI Zurich.

Lauro Langosco

langosco.lauro@gmail.com I’m a PhD student with David Krueger in CBL at Cambridge. My main research interest is AI alignment: the problem of building generally intelligent systems that do what their operator wants them to do. I’m also very interested in the science / theory of deep learning, i.e. understanding why and how DL systems generalize and scale. Previously, I interned at the Center for Human-Compatible AI in Berkeley and studied mathematics at ETH Zurich. I also helped run Effective Altruism Zurich for four years and co-organized a reading group on AI alignment.

Nitarshan Rajkumar

I'm a PhD student co-advised by Ferenc Huszár and David Krueger. I completed my MSc at Mila (Montreal) where I variously worked on empirical evaluation of generalization bounds, data-efficient RL using SSL, and Text-to-SQL using GPT-3 and Codex. I am broadly interested in deep learning at scale - understanding how and why performance improves (or doesn't). This interest extends to practical applications of large models (such as GPT-3 and Codex) on real-world tasks. I'm also interested in policy considerations for the responsible development of AI.

Shoaib Ahmed Siddiqui

msas3@cam.ac.uk I am a second-year Ph.D. student at the University of Cambridge supervised by David Krueger. Prior to this, I completed my MS in Germany (TU Kaiserslautern), followed by an internship at NVIDIA Research. I am broadly interested in the empirical theory of deep learning, with the aim of better understanding how deep learning models work. Understanding them will enable us to design more effective learning systems in the future. With regard to AI alignment, I work on the myopic problem of robustness, which includes robustness against both adversarial and common real-world corruptions. I am also looking at robustness against group imbalance in the context of model fairness (unwarranted penalization of minority group samples) and against label noise. I am also very interested in self-supervised learning (SSL), as it enables us to encode prior knowledge about the world. I consider SSL to be a natural direction for developing robust models in the future. Finally, I have some prior experience in large-scale deep learning (including both visual recognition and language models) and model compression.

Stephen Chung

Stephen is a Ph.D. student at the University of Cambridge supervised by David Krueger.

Bruno Mlodozeniec

Bruno is a Ph.D. student at the University of Cambridge co-supervised by David Krueger and Rich Turner.

Aryeh Englander

Aryeh is a Ph.D. student at the University of Maryland, Baltimore County, co-supervised by I-Jeng Wang and David Krueger.

I have ongoing collaborations with the following students (among others) as well, who may be involved in supervising internship projects:

Ekdeep Singh Lubana (U Mich)

We are working on a mechanistic understanding of learning and generalization in deep learning, as in our paper "Mechanistic Mode Connectivity".

Alan Chan (Mila)

We are working on elaborating socio-technical issues relevant to AI x-safety and connecting them with existing work in the FATE (fairness, accountability, transparency, and ethics) ML community, as in our paper "Harms from Increasingly Agentic Algorithmic Systems".

Joar Skalse (Oxford)

We are working on reward theory, as in our paper "Defining and Characterizing Reward Hacking".

Micah Carroll (Berkeley)

We are working on addressing AI manipulation, as in our work "Characterizing Manipulation from AI Systems".

Here is a recent example of a talk I've given at Edinburgh. More examples can be found in my CV:

“Can we get Deep Learning systems to generalize safely?”