Colloquium Cognitive Systems
Meta-learning of sequential strategies
Dr. Pedro Ortega (Google DeepMind)
Abstract. Can a slow, incremental learning rule give rise to a sophisticated reinforcement learning algorithm? I will show how a deep learning system with memory, when trained on a distribution of tasks, learns a new learning algorithm that turns out to be Bayes-optimal. In particular, we will see how these algorithms are implemented in the system's memory, and relate them to Bayesian reinforcement learning.
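As a minimal illustration of the idea behind the abstract (not taken from the talk itself): consider tasks that each consist of predicting coin flips whose bias is drawn from a uniform prior. A memory-based meta-learner trained across such tasks is expected to converge to the Bayes-optimal sequential predictor, which in this hypothetical setting is simply the posterior predictive under a Beta(1, 1) prior, i.e. Laplace's rule of succession. The sketch below computes that target predictor directly; the function names are illustrative, not from any source.

```python
def posterior_predictive(heads: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """Posterior predictive P(next flip = heads) under a Beta(a, b) prior,
    after observing `heads` heads in `n` flips."""
    return (heads + a) / (n + a + b)

def sequential_predictions(flips):
    """Bayes-optimal prediction made before each observation in a sequence.

    A meta-trained recurrent agent approximates this mapping, with its
    memory state playing the role of the sufficient statistic (heads, n).
    """
    preds, heads = [], 0
    for i, flip in enumerate(flips):
        preds.append(posterior_predictive(heads, i))
        heads += flip
    return preds

# Before any data the prediction equals the prior mean, 0.5; after three
# heads in a row it rises as the posterior concentrates.
print(sequential_predictions([1, 1, 1, 0]))  # → [0.5, 0.666..., 0.75, 0.8]
```

The point of the example is that the "learning algorithm" lives entirely in how the prediction is updated from one observation to the next; a recurrent network that amortizes this update in its hidden state implements the same Bayesian computation implicitly.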
Bio. Pedro A. Ortega is a Research Scientist at DeepMind. His work includes the application of information-theoretic and statistical-mechanical ideas to sequential decision-making, which has led to contributions in novel bounded-rationality models and to recasting adaptive control as a causal inference problem. He obtained his PhD in Engineering from the University of Cambridge (supervised by Prof. Zoubin Ghahramani), and he has been a post-doctoral fellow at the Department of Engineering in Cambridge (Prof. Simon Godsill), at the Max Planck Institute for Biological Cybernetics/Intelligent Systems (Daniel A. Braun), at the Hebrew University of Jerusalem (Prof. Naftali Tishby), and at the University of Pennsylvania (Prof. Daniel D. Lee and Prof. Alan A. Stocker).