About me
I am a lecturer in Artificial Intelligence at Queen Mary University of London. Before that, I was a postdoctoral researcher at the Swiss AI Lab IDSIA, working on reinforcement learning under the supervision of Jürgen Schmidhuber.
I believe that intelligence should be defined as a measure of the ability of an agent to achieve goals in a wide range of environments (Legg and Hutter, 2007), which makes reinforcement learning an excellent framework to study many challenges that intelligent agents are bound to face.
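Legg and Hutter make this definition precise as a universal intelligence measure, which averages an agent's expected return across all computable environments, weighted toward simpler ones. A sketch of their measure:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V_{\mu}^{\pi}
```

Here $\pi$ is the agent's policy, $E$ is the set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_{\mu}^{\pi}$ is the expected return of $\pi$ in $\mu$. Simpler environments thus contribute more to an agent's measured intelligence.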
I am currently interested in unlocking the potential of formalization to accelerate the development of machine learning theory.
Formalization is the process of translating mathematical statements and their proofs into a formal language so that their correctness can be verified algorithmically. For this purpose, the mathematical community has largely adopted the open-source programming language and proof assistant Lean, whose mathematical library, mathlib, contains more than a million lines of code. Several organizations are also developing reliable problem solvers that combine general-purpose large language models with Lean. As these systems improve and become more widely available, they may support the development of provably safe artificial intelligence.
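As a minimal illustration of what formalization looks like in practice, here is a statement and its proof in Lean 4. The proof term is checked mechanically by the Lean kernel; a wrong proof would simply fail to compile.

```lean
-- Formalized statement: addition of natural numbers is commutative.
-- The proof appeals to a lemma from Lean's standard library,
-- and the kernel verifies that it establishes exactly this statement.
example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Libraries such as mathlib are built from many thousands of such machine-checked statements, which is what allows their correctness to be trusted at scale.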
PhD students
- Michelangelo Conserva (now at Google Research).
- Remo Sasso (now at Amazon).
If you would like to work under my supervision, please send me a message with your curriculum vitae and a brief description of your goals after reading this.
Selected papers
- R. Sasso, M. Conserva, D. Jeurissen, P. Rauber. "Exploration with Foundation Models: Capabilities, Limitations, and Hybrid Approaches", 2025.
- M. Conserva, R. Sasso, P. Rauber. "On the Limits of Tabular Hardness Metrics for Deep RL: A Study with the Pharos Benchmark", 2025.
- R. Sasso, M. Conserva, D. Jeurissen, P. Rauber. "Foundation Models as World Models: A Foundational Study in Text-Based GridWorlds", 2025.
- R. Sasso, M. Conserva, P. Rauber. "Posterior Sampling for Deep Reinforcement Learning", International Conference on Machine Learning (ICML), 2023.
- M. Conserva, P. Rauber. "Hardness in Markov Decision Processes: Theory and Practice", Conference on Neural Information Processing Systems (NeurIPS), 2022.
- A. Ramesh*, P. Rauber*, M. Conserva, J. Schmidhuber. "Recurrent Neural-Linear Posterior Sampling for Non-Stationary Contextual Bandits", Neural Computation, 2022.
- P. Rauber, A. Ummadisingu, F. Mutz, J. Schmidhuber. "Hindsight Policy Gradients", International Conference on Learning Representations (ICLR), 2019.
More work is available here.