Răzvan PAȘCANU
United Kingdom
In this talk I will focus on State Space Models (SSMs), a recently introduced family of sequence models, and specifically discuss the relationship between SSMs and recurrent neural networks. I will start with a short history of architecture design for language modelling, which I will use as a motivating task. This will allow me to provide some insight into the evolution of RNN architectures, and into why some of the choices behind the SSM architecture seemed counter-intuitive. Most of the talk will focus on introducing the Linear Recurrent Unit architecture, explaining the role of the various modifications relative to traditional non-linear recurrent models. If time allows, I will present the relationship between linear recurrences and linear attention, showing that the update of the recurrence can be interpreted as a gradient step on a local objective that attempts to compress the sequence. This is in line with recent works such as Gated Delta Networks or Mesa-Net. With this talk I hope to highlight a potential connection between these modern architectures and control.
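As a rough illustration of the gradient-step view mentioned above (a sketch in standard linear-attention notation; the exact formulation used in the talk may differ): with keys $k_t$ and values $v_t$, the linear-attention state update

\[
S_t = S_{t-1} + v_t k_t^{\top}
\]

is a single gradient step, with step size $1$, on the local objective $\ell_t(S) = -\langle v_t, S k_t \rangle$, since $\nabla_S \ell_t = -v_t k_t^{\top}$. Swapping in the reconstruction objective $\ell_t(S) = \tfrac{1}{2}\lVert S k_t - v_t \rVert^2$ instead yields

\[
S_t = S_{t-1} - \eta\,(S_{t-1} k_t - v_t)\,k_t^{\top} = \bigl(I - \eta\, k_t k_t^{\top}\bigr) S_{t-1} + \eta\, v_t k_t^{\top},
\]

a data-dependent linear recurrence; this delta-rule form is the one generalized by models such as Gated Delta Networks.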
Dr. Răzvan Pașcanu is a prominent researcher in machine learning and artificial intelligence, currently serving as a Research Scientist at DeepMind and an Affiliate Member at MILA. Born and raised in Romania, he pursued his undergraduate studies in computer science and electrical engineering in Germany. In 2009, he earned his M.Sc. from Jacobs University Bremen under the supervision of Prof. Herbert Jaeger. Dr. Pașcanu completed his Ph.D. at the Université de Montréal in 2014, where he was mentored by Prof. Yoshua Bengio. His research interests encompass deep learning, optimization, recurrent neural networks, and reinforcement learning. Beyond his role at DeepMind, he contributes to academia as a faculty member at the Faculty of Mathematics and Computer Science, Jagiellonian University in Kraków, Poland. Dr. Pașcanu has an extensive publication record, with 117 works cited over 23,000 times, reflecting his significant impact on the field.