On our path toward fully autonomous systems, i.e., systems that operate in the real world without significant human intervention, reinforcement learning (RL) is a promising framework for learning to solve problems by trial and error. While RL has had many successes recently, a practical challenge we face is its data inefficiency: In real-world problems (e.g., robotics) it is not always possible to conduct millions of experiments, e.g., due to time or hardware constraints. In this talk, I will outline three approaches that explicitly address the data-efficiency challenge in reinforcement learning using probabilistic models. First, I will give a brief overview of a model-based RL algorithm that can learn from small datasets. Second, I will describe an idea based on model predictive control that allows us to learn even faster while taking care of state or control constraints, which is important for safe exploration. Finally, I will introduce an idea for meta learning (in the context of model-based RL), which is based on latent variables.

On our path toward fully autonomous systems, i.e., systems that operate in the real world without significant human intervention, machine learning is a promising framework for automatically learning to solve problems. While machine learning has had many successes recently, a practical challenge we face is its data inefficiency: In real-world problems (e.g., robotics) it is not always possible to conduct millions of experiments, e.g., due to time or hardware constraints.
In this talk, I will discuss two approaches toward data-efficient robot learning: model-based reinforcement learning and Bayesian optimization.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Roberto Calandra, Jan Peters, André Seyfarth, Marc P. Deisenroth, An Experimental Evaluation of Bayesian Optimization on Bipedal Locomotion, Proceedings of the IEEE International Conference on Robotics and Automation, 2014

In robot learning, we face the challenge of data-efficient learning. In this talk, we will make the case for three types of models that come in handy in robot learning: probabilistic models, hierarchical models, and models that allow us to incorporate the underlying physics. We will briefly outline strong use cases for these three models in the context of model-based reinforcement learning, meta learning, and system identification.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2018
- Steindór Sæmundsson, Alexander Terenin, Katja Hofmann, Marc P. Deisenroth, Variational Integrator Networks for Physically Structured Embeddings, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2020

On our path toward fully autonomous systems, i.e., systems that operate in the real world without significant human intervention, reinforcement learning (RL) is a promising framework for learning to solve problems by trial and error. While RL has had many successes recently, a practical challenge we face is its data inefficiency: In real-world problems (e.g., robotics) it is not always possible to conduct millions of experiments, e.g., due to time or hardware constraints. In this talk, I will outline three approaches that explicitly address the data-efficiency challenge in reinforcement learning using probabilistic models. First, I will give a brief overview of a model-based RL algorithm that can learn from small datasets. Second, I will describe an idea based on model predictive control that allows us to learn even faster while taking care of state or control constraints, which is important for safe exploration. Finally, I will introduce an idea for meta learning (in the context of model-based RL), which is based on latent variables.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Sanket Kamthe, Marc P. Deisenroth, Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2018

In robot learning, we face the challenge of data-efficient learning. In this talk, we will make the case for three types of models that come in handy in robot learning: probabilistic models, hierarchical models, and models that allow us to incorporate the underlying physics. We will briefly outline strong use cases for these three models in the context of model-based reinforcement learning, meta learning, and system identification.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2018
- Steindór Sæmundsson, Alexander Terenin, Katja Hofmann, Marc P. Deisenroth, Variational Integrator Networks for Physically Structured Embeddings, arXiv:1910.09349

Optimal control has seen many success stories over the past decades. However, when it comes to autonomous systems in open-ended settings, we require methods that allow for automatic learning from data. Reinforcement learning is a principled mathematical framework for autonomous learning of good control strategies from trial and error. Unfortunately, reinforcement learning suffers from data inefficiency, i.e., the learning system often needs to collect a large amount of data before learning anything useful. This extensive data collection is usually not practical when working with mechanical systems, such as robots.
In this talk, I will outline two approaches toward data-efficient reinforcement learning, and I will draw connections to the optimal control setting. First, I will detail a model-based reinforcement learning method, which exploits probabilistic models for fast learning. Second, I will discuss a model-predictive control approach with learned models, which allows us to provide some theoretical guarantees.
Finally, I will discuss some ideas that allow us to learn good predictive machine learning models that obey the laws of physics. This geometric approach finds physically meaningful representations of high-dimensional time-series data. With this, we can learn long-term predictive models from a few tens of image observations.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Sanket Kamthe, Marc P. Deisenroth, Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018
- Steindór Sæmundsson, Alexander Terenin, Katja Hofmann, Marc P. Deisenroth, Variational Integrator Networks for Physically Meaningful Embeddings, arXiv:1910.09349

In many practical applications of machine learning, we face the challenge of data-efficient learning, i.e., learning from scarce data. Examples include healthcare, climate science, and autonomous robots. There are many approaches toward learning from scarce data. In this talk, I will discuss a few of them in the context of reinforcement learning. First, I will motivate probabilistic, model-based approaches to reinforcement learning, which allow us to reduce the effect of model errors. Second, I will discuss a meta-learning approach that allows us to generalize knowledge across tasks to enable few-shot learning. Finally, I will show how structural prior knowledge can be incorporated to speed up learning: by exploiting Lie group structures, we can learn predictive models from high-dimensional observations with nearly no data.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2018
- Steindór Sæmundsson, Alexander Terenin, Katja Hofmann, Marc P. Deisenroth, Variational Integrator Networks for Physically Meaningful Embeddings, arXiv:1910.09349

In many high-impact areas of machine learning, we face the challenge of data-efficient learning, i.e., learning from scarce data. Examples include healthcare, climate science, and autonomous robots. There are many approaches toward learning from scarce data. In this talk, I will discuss a few of them in the context of reinforcement learning. First, I will motivate probabilistic, model-based approaches to reinforcement learning, which allow us to reduce the effect of model errors. Second, I will discuss a meta-learning approach that allows us to generalize knowledge across tasks to enable few-shot learning.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Sanket Kamthe, Marc P. Deisenroth, Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2018

Optimal control has seen many success stories over the past decades. However, when it comes to autonomous systems in open-ended settings, we require methods that allow for automatic learning from data. Reinforcement learning is a principled mathematical framework for autonomous learning of good control strategies from trial and error. Unfortunately, reinforcement learning suffers from data inefficiency, i.e., the learning system often needs to collect a large amount of data before learning anything useful. This extensive data collection is usually not practical when working with mechanical systems, such as robots.
In this talk, I will outline two approaches toward data-efficient reinforcement learning, and I will draw connections to the optimal control setting. First, I will detail a model-based reinforcement learning method, which exploits probabilistic models for fast learning. Second, I will discuss a model-predictive control approach with learned models, which allows us to provide some theoretical guarantees.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Sanket Kamthe, Marc P. Deisenroth, Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018

On our path toward fully autonomous systems, i.e., systems that operate in the real world without significant human intervention, reinforcement learning (RL) is a promising framework for learning to solve problems by trial and error. While RL has had many successes recently, a practical challenge we face is its data inefficiency: In real-world problems (e.g., robotics) it is not always possible to conduct millions of experiments, e.g., due to time or hardware constraints. In this talk, I will outline three approaches that explicitly address the data-efficiency challenge in reinforcement learning using probabilistic models. First, I will give a brief overview of a model-based RL algorithm that can learn from small datasets. Second, I will describe an idea based on model predictive control that allows us to learn even faster while taking care of state or control constraints, which is important for safe exploration. Finally, I will introduce an idea for meta learning (in the context of model-based RL), which is based on latent variables.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Sanket Kamthe, Marc P. Deisenroth, Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2018

I will talk about different aspects of uncertainty in reinforcement learning, e.g., model uncertainty, how to use it for exploration, some traps, and how to use uncertainty for safe exploration.

The vision of intelligent and fully autonomous robots, which are part of our daily lives and automatically learn from mistakes and adapt to new situations, has been around for many decades. However, this vision has been elusive so far. Although reinforcement learning is a principled framework for learning from trial and error and has led to success stories in the context of games, we need to address a practical challenge when it comes to learning with mechanical systems: data efficiency, i.e., the ability to learn from scarce data in complex domains.
In this talk, I will outline three approaches, based on probabilistic modeling and inference, that explicitly address the data-efficiency challenge in reinforcement learning and robotics. First, I will give a brief overview of a model-based RL algorithm that can learn from small datasets. Second, I will describe an idea based on model predictive control that allows us to learn even faster while taking care of state or control constraints, which is important for safe exploration. Finally, I will introduce a latent-variable approach to meta learning (in the context of model-based RL) for transferring knowledge from known tasks to tasks that have never been encountered.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Sanket Kamthe, Marc P. Deisenroth, Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2018

Reinforcement learning (RL) is a mathematical framework for learning from trial and error, which makes it an appealing framework for intelligent systems and autonomous learning. RL has had many success stories recently, but it is typically data hungry. In many practical situations, however, we are faced with the challenge of making decisions based on small datasets and limited experience. In this talk, I will outline approaches based on probabilistic modeling and Bayesian inference to tackle this problem.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Sanket Kamthe, Marc P. Deisenroth, Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2018

Modern machine learning algorithms often require us to solve meta challenges, such as setting the learning rate in stochastic gradient descent, the margin in an SVM, or the depth of a neural network. Finding good values for these hyperparameters is often tedious and expensive because each candidate setting requires the full learning pipeline to be evaluated. Optimization strategies that keep the number of experiments small are therefore desirable. Bayesian optimization is such a strategy: a data-efficient, gradient-free, black-box optimization framework that is commonly used in production systems for optimizing hyperparameters. It is closely related to Bayesian experimental design.

In this lecture, we will start with a quick recap of linear regression and a brief introduction to Gaussian processes, before going into some details of Bayesian optimization, such as acquisition functions and properties of the Gaussian process proxy.
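To make the ingredients concrete, here is a minimal sketch of a Bayesian optimization loop in plain Python: a Gaussian process proxy fitted to the evaluations so far, and an acquisition function that picks the next experiment. Everything specific in it is an assumption for illustration and not taken from the lecture: the squared-exponential kernel with a fixed length scale, expected improvement as the acquisition function, the one-dimensional candidate grid, and the hypothetical `toy_loss` standing in for an expensive training pipeline.

```python
import math

def kernel(a, b, lengthscale=0.3):
    """Squared-exponential covariance between two scalar inputs (unit signal variance)."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward_sub(L, b):
    """Solve L^T z = b for lower-triangular L."""
    n = len(b)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (b[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def gp_fit(X, y, noise_var=1e-6):
    """Factorize K + noise_var * I once and precompute the weights alpha."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y))
    return L, alpha

def gp_predict(X, L, alpha, x_star):
    """Zero-mean GP posterior mean and variance at a single test input."""
    k_star = [kernel(x, x_star) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(kernel(x_star, x_star) - sum(vi * vi for vi in v), 1e-12)
    return mean, var

def expected_improvement(mean, var, best):
    """Expected improvement over the best value so far (minimization)."""
    std = math.sqrt(var)
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf

def bayes_opt(objective, candidates, X_init, n_iter=10):
    """Repeatedly evaluate the candidate with the highest expected improvement."""
    X = list(X_init)
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in X]
        if not remaining:
            break
        L, alpha = gp_fit(X, y)
        best = min(y)
        x_next = max(
            remaining,
            key=lambda c: expected_improvement(*gp_predict(X, L, alpha, c), best),
        )
        X.append(x_next)
        y.append(objective(x_next))
    i_best = y.index(min(y))
    return X[i_best], y[i_best]

def toy_loss(x):
    """Hypothetical stand-in for an expensive validation loss."""
    return (x - 0.3) ** 2

grid = [i / 50 for i in range(51)]
x_best, y_best = bayes_opt(toy_loss, grid, X_init=[0.0, 1.0])
print(x_best, y_best)
```

The loop spends only twelve evaluations of `toy_loss` in total. In practice the kernel hyperparameters would be learned from the data, and the acquisition function would be optimized over a continuous domain instead of a grid.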

High-impact areas of machine learning and AI, such as personalized healthcare, autonomous robots, or environmental science share some practical challenges: They are either small-data problems or a small collection of big-data problems. Therefore, learning algorithms need to be data/sample efficient, i.e., they need to be able to learn in complex domains, but only from fairly small datasets. Approaches for data-efficient learning include probabilistic modeling and inference, Bayesian deep learning, meta learning, Bayesian optimization, few-shot learning, etc.
In this talk, Marc will give a brief overview of some approaches to tackle the data-efficiency challenge. First, he will discuss a data-efficient reinforcement learning algorithm, which highlights the necessity for probabilistic models in RL. He will then present a meta-learning method for generalizing knowledge across tasks. Finally, he will motivate deep Gaussian processes, richer probabilistic models that are composed of relatively simple building blocks. He will briefly discuss the model, inference, and some potential extensions. These models can be valuable for modeling complex relationships while providing uncertainty estimates that are useful in any downstream decision-making process.

#### Key references

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2018
- Hugh Salimbeni, Marc P. Deisenroth, Doubly Stochastic Variational Inference for Deep Gaussian Processes, Advances in Neural Information Processing Systems (NIPS), 2017
- Hugh Salimbeni, Vincent Dutordoir, James Hensman, Marc P. Deisenroth, Deep Gaussian Processes with Importance-Weighted Variational Inference, International Conference on Machine Learning (ICML), 2019

To scale Gaussian processes (GPs) to large data sets we introduce the robust Bayesian Committee Machine (rBCM), a practical and scalable product-of-experts model for large-scale distributed GP regression. Unlike state-of-the-art sparse GP approximations, the rBCM is conceptually simple and does not rely on inducing or variational parameters. The key idea is to recursively distribute computations to independent computational units and, subsequently, recombine them to form an overall result. Efficient closed-form inference allows for straightforward parallelisation and distributed computations with a small memory footprint. The rBCM is independent of the computational graph and can be used on heterogeneous computing infrastructures, ranging from laptops to clusters. With sufficient computing resources our distributed GP model can handle arbitrarily large data sets.
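The recombination step can be sketched in a few lines. The function below combines the predictions of independent GP experts at a single test input using the rBCM weighting as I understand it from the published model: each expert is weighted by the difference in differential entropy between prior and posterior, and the result is corrected toward the prior. The function name, argument names, and the example numbers are hypothetical, a zero prior mean is assumed, and the experts' predictive means and variances are taken as given instead of being computed from data.

```python
import math

def rbcm_combine(means, variances, prior_var):
    """Combine GP expert predictions at one test input (zero prior mean assumed).

    means, variances: predictive mean and variance of each expert.
    prior_var: GP prior variance at the test input; each expert variance
    is assumed not to exceed it, which keeps the combined precision positive.
    """
    # Expert weight: half the log-ratio of prior to posterior variance.
    betas = [0.5 * (math.log(prior_var) - math.log(v)) for v in variances]
    # Weighted sum of expert precisions plus the prior correction term.
    precision = sum(b / v for b, v in zip(betas, variances))
    precision += (1.0 - sum(betas)) / prior_var
    var = 1.0 / precision
    mean = var * sum(b * m / v for b, m, v in zip(betas, means, variances))
    return mean, var

# Two hypothetical experts: one confident, one barely better than the prior.
combined_mean, combined_var = rbcm_combine([1.0, 1.2], [0.1, 0.5], prior_var=1.0)
print(combined_mean, combined_var)
```

An expert whose predictive variance equals the prior variance receives weight zero, so experts that have seen no relevant data do not distort the result. Because the rule only needs each expert's mean and variance, the experts can be evaluated on separate machines and combined afterwards.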

Autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. We follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in challenging real robot and control tasks.
Citation: https://www.computer.org/csdl/journal/tp/2015/02/06654139/13rRUILLkEU

Autonomous learning has been a promising direction in control and robotics for more than a decade since learning models and controllers from data allows us to reduce the amount of engineering knowledge that is otherwise required. Due to their flexibility, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers. However, in real systems, such as robots, many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, pre-shaped policies, or specific knowledge about the underlying dynamics.
We follow a different approach and speed up learning by efficiently extracting information from sparse data. In particular, we learn a probabilistic, non-parametric Gaussian process dynamics model. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Bayesian inference leads to an automatic exploration/exploitation trade-off, such that our model-based policy search method achieves an unprecedented speed of learning compared to state-of-the-art RL. We demonstrate its applicability to autonomous learning in real robot and control tasks.

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
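For orientation, the objective that a model-based policy search of this kind optimizes can be written as an expected long-term cost. The notation below is assumed for illustration and does not appear in the abstract: $\theta$ denotes the policy parameters, $\pi$ the policy, $c$ the immediate cost, $T$ the planning horizon, and $f$ the unknown dynamics, over which the learned probabilistic model places a distribution.

$$
J^{\pi}(\theta) = \sum_{t=0}^{T} \mathbb{E}_{x_t}\!\left[c(x_t)\right],
\qquad x_{t+1} = f(x_t, u_t), \quad u_t = \pi(x_t, \theta), \quad x_0 \sim \mathcal{N}(\mu_0, \Sigma_0).
$$

Under this reading, policy evaluation amounts to propagating approximate state distributions $p(x_1), \dots, p(x_T)$ through the probabilistic dynamics model, so that model uncertainty enters every expectation in the sum. Policy improvement then follows the gradient $\mathrm{d}J^{\pi}/\mathrm{d}\theta$, which the chain rule decomposes along the same sequence of distributions.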

We propose an analytic moment-based filter for nonlinear stochastic dynamical systems modeled by Gaussian processes. Exact expressions for the expected value and the covariance matrix are provided for both the prediction and the filter step, where an additional Gaussian assumption is exploited in the latter case. The new filter does not require further approximations. In particular, it avoids sample approximations. We compare the filter to a variety of available Gaussian filters, such as the EKF, the UKF, and the GP-UKF recently proposed by Ko et al. (2007).
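As an illustration of what an exact moment computation can look like, consider the predictive mean of a GP evaluated at a Gaussian-distributed input $x \sim \mathcal{N}(\mu, \Sigma)$. The sketch assumes a zero prior mean and a squared-exponential kernel $k(x, x') = \alpha^2 \exp\!\big(-\tfrac{1}{2}(x - x')^\top \Lambda^{-1} (x - x')\big)$, neither of which is stated in the abstract. The symbols $x_i$, $y$, $K$, and $\sigma_\varepsilon^2$ (training inputs, training targets, kernel matrix, and noise variance) are likewise assumed notation.

$$
\mathbb{E}\big[f(x)\big] = \beta^\top q,
\qquad \beta = \big(K + \sigma_\varepsilon^2 I\big)^{-1} y,
$$

$$
q_i = \alpha^2 \,\big|\Sigma \Lambda^{-1} + I\big|^{-1/2}
\exp\!\Big(-\tfrac{1}{2}\,(x_i - \mu)^\top (\Sigma + \Lambda)^{-1} (x_i - \mu)\Big).
$$

Each $q_i$ is the expected kernel value between the training input $x_i$ and the uncertain test input, which is a Gaussian integral with a closed-form solution. This is why, under these assumptions, the mean requires neither linearization nor sampling.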