Modern machine learning algorithms often require us to solve meta-level problems, such as setting the learning rate in stochastic gradient descent, the margin in an SVM, or the depth of a neural network. Finding good values for these hyperparameters is often tedious and expensive because every candidate setting requires running the full learning pipeline again. Optimization strategies that keep the number of experiments small are therefore desirable. Bayesian optimization is one such strategy: a data-efficient, gradient-free, black-box optimization framework that is commonly used in production systems for tuning hyperparameters and that is closely related to Bayesian experimental design. In this lecture, we will start with a quick recap of linear regression and a brief introduction to Gaussian processes, before going into some details of Bayesian optimization, such as acquisition functions and properties of the Gaussian process proxy.