Foundations of Machine Learning (2018/19)

African Institute for Mathematical Sciences, Kigali, Rwanda

This course was part of the African Masters in Machine Intelligence (AMMI) at the African Institute for Mathematical Sciences (AIMS), Rwanda.

Part 1: Mathematical Foundations

  • Linear Algebra (MML book chapter 2; sketch below)
    • Groups
    • Vector spaces
    • Linear independence
    • Basis
    • Coordinate representation
    • Basis change
    • Linear mappings
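
To make bases and coordinate representations concrete, here is a minimal numpy sketch (all values made up) that computes the coordinates of a vector with respect to a non-standard basis:

```python
import numpy as np

# A basis of R^2 (the columns of B) and a vector x; values are made up.
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])
x = np.array([3.0, 4.0])

# The coordinates c of x with respect to the basis B solve B @ c = x.
c = np.linalg.solve(B, x)
print(c, B @ c)  # B @ c reconstructs x from its coordinates
```
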
  • Analytic Geometry (MML book chapter 3; sketch below)
    • Eigenvalues
    • Norms and inner products
    • Distances and angles
    • Orthogonal projections
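
A minimal numpy sketch of an orthogonal projection onto a subspace, with a made-up basis matrix B and the standard projection formula pi(x) = B (B^T B)^{-1} B^T x:

```python
import numpy as np

# Columns of B span a 2D subspace of R^3; B and x are made up.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([6.0, 0.0, 0.0])

# Orthogonal projection onto span(B): pi(x) = B (B^T B)^{-1} B^T x.
P = B @ np.linalg.inv(B.T @ B) @ B.T
x_proj = P @ x

# The residual x - pi(x) is orthogonal to the subspace (up to rounding).
print(x_proj, B.T @ (x - x_proj))
```
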
  • Vector Calculus (slides, MML book chapter 5; sketch below)
    • Scalar differentiation
    • Partial derivatives
    • Jacobian
    • Chain rule
    • Derivatives of matrices w.r.t. matrices
    • Gradients in a multi-layer neural network
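
The sketch below pushes gradients through a made-up two-layer network with the chain rule and checks one entry of the analytic gradient against a finite difference; the layer sizes and the tanh nonlinearity are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # input
y = rng.normal(size=2)           # target
W1 = rng.normal(size=(4, 3))     # first-layer weights
W2 = rng.normal(size=(2, 4))     # second-layer weights

def loss(W1):
    h = np.tanh(W1 @ x)          # hidden layer
    return 0.5 * np.sum((W2 @ h - y) ** 2)

# Backward pass via the chain rule:
a = W1 @ x
h = np.tanh(a)
e = W2 @ h - y                   # dL/d(output)
dh = W2.T @ e                    # dL/dh
da = dh * (1 - h ** 2)           # dL/da, since tanh'(a) = 1 - tanh(a)^2
dW1 = np.outer(da, x)            # dL/dW1

# Finite-difference check on one entry of W1.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
print(dW1[0, 0], (loss(W1p) - loss(W1)) / eps)
```
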
  • Statistics and Probability Theory (slides, MML book chapter 6)
    • Statistics to describe datasets: means, variances, covariances, medians
    • Basic probability distributions: Bernoulli, Binomial, Beta, Gaussian, Gamma
    • Parameter estimation (maximum likelihood, MAP estimation)
    • Key concepts in probability theory
  • Optimization (MML book chapter 7; sketch below)
    • Gradient descent
    • Stochastic gradient descent
    • Momentum
    • Constrained optimization
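
To make the update rules concrete, here is a minimal sketch of gradient descent with momentum on a made-up two-dimensional quadratic; the step size and momentum coefficient are arbitrary illustrative values:

```python
import numpy as np

# Minimise a toy quadratic f(x) = 0.5 x^T A x - b^T x; A and b are made up.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
v = np.zeros(2)
lr, beta = 0.1, 0.9              # step size and momentum (illustrative values)

for _ in range(200):
    v = beta * v - lr * grad(x)  # momentum accumulates past gradients
    x = x + v

print(x, np.linalg.solve(A, b))  # should agree at the optimum, where A x = b
```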

Part 2: Machine Learning

  • Graphical Models (slides, chapter 8 of Chris Bishop’s PRML book)
    • Directed graphical models
    • Undirected graphical models
    • D-separation
  • Dimensionality Reduction with Principal Component Analysis (slides, MML book chapter 10; sketch below)
    • Maximum variance perspective
    • Projection perspective
    • Key steps of PCA in practice
    • Probabilistic PCA
    • Other perspectives on PCA
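
A minimal sketch of the key PCA steps on synthetic data: centre the data, eigendecompose the sample covariance, project onto the top-k eigenvectors, and reconstruct:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))        # toy data: 100 points in 5 dimensions

# Key steps of PCA: centre, eigendecompose the covariance, project.
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / len(X)               # sample covariance matrix
vals, vecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
k = 2
Bk = vecs[:, -k:]                    # top-k principal directions
Z = Xc @ Bk                          # low-dimensional codes
X_rec = Z @ Bk.T + X.mean(axis=0)    # reconstruction in the original space

print(vals[::-1], np.mean((X - X_rec) ** 2))  # variances and reconstruction error
```
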
  • Linear Regression (slides, MML book chapter 9; sketch below)
    • Maximum likelihood estimation
    • Maximum a posteriori estimation
    • Bayesian linear regression
    • Distribution over functions
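
A minimal sketch of maximum likelihood and MAP estimation for linear regression on synthetic data; the noise variance and prior variance are treated as known, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=50)
y = 0.5 * x - 1.0 + rng.normal(scale=0.3, size=50)  # synthetic data

Phi = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]

# Maximum likelihood: theta_ML = (Phi^T Phi)^{-1} Phi^T y
theta_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# MAP with a zero-mean Gaussian prior gives the ridge-regularised solution,
# with lam = sigma^2 / b^2 for noise variance sigma^2 and prior variance b^2.
lam = 0.3 ** 2 / 1.0
theta_map = np.linalg.solve(Phi.T @ Phi + lam * np.eye(2), Phi.T @ y)
print(theta_ml, theta_map)
```
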
  • Model Selection (slides, MML book chapter 8; sketch below)
    • Cross validation
    • Information criteria
    • Bayesian model selection
    • Occam’s razor and the marginal likelihood
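
A minimal sketch of K-fold cross validation for choosing the regularisation constant of ridge-regularised linear regression; the dataset, feature map, and candidate values are all made up:

```python
import numpy as np

def k_fold_mse(Phi, y, lam, K=5):
    """Average validation MSE of ridge-regularised linear regression."""
    idx = np.random.default_rng(0).permutation(len(y))
    folds = np.array_split(idx, K)
    errs = []
    for k in range(K):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(K) if j != k])
        A = Phi[trn].T @ Phi[trn] + lam * np.eye(Phi.shape[1])
        theta = np.linalg.solve(A, Phi[trn].T @ y[trn])
        errs.append(np.mean((Phi[val] @ theta - y[val]) ** 2))
    return np.mean(errs)

# Pick the regulariser with the lowest cross-validated error.
rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 40)
y = np.sin(x) + rng.normal(scale=0.2, size=40)
Phi = np.vander(x, 8)                # degree-7 polynomial features
lams = [1e-6, 1e-3, 1e-1, 1.0]
print({lam: k_fold_mse(Phi, y, lam) for lam in lams})
```
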
  • Gaussian Process Regression (slides, GPML book; sketch below)
    • Model
    • Inference with Gaussian processes
    • Training via evidence maximization
    • Model selection
    • Interpreting the hyper-parameters
    • Practical tips and tricks when working with Gaussian processes
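
A minimal numpy sketch of Gaussian process regression with a squared-exponential kernel on made-up 1D data; the hyper-parameters are fixed by hand here, whereas the lecture obtains them by evidence maximisation:

```python
import numpy as np

def rbf(X1, X2, ell=1.0, sf=1.0):
    """Squared-exponential kernel with lengthscale ell and signal std sf."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

# Toy 1D dataset.
rng = np.random.default_rng(4)
X = rng.uniform(-4, 4, 12)
y = np.sin(X) + rng.normal(scale=0.1, size=12)
Xs = np.linspace(-5, 5, 100)     # test inputs
sn = 0.1                         # noise std, fixed by hand here

K = rbf(X, X) + sn ** 2 * np.eye(len(X))
Ks = rbf(X, Xs)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

mean = Ks.T @ alpha                                    # posterior mean
v = np.linalg.solve(L, Ks)
var = rbf(Xs, Xs).diagonal() - np.sum(v ** 2, axis=0)  # posterior variance
print(mean[:3], var[:3])
```
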
  • Bayesian Optimization (slides; sketch below)
    • Optimization of meta-parameters in machine learning systems
    • Acquisition functions
    • Practicalities
    • Applications
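
As one concrete acquisition function, here is a sketch of expected improvement for minimisation; the posterior mean and standard deviation arrays stand in for a surrogate model (e.g. a GP posterior) and are made up:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Expected improvement (for minimisation), given the surrogate's
    posterior mean mu and standard deviation sigma at candidate points."""
    imp = f_best - mu - xi           # improvement over the incumbent
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

# Made-up posterior over five candidate points; f_best is the best
# objective value observed so far.
mu = np.array([0.2, 0.0, -0.1, 0.3, 0.1])
sigma = np.array([0.05, 0.2, 0.3, 0.1, 0.4])
ei = expected_improvement(mu, sigma, f_best=0.05)
print(ei.argmax())                   # index of the point to evaluate next
```
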
  • Sampling (slides; sketch below)
    • Monte Carlo estimation
    • Importance sampling
    • Rejection sampling
    • Metropolis-Hastings
    • Slice sampling
    • Gibbs sampling
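
A minimal sketch of Metropolis-Hastings with a symmetric random-walk proposal, targeting a made-up two-component Gaussian mixture; the sample mean and variance at the end are plain Monte Carlo estimates:

```python
import numpy as np

# Unnormalised log-density of a mixture of two Gaussians (made up).
def log_p(x):
    return np.logaddexp(-0.5 * (x - 2) ** 2, -0.5 * (x + 2) ** 2)

rng = np.random.default_rng(5)
x = 0.0
samples = []
for _ in range(10000):
    x_new = x + rng.normal(scale=1.0)       # symmetric random-walk proposal
    # Accept with probability min(1, p(x_new) / p(x)).
    if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
        x = x_new
    samples.append(x)

print(np.mean(samples), np.var(samples))    # Monte Carlo estimates
```
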
  • Density Estimation with Gaussian Mixture Models (slides, MML book chapter 11; sketch below)
    • Mixture models
    • Parameter estimation
    • Implementation
    • Latent variable perspective
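
A minimal sketch of the EM algorithm for a one-dimensional two-component Gaussian mixture on synthetic data; the initial parameter values are rough guesses:

```python
import numpy as np

rng = np.random.default_rng(6)
# Synthetic 1D data drawn from two Gaussians.
x = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(1.5, 1.0, 150)])

# Initial guesses for the weights, means, and variances of K = 2 components.
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibilities r[n, k] proportional to pi_k N(x_n | mu_k, var_k).
    logr = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)
    r = np.exp(logr - logr.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the responsibilities.
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(pi, mu, np.sqrt(var))
```
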
  • Classification with Logistic Regression (slides; sketch below)
    • Logistic sigmoid as a posterior class probability
    • Implicit modeling assumptions
    • Maximum likelihood estimation
    • MAP estimation
    • Probabilistic model
    • Laplace approximation
    • Bayesian logistic regression
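
A minimal sketch of maximum likelihood estimation for logistic regression by gradient descent on synthetic 2D data; the step size and iteration count are illustrative only:

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

rng = np.random.default_rng(7)
# Synthetic binary-classification data in 2D.
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
Phi = np.column_stack([np.ones(100), X])   # add a bias feature

# Maximum likelihood by gradient descent on the negative log-likelihood;
# its gradient is Phi^T (sigmoid(Phi theta) - y).
theta = np.zeros(3)
for _ in range(2000):
    theta -= 0.1 * Phi.T @ (sigmoid(Phi @ theta) - y) / len(y)

print(theta, np.mean((sigmoid(Phi @ theta) > 0.5) == y))  # training accuracy
```
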
  • Information Theory (slides by Pedro Mediano; sketch below)
    • Entropy
    • KL divergence
    • Mutual information
    • Coding theory
    • Information theory and statistical inference
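
A minimal sketch that computes entropy, KL divergence, and mutual information for discrete distributions; the joint distribution is made up:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution p."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl(p, q):
    """KL divergence D(p || q) in bits (assumes q > 0 wherever p > 0)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Joint distribution of two binary variables (values made up).
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# Mutual information I(X; Y) = KL(p(x, y) || p(x) p(y)).
mi = kl(pxy.ravel(), np.outer(px, py).ravel())
print(entropy(px), mi)
```
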
  • Variational Inference (slides; sketch below)
    • Inference as optimization
    • Evidence lower bound
    • Conditionally conjugate models
    • Mean-field variational inference in conditionally conjugate models
    • Black-box variational inference for hierarchical Bayesian models
    • Gradient estimators
    • Amortized inference
    • Richer posteriors
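
A minimal sketch of black-box variational inference with reparameterisation gradients, fitting a Gaussian q to a made-up Gaussian target so that the correct answer (mean 3, standard deviation 0.5) is known; the step size and sample count are illustrative:

```python
import numpy as np

# Unnormalised target: log p(z) = -(z - 3)^2 / (2 * 0.25) + const, i.e. N(3, 0.5^2).
dlogp = lambda z: -(z - 3.0) / 0.25        # derivative of log p w.r.t. z

rng = np.random.default_rng(8)
mu, log_sigma = 0.0, 0.0                   # parameters of q = N(mu, sigma^2)
lr, S = 0.05, 64                           # step size and samples per step

for _ in range(500):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=S)
    z = mu + sigma * eps                   # reparameterisation z = mu + sigma * eps
    # ELBO = E_q[log p(z)] + entropy(q); the Gaussian entropy term
    # contributes exactly 1 to the gradient w.r.t. log_sigma.
    g_mu = np.mean(dlogp(z))
    g_ls = np.mean(dlogp(z) * sigma * eps) + 1.0
    mu += lr * g_mu                        # stochastic gradient ascent on the ELBO
    log_sigma += lr * g_ls

print(mu, np.exp(log_sigma))               # should approach 3 and 0.5
```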

References

  • Marc P. Deisenroth, A. Aldo Faisal, and Cheng Soon Ong: Mathematics for Machine Learning. Cambridge University Press (the “MML book” above).
  • Carl Edward Rasmussen and Christopher K. I. Williams: Gaussian Processes for Machine Learning. MIT Press, 2006 (the “GPML book” above).
  • Christopher M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006 (Chris Bishop’s PRML book above).

Team

  • Marc Deisenroth (Lecturer)
  • Kossi Amouzouvi (Tutor, AIMS Rwanda)
  • Oluwafemi Azeez (Tutor, CMU Africa)
  • Steindór Sæmundsson (Tutor, Imperial College London)
  • Pedro Martinez Mediano (Tutor, Imperial College London)