Courses

What courses can I take and how is the whole curriculum structured?

The curriculum is split into courses, and the courses are split into topics; both are listed below. For example, “Mathematical background for machine learning” is a course and “Linear algebra for machine learning” is a topic. It is possible to take a given topic without taking the whole course.

What is the structure and timing of the curriculum?

Even though individual courses are cohort-based, the curriculum itself is highly flexible. It’s possible to take just one course/topic at a time. It’s possible to take several courses/topics at once. It’s possible to pause your studies for a while and rejoin when you are less busy. And it’s possible to study intensively and reach your goal quickly.

Of course, not all courses/topics are offered simultaneously. The timing of each course/topic is determined by students’ requests. Courses on more popular subjects are offered more frequently than those that interest a smaller number of students.

What about graduation?

There is no concept of graduation, so there is no required part of the curriculum to complete. It’s possible to start by studying intensively, achieve your immediate goal, such as getting a job or getting into graduate school, and then continue taking our courses at a slower pace to achieve mastery of more subjects. You will receive guidance on which subjects are worth knowing to achieve your immediate goal.

What is the difficulty level of the courses? Are there any prerequisites?

Most of the courses are at graduate-school level or at the level expected of professional machine-learning engineers. Their content has been designed based on experience with many learners, including university students and machine-learning engineers in corporate settings.

But it is not a problem if you are unfamiliar with calculus, linear algebra, and/or coding. If you are willing to put in the effort, you can learn the necessary skills as you take our introductory courses.

What is the length of the class meetings?

Learning each topic is supported by a series of live (currently online) lectures/class meetings. Each class meeting is 60-65 minutes long, to keep your mind fresh. This means you can join our school even if you have very little free time. If you have more time, you can of course join more meetings per day or per week. Our platform also lets you do a lot of work outside of the class meetings, to make sure you master each topic.


List of courses and topics

This list is in no particular order, so finding a given topic may require some scrolling.

  • Neural network introduction

    • A first introduction to neural networks

    • An overview of types of neural networks and their applications

  • Machine learning introduction

    • A first introduction to machine learning methods

  • Data manipulation

    • Data cleaning, transformation, and standardization

    • Identification of outliers in the data

    • Data visualization

    • Data version control

  • Mathematical background for machine learning

    • Multivariate calculus for machine learning

    • Linear algebra for machine learning

  • Backpropagation for training neural networks

    • Automatic differentiation and the backpropagation algorithm

  • Neural network initialization and component normalization

    • Neural network weight initialization

    • Batch normalization

    • Batch, layer, instance, and group normalizations and their conditional and adaptive versions

  • Model regularization, underfitting, and overfitting

    • Generalized linear models and regularization

    • Regularization methods for neural networks

  • Maximizing the performance of models trained with limited data

    • Data augmentation techniques

    • Semi-supervised learning for computer vision

  • Hyperparameter optimization

    • Simple methods for hyperparameter optimization

    • Bayesian methods for hyperparameter optimization

  • Loss functions for machine learning

    • Designing loss functions for optimal training of neural networks

  • Machine learning performance metrics and loss functions

    • Performance metrics and loss functions for classification, regression, computer vision, and time-series prediction

    • Designing one’s own ML performance metric to reflect specific engineering and business objectives

  • Optimization

    • Optimizers for training neural networks

    • Constrained optimization using Lagrange multipliers

    • Convex and concave function optimization

    • Submodular and supermodular functions

  • Probability theory

    • Discrete, continuous, and mixed distributions

    • Expectations and moments of distributions

    • Conditional distributions and Bayesian statistics

  • Statistical testing

    • Hypothesis testing

    • Multiple hypothesis testing

    • Confusing aspects of hypothesis testing

  • Regression analysis

    • Introduction to types of regressions and their uses

    • Regressions under homoscedasticity

    • Regressions under heteroscedasticity

  • Maximum likelihood estimation

  • Discrete-outcome statistical models

  • Causal inference

    • Causal inference problems and directed acyclic graphs

    • Causal do-calculus

    • Treatment effect estimation

    • Instrumental variable methods

  • Entropy and information-theoretic concepts for machine learning

    • Entropy types and their uses for machine learning

    • Divergences between distributions and their uses for machine learning

  • Decision trees

    • Decision trees and random forests

    • Gradient boosted decision trees

  • Computer vision

    • Introduction to convolutional neural networks (CNNs) for computer vision

    • Object classification using CNNs

    • Transposed convolutions in CNNs and checkerboard artifacts

    • Object detection and image segmentation CNNs, including region-based CNNs such as Fast R-CNN and Mask R-CNN

    • Depthwise separable convolutions in CNNs

    • Networks utilizing depthwise separable convolutions, including EfficientNet and EfficientDet

    • Spatial transformer networks

    • Computer vision using self-supervised learning

    • Computer vision for video data processing

    • Adversarial attacks and possible defenses against them

    • Image similarity search, including Hierarchical Navigable Small World (HNSW) and Inverted File Index (IVF) with Product Quantization (PQ)

    • Interpretation of computer-vision models and their perception of content and style

    • Computer vision using transformer neural networks, including combinations of CNNs and transformers

    • Natural language supervision for computer vision models, including the CLIP objective

    • Contrastive learning methods for self-supervised learning in computer vision

    • Masked image modeling for self-supervised learning in computer vision

    • Depth estimation using supervised learning

    • Depth estimation models trained by self-supervised learning using video data

    • Neural radiance field models and Gaussian splatting

  • Recurrent neural networks (RNN)

    • Recurrent neural networks and related simple statistical models

    • Recurrent neural networks with memory cells

  • Transformer neural networks

    • Intuition behind the query-key-value scaled dot-product attention mechanism

    • Intuition behind positional encoding in transformers

    • Transformer neural networks for computer vision

    • Transformer neural networks for image generation

    • Transformer neural networks for sequence processing including natural language processing

  • Large language models (LLM), natural language processing (NLP), and code processing

    • Introduction to natural language processing

    • Introduction to code processing using techniques similar to natural language processing

    • Language models

    • Natural language processing before transformers

    • Word embeddings

    • Natural language processing using transformers

    • Parameter-efficient fine-tuning, including LoRA

    • Reinforcement learning from human feedback (RLHF) and alternatives to it

    • NLP in combination with computer vision or image generation

    • Cloud deployment of large language models

  • Time-series analysis

    • Time-series prediction for tabular data

    • Time-series prediction for image/video data

    • Autoregressive and moving-average models and their combinations

    • Time-series process stationarity, non-stationarity, and integration

    • Time-series models with seasonality

    • Spatio-temporal models

  • Optimizing neural networks for edge device deployment

    • Assessing tradeoffs in speed, memory requirements, and computational cost

    • Neural network distillation and weight quantization

    • Software for edge-device or embedded-system deployment

  • High-dimensional spaces

    • Properties of high-dimensional statistical distributions

    • Data manifolds

    • Dimensionality reduction

  • Variational autoencoders (VAE)

    • Basic variational autoencoders

    • High-performance variational autoencoders

    • Variational autoencoders for anomaly detection

  • Diffusion models

    • Intuition behind diffusion models, forward and reverse processes

    • Explicit direct sampling from the forward process

    • Explicit forward-process posteriors also conditioned on the input image

    • Variational bounds for diffusion models

    • Variance reduction in diffusion models by an explicit evaluation of expressions in the variational bound

    • Cascaded diffusion models

    • Accelerated sampling from diffusion models

    • Diffusion models for video generation

  • Generative adversarial networks (GAN)

    • Basic generative adversarial networks

    • Generative adversarial networks and spectral normalization

  • Graph neural networks

    • Graph data

    • Graph neural network layers

    • Graph representation learning

    • Graph convolutions (non-spectral and spectral)

    • Graph neural networks with attention

    • Graph neural networks with general message passing

    • Graph neural networks for recommendation systems

    • Graph neural networks for geospatial data

  • Reinforcement learning

    • Bandit problems, exploration vs. exploitation

    • Markov decision processes and value functions

    • Monte Carlo methods

    • Dynamic programming

    • Temporal difference learning

    • Importance sampling

    • Function approximation for reinforcement learning

    • Value-based methods, including Q-learning and SARSA

    • Policy gradient methods and actor-critic methods, including GAE, A3C, and DDPG

    • Eligibility traces, lambda returns, TD(lambda), and SARSA(lambda)

    • Trust Region Policy Optimization (TRPO) and ACKTR

    • Proximal Policy Optimization (PPO)

    • Model-based reinforcement learning

    • Inverse reinforcement learning

  • Machine-learning system development strategies and pipelines

    • Strategies to improve performance on training, validation, test, and inference sets

    • Strategies for handling edge cases (long-tail events)

    • Strategies for domain adaptation

    • Model A/B testing and progressive delivery

    • Machine-learning pipelines for continuous integration, continuous delivery, and continuous training

    • Monitoring model performance and detecting and managing data drift

  • Security and privacy

    • Anonymization of data for privacy-preserving model training

    • Generation of artificial data that follows the same distribution as a confidential dataset

    • Public-key and symmetric-key cryptography

    • Homomorphic encryption

    • Differential privacy

    • Federated learning

  • Python language

    • Python object types, keywords, and operators

    • Comprehensions in Python

    • Object-oriented programming with Python

    • Libraries for data transformations and visualizations in Python

    • Libraries for data transformations and visualizations in Python using GPUs

    • Web/cloud deployment of Python-based projects

  • PyTorch deep learning framework

    • PyTorch variables, functions, and automatic differentiation

    • GPU computing with PyTorch

    • Libraries built on PyTorch

  • TensorFlow deep learning framework

    • TensorFlow variables, functions, and automatic differentiation

    • GPU computing with TensorFlow

    • Libraries built on TensorFlow

  • Virtual environments, Docker, and Kubernetes

    • Virtualenv and Conda environments

    • Single-container and multi-container Docker apps

    • Kubernetes

  • Spark

    • Using Apache Spark for distributed data processing

  • Notebooks for machine learning

    • Using Jupyter notebooks for experimenting and for preparing production-quality code