Courses
What courses can I take and how is the whole curriculum structured?
The curriculum is split into courses, and the courses are split into topics, listed below. For example, “Mathematical background for machine learning” is a course and “Linear algebra for machine learning” is a topic. It is possible to take a given topic without taking the whole course.
What is the structure and timing of the curriculum?
Even though individual courses are cohort-based, the curriculum is structured very flexibly. You can take just one course/topic at a time or several at once, pause your studies for some time and rejoin when you are less busy, or study intensively and reach your goal quickly.
Of course, not all courses/topics are offered simultaneously. The timing of each course/topic is determined by students’ requests: courses on more popular subjects are offered more frequently than those that interest fewer students.
What about graduation?
There is no concept of graduation, so you do not have to complete any required part of the curriculum. You can study intensively, achieve your immediate goal, such as getting a job or getting into graduate school, and then continue taking our courses less intensively to achieve mastery in more subjects. You will receive guidance on which subjects are worth knowing to achieve your immediate goal.
What is the difficulty level of the courses? Are there any prerequisites?
Most of the courses are at the graduate-school level or at the level expected of professional machine-learning engineers. Their content has been designed based on experience with many learners, including university students and machine-learning engineers in corporate settings.
But it is not a problem if you are not yet familiar with calculus, linear algebra, or coding. If you are willing to put in the effort, you can learn the necessary skills as you take our introductory courses.
What is the length of the class meetings?
Each topic is supported by a series of live (currently online) lectures, or class meetings. Each meeting is 60-65 minutes long, to keep your mind fresh, which means you can join our school even if you have very little free time. If you have more time, you can of course join more meetings per day or per week. Our platform also lets you do a lot of work outside of the class meetings, to make sure you master each topic.
List of topics
This list is in no particular order, so finding a given topic may require some scrolling.
Neural network introduction
A first introduction to neural networks
An overview of types of neural networks and their applications
Machine learning introduction
A first introduction to machine learning methods
Data manipulation
Data cleaning, transformation, and standardization
Identification of outliers in the data
Data visualization
Data version control
Mathematical background for machine learning
Multivariate calculus for machine learning
Linear algebra for machine learning
Backpropagation for training neural networks
Automatic differentiation and the backpropagation algorithm
Neural network initialization and component normalization
Neural network weight initialization
Batch normalization
Batch, layer, instance, and group normalizations and their conditional and adaptive versions
Model regularization, underfitting, and overfitting
Generalized linear models and regularization
Regularization methods for neural networks
Maximizing the performance of models trained with limited data
Data augmentation techniques
Semi-supervised learning for computer vision
Hyperparameter optimization
Simple methods for hyperparameter optimization
Bayesian methods for hyperparameter optimization
Loss functions for machine learning
Designing loss functions for optimal training of neural networks
Machine learning performance metrics and loss functions
Performance metrics and loss functions for classification, regression, computer vision, and time-series prediction
Designing custom ML performance metrics to reflect specific engineering and business objectives
Optimization
Optimizers for training neural networks
Constrained optimization using Lagrange multipliers
Convex and concave function optimization
Submodular and supermodular function optimization
Probability theory
Discrete, continuous, and mixed distributions
Expectations and moments of distributions
Conditional distributions and Bayesian statistics
Statistical testing
Hypothesis testing
Multiple hypothesis testing
Confusing aspects of hypothesis testing
Regression analysis
Introduction to types of regressions and their uses
Regressions under homoscedasticity
Regressions under heteroscedasticity
Maximum likelihood estimation
Discrete-outcome statistical models
Causal inference
Causal inference problems and directed acyclic graphs
Causal do-calculus
Treatment effect estimation
Instrumental variable methods
Entropy and information-theoretic concepts for machine learning
Entropy types and their uses for machine learning
Divergences between distributions and their uses for machine learning
Decision trees
Decision trees and random forests
Gradient boosted decision trees
Computer vision
Introduction to convolutional neural networks (CNNs) for computer vision
Object classification using CNNs
Transposed convolutions in CNNs and checkerboard artifacts
Object detection and image segmentation CNNs, including region-based CNNs such as Fast R-CNN or Mask R-CNN
Depthwise separable convolutions in CNNs
Networks utilizing depthwise separable convolutions, including EfficientNet and EfficientDet
Spatial transformer networks
Computer vision using self-supervised learning
Computer vision for video data processing
Adversarial attacks and possible defenses against them
Image similarity search, including Hierarchical Navigable Small World (HNSW) and Inverted File Index (IVF) with Product Quantization (PQ)
Interpretation of computer-vision models and perceptions of content and style
Computer vision using transformer neural networks, including combinations of CNNs and transformers
Natural language supervision for computer vision models, including the CLIP objective
Contrastive learning methods for self-supervised learning in computer vision
Masked image modeling for self-supervised learning in computer vision
Depth estimation using supervised learning
Depth estimation models trained by self-supervised learning using video data
Neural radiance field models and Gaussian splatting
Recurrent neural networks (RNN)
Recurrent neural networks and related simple statistical models
Recurrent neural networks with memory cells
Transformer neural networks
Intuition behind the query-key-value scaled dot-product attention mechanism
Intuition behind positional encoding in transformers
Transformer neural networks for computer vision
Transformer neural networks for image generation
Transformer neural networks for sequence processing, including natural language processing
Large language models (LLM), natural language processing (NLP), and code processing
Introduction to natural language processing
Introduction to code processing using methods similar to those for natural language processing
Language models
Natural language processing before transformers
Word embeddings
Natural language processing using transformers
Parameter-efficient fine-tuning, including LoRA
Reinforcement learning from human feedback (RLHF) and alternatives to it
NLP in combination with computer vision or image generation
Cloud deployment of large language models
Time-series analysis
Time-series prediction for tabular data
Time-series prediction for image/video data
Autoregression, moving-average models, and their combinations
Time-series process stationarity, non-stationarity, and integration
Time-series models with seasonality
Spatio-temporal models
Optimizing neural networks for edge device deployment
Assessing tradeoffs in speed, memory requirements, and computational cost
Neural network distillation and weight quantization
Software for edge-device or embedded system deployment
High-dimensional spaces
Properties of high-dimensional statistical distributions
Data manifolds
Dimensionality reduction
Variational autoencoders (VAE)
Basic variational autoencoders
High-performance variational autoencoders
Variational autoencoders for anomaly detection
Diffusion models
Intuition behind diffusion models, forward and reverse processes
Explicit direct sampling from the forward process
Explicit forward process posteriors conditioned also on the input image
Variational bounds for diffusion models
Variance reduction in diffusion models by an explicit evaluation of expressions in the variational bound
Cascaded diffusion models
Accelerated sampling from diffusion models
Diffusion models for video generation
Generative adversarial networks (GAN)
Basic generative adversarial networks
Generative adversarial networks and spectral normalization
Graph neural networks
Graph data
Graph neural network layers
Graph representation learning
Graph convolutions (non-spectral and spectral)
Graph neural networks with attention
Graph neural networks with general message passing
Graph neural networks for recommendation systems
Graph neural networks for geospatial data
Reinforcement learning
Bandit problems, exploration vs. exploitation
Markov decision processes and value functions
Monte Carlo methods
Dynamic programming
Temporal difference learning
Importance sampling
Function approximation for reinforcement learning
Value-based methods, including Q-learning and SARSA
Policy gradient methods and actor-critic methods, including GAE, A3C, and DDPG
Eligibility traces, lambda returns, TD(lambda), SARSA(lambda)
Trust Region Policy Optimization (TRPO), ACKTR
Proximal Policy Optimization (PPO)
Model-based reinforcement learning
Inverse reinforcement learning
Machine-learning system development strategies and pipelines
Strategies to improve performance on the training, validation, and test sets and at inference time
Strategies for handling edge cases (long-tail events)
Strategies for domain adaptation
Model A/B testing and progressive delivery
Machine-learning pipelines for continuous integration, continuous delivery, and continuous training
Monitoring model performance and detecting and managing data drift
Security and privacy
Anonymization of data for privacy-preserving model training
Generation of artificial data that follows the same distribution as a confidential dataset
Public-key and symmetric-key cryptography
Homomorphic encryption
Differential privacy
Federated learning
Python language
Python object types, keywords, and operators
Comprehensions in Python
Object-oriented programming with Python
Libraries for data transformations and visualizations in Python
Libraries for data transformations and visualizations in Python using GPUs
Web/cloud deployment of Python-based projects
PyTorch deep learning framework
PyTorch variables, functions, and automatic differentiation
GPU computing with PyTorch
Libraries built on PyTorch
TensorFlow deep learning framework
TensorFlow variables, functions, and automatic differentiation
GPU computing with TensorFlow
Libraries built on TensorFlow
Virtual environments, Docker, and Kubernetes
Virtualenv and Conda environments
Single-container and multi-container Docker apps
Kubernetes
Spark
Using Apache Spark for distributed data processing
Notebooks for machine learning
Using Jupyter notebooks for experimenting and for preparing production-quality code