Syllabus

Important

The content of this syllabus is subject to change. Please check the course page on Blackboard and the ADA University Academic Calendar regularly for modifications. The last day of the add/drop period, holidays, and other academic deadlines are noted in the calendar.

Info

Square brackets in the Assessment / Notes column indicate the range of classes whose material is covered by the assessment. For example, Quiz 1 [1–3] means that the quiz assesses material covered in classes 1 through 3.

Class Topic Learning Outcomes Assessment Materials
1 Deep Learning Overview / Course Structure Describe the scope of deep learning and the course syllabus. Meet the technical requirements. introduction/01_deep_learning
2 Mathematics of Deep Learning: Calculus / Linear Algebra Compute partial derivatives and apply the chain rule. Work with vectors, matrices, and tensors; apply norms and inner products. Understand the intuition behind eigenvectors and SVD. mathematics/01_calculus
mathematics/02_linear_algebra
supplementary/svd
3 Gradient Descent / Backpropagation I Compute gradients on computational graphs. Perform forward and backward passes. Understand gradient descent updates and automatic differentiation (PyTorch autograd, micrograd). notebooks/01_backprop
4 Gradient Descent / Backpropagation II Implement full backpropagation. Feb 3: Quiz 1 [1–3]
Last day to submit team member details
notebooks/01_backprop
5 Activation Functions / Neuron Implement activation functions and understand non-linearity. Backpropagate over an N-dimensional neuron. notebooks/02_neural_network
6 Multilayer Perceptron (MLP) / Cross-Entropy Construct an MLP from stacked neurons. Train a simple MLP classifier on a small dataset. Understand cross-entropy loss. Feb 10: Project proposal deadline notebooks/02_neural_network
7 Images as Tensors / Training MLP on MNIST Dataset Understand image representations, tensor shapes, and batching. Use torchvision datasets and dataloaders. Train an MLP on MNIST. Feb 12: Quiz 2 [5–6] notebooks/03_cnn_torch
8 Convolutional Neural Networks (CNN) Define and implement 2D cross-correlation (convolution) and pooling with kernels, including padding and stride. Train a LeNet-style CNN on MNIST. Compare MLP with CNN. notebooks/03_cnn_torch
9 Mathematics of Deep Learning: Probability Theory Describe random variables; distinguish discrete and continuous distributions; work with PMF/PDF. Compute expectation, variance, and covariance. Use conditional probability, independence, and Bayes’ rule. Recognize common distributions. Feb 19: Quiz 3 [7–8] mathematics/03_probability
10 Regularization / Initialization Recall overfitting and understand how regularization helps with it. Apply data augmentation. Apply weight decay and dropout. Handle exploding and vanishing gradients. Use Xavier and He initialization. notebooks/04_regul_optim
11 Optimization Distinguish local minima from saddle points in training dynamics. Adjust learning rate and apply schedules. Use stochastic gradient descent (SGD) and explain its purpose. Apply momentum, RMSProp, and Adam to optimize learning. Compare optimizers based on convergence behavior and practical performance. notebooks/04_regul_optim
12 Training CNN on CIFAR-10 Dataset / Hyperparameter Tuning Train a regularized and optimized CNN on CIFAR-10. Apply hyperparameter tuning. Mar 3: Quiz 4 [10–11] notebooks/04_regul_optim
13 CNN Architectures / Batch Normalization Compare classical and modern CNN architectures (e.g. AlexNet, VGG, Inception, EfficientNet) in terms of depth, parameter count, purpose, and training stability. Understand how the receptive field influences predictions. Explain why normalization helps train deep networks. Implement batch normalization and understand training vs evaluation behavior. Understand batch-size effects and when to prefer layer normalization. notebooks/05_cnn_architectures
[ tensorspace.js ]
14 Residual Network / Transfer Learning / Fine-tuning Understand residual (skip) connections. Explain in depth why residual blocks help when training large models. Describe how pretrained CNNs enable transfer learning, and how fine-tuning adapts them to new datasets. Demonstrate fine-tuning of an ImageNet-pretrained CNN on CIFAR-10. notebooks/05_cnn_architectures
15 Paper Reading: AlexNet Discuss the AlexNet paper and its key ideas, and determine which of them are outdated. Describe the paper structure. Mar 12: Quiz 5 [13–14]
Project milestone 1 deadline
[ alexnet ]
16 Midterm Exam Mar 17: Midterm Exam [1–15]
17 Midterm Exam Review Half-semester overview.
Holidays Mar 20–30
18 Mathematics of Deep Learning: Information Theory and Probabilistic Modeling Compute entropy, cross-entropy, and KL divergence. Derive cross-entropy loss from maximum likelihood. Interpret common losses as probabilistic objectives. mathematics/04_information
mathematics/05_prob_modeling
[ probabilistic models ]
19 Sequence Modeling: Tokenization / Bigram Model / Perplexity Understand the aims of sequence modeling. Tokenize and build a character-level bigram model and sample from it. Implement average negative log-likelihood loss and perplexity. notebooks/06_nn_ngram
20 Neural N-gram Language Model Construct a neural N-gram model and train it with mini-batch updates. Apr 7: Quiz 6 [18–19] notebooks/06_nn_ngram
21 Autoregressive Modeling: RNN / LSTM Explain autoregressive modeling. Describe how RNNs maintain state. Implement RNN and LSTM and identify their limitations.
22 Attention Mechanism Understand attention as weighted information selection. Derive queries, keys, and values at the tensor level. Implement attention with matrix operations and verify shapes and normalization. Apr 14: Quiz 7 [20–21]
23 Transformer Architecture / Self-Attention Explain self-attention and the motivation for Transformer models. Describe token embeddings and positional encoding. Explain multi-head attention and how attention heads capture different relationships. Assemble a Transformer block from self-attention, normalization, residual connections, and feed-forward layers. Trace tensor shapes through the model and discuss scaling behavior and training stability.
24 Paper Reading: Transformer, Vision Transformer, Swin Transformer Extract core architectural ideas and compare attention for sequences vs images. Discuss scalability and efficiency constraints. Apr 21: Quiz 8 [22–23]
25 Variational Autoencoders I Introduce latent-variable generative models. Explain latent representations and probabilistic encoders/decoders. Explain approximate inference and why variational methods are needed. notebooks/notebooks/07_vae
26 Variational Autoencoders II Understand the VAE objective (ELBO). Implement a VAE. Interpret reconstruction and regularization terms and their trade-off. Apr 28: Project milestone 2 deadline notebooks/notebooks/07_vae
27 Generative Adversarial Networks Explain adversarial training between a generator and discriminator. Formulate the GAN objective as a minimax game. Implement a basic GAN and examine training dynamics. Analyze common failure modes such as mode collapse and instability, and discuss stabilization techniques (e.g. normalization).
28 Diffusion Models Introduce diffusion as generative modeling via gradual noise corruption and learned denoising. Describe the forward noising process and the reverse denoising model. Derive the training objective and connect it to score-based learning. Explain sampling as iterative probabilistic inference and discuss computational trade-offs and scalability. May 5: Quiz 9 [25–27]
29 Foundation Models and Modern Trends Explain large-scale pretraining and transfer learning. Examine GPT, BERT, CLIP, and latent diffusion models (LDMs). Discuss scaling behavior and limitations.
Final Exam Tuesday, May 12: Final Exam [1–29]