Syllabus

Important

The content of this syllabus is subject to change. Please consistently check the course page on Blackboard and the ADA University Academic Calendar for modifications. The last day of the add/drop period, holidays, and other academic deadlines are noted in the calendar.

Info

Square brackets in the Assessment / Notes column indicate the range of classes whose material is covered by the assessment. For example, Quiz 1 [1–3] means that the quiz assesses material covered in classes 1 through 3.

Class	Topic	Learning Outcomes	Assessment	Materials
1	Deep Learning Overview / Course Structure	Describe the scope of deep learning and the course syllabus. Fulfill technological requirements.		introduction/01_deep_learning
2	Mathematics of Deep Learning: Calculus / Linear Algebra	Compute partial derivatives and apply the chain rule. Work with vectors, matrices, and tensors; apply norms and inner products. Understand intuition behind eigenvectors and SVD.		mathematics/01_calculus mathematics/02_linear_algebra supplementary/svd
3	Gradient Descent / Backpropagation I	Compute gradients on computational graphs. Perform forward and backward passes. Understand gradient descent updates and automatic differentiation (PyTorch autograd, micrograd).		notebooks/01_backprop
4	Gradient Descent / Backpropagation II	Implement full backpropagation.	Feb 3: Quiz 1 [1–3] Last day to submit team member details	notebooks/01_backprop
5	Activation Functions / Neuron	Implement activation functions and understand non-linearity. Backpropagate over an N-dimensional neuron.		notebooks/02_neural_network
6	Multilayer Perceptron (MLP) / Cross-Entropy	Construct an MLP from stacked neurons. Train a simple MLP classifier on a small dataset. Understand cross-entropy loss.	Feb 10: Project proposal deadline	notebooks/02_neural_network
7	Images as Tensors / Training MLP on MNIST Dataset	Understand image representations, tensor shapes, and batching. Use torchvision datasets and dataloaders. Train an MLP on MNIST.	Feb 12: Quiz 2 [5–6]	notebooks/03_cnn_torch
8	Convolutional Neural Networks (CNN)	Define and implement 2D cross-correlation (convolution) and pooling with kernels, including padding and stride. Train a LeNet-style CNN on MNIST. Compare MLP with CNN.		notebooks/03_cnn_torch
9	Mathematics of Deep Learning: Probability Theory	Describe random variables; distinguish discrete and continuous distributions; work with PMF/PDF. Compute expectation, variance, and covariance. Use conditional probability, independence, and Bayes’ rule. Recognize common distributions.	Feb 19: Quiz 3 [7–8]	mathematics/03_probability
10	Regularization / Initialization	Recall overfitting and understand how regularization helps with it. Apply data augmentation. Apply weight decay and dropout. Handle exploding and vanishing gradients. Use Xavier and He initialization.		notebooks/04_regul_optim
11	Optimization	Distinguish local minima from saddle points in training dynamics. Adjust learning rate and apply schedules. Use stochastic gradient decent (SGD) and explain its purpose. Apply momentum, RMSProp, and Adam to optimize learning. Compare optimizers based on convergence behavior and practical performance.		notebooks/04_regul_optim
12	Training CNN on CIFAR-10 Dataset / Hyperparameter Tuning	Train a regularized and optimized CNN on CIFAR-10. Apply hyperparameter tuning.	Mar 3: Quiz 4 [10–11]	notebooks/04_regul_optim
13	CNN Architectures / Batch Normalization	Compare classical and modern CNN architectures (e.g. AlexNet, VGG, Inception, EfficientNet) in terms of depth, parameter count, and training purpose and stability. Understand how receptive field influence predictions. Explain why normalization helps training deep networks. Implement batch normalization and understand training vs evaluation behavior. Understand batch-size effects and when to prefer layer normalization.		notebooks/05_cnn_architectures [ tensorspace.js ]
14	Residual Network / Transfer Learning / Fine-tuning	Understand residual (skip) connections. Explain the deep reasoning why residual blocks help training big models. Describe how pretrained CNNs enable transfer learning, and how fine-tuning adapts them to new datasets. Demonstrate fine-tuning of an ImageNet-pretrained CNN on CIFAR-10.		notebooks/05_cnn_architectures
15	Paper Reading: AlexNet	Discuss AlexNet paper, its key ideas, and determine what is outdated. Describe the paper structure.	Mar 12: Quiz 5 [13-14] Project milestone 1 deadline	[ alexnet ]
16	Midterm Exam	—	Mar 17: Midterm Exam [1–15]
17	Midterm Exam Review	Half-semester overview.
—	Holidays	—	Mar 20–30
18	Mathematics of deep learning: Information Theory and Probabilistic Modeling	Compute entropy, cross-entropy, and KL divergence. Derive cross-entropy loss from maximum likelihood. Interpret common losses as probabilistic objectives.		mathematics/04_information mathematics/05_prob_modeling [ probabalistic models ]
19	Sequence Modeling: Tokenization / Bigram Model / Perplexity	Understand the aims of sequence modeling. Tokenize and build a character-level bigram model and sample from it. Implement average negative log-likelihood loss and perplexity.		notebooks/06_nn_ngram
20	Neural N-gram Language Model	Construct a neural N-gram model and train it with mini-batch updates.	Apr 7: Quiz 6 [18–19]	notebooks/06_nn_ngram
21	Autoregressive Modeling: RNN / LSTM	Explain autoregressive modeling. Describe how RNNs maintain state. Implement RNN and LSTM and identify their limitations.
22	Attention Mechanism	Understand attention as weighted information selection. Derive queries, keys, and values at the tensor level. Implement attention with matrix operations and verify shapes and normalization.	Apr 14: Quiz 7 [20–21]
23	Transformer Architecture / Self-Attention	Explain self-attention and the motivation for Transformer models. Describe token embeddings and positional encoding. Explain multi-head attention and how attention heads capture different relationships. Assemble a Transformer block from self-attention, normalization, residual connections, and feed-forward layers. Trace tensor shapes through the model and discuss scaling behavior and training stability.
24	Paper Reading: Transformer, Vision Transformer, Swin Transformer	Extract core architectural ideas and compare attention for sequences vs images. Discuss scalability and efficiency constraints.	Apr 21: Quiz 8 [22–23]
25	Variational Autoencoders I	Introduce latent-variable generative models. Explain latent representations and probabilistic encoders/decoders. Explain approximate inference and why variational methods are needed.		notebooks/notebooks/07_vae
26	Variational Autoencoders II	Understand the VAE objective (ELBO). Implement a VAE. Interpret reconstruction and regularization terms and their trade-off.	Apr 28: Project milestone 2 deadline	notebooks/notebooks/07_vae
27	Generative Adversarial Networks	Explain adversarial training between a generator and discriminator. Formulate the GAN objective as a minimax game. Implement a basic GAN and examine training dynamics. Analyze common failure modes such as mode collapse and instability, and discuss stabilization techniques (e.g. normalization)
28	Diffusion Models	Introduce diffusion as generative modeling via gradual noise corruption and learned denoising. Describe the forward noising process and the reverse denoising model. Derive the training objective and connect it to score-based learning. Explain sampling as iterative probabilistic inference and discuss computational trade-offs and scalability.	May 5: Quiz 9 [25–27]
29	Foundation Models and Modern Trends	Explain large-scale pretraining and transfer learning. Examine GPT, BERT, CLIP, and latent diffusion models (LDMs). Discuss scaling behavior and limitations.
—	Final Exam	—	Tuesday, May 12: Final Exam [1–29]