CS 6890: Deep Learning

Spring 2020

This course will introduce the following topics:

Logistic and Softmax Regression, Feed-Forward Neural Networks, Backpropagation, Vectorization, PCA and Whitening, Deep Networks, Convolution and Pooling, Recurrent Neural Networks, Long Short-Term Memory, Gated Recurrent Units, Neural Attention Models, Sequence-to-Sequence Models, Distributional Representations, Variational Auto-Encoders, Generative Adversarial Networks, Deep Reinforcement Learning.

Prerequisites: previous exposure to basic concepts in machine learning, such as supervised vs. unsupervised learning, classification vs. regression, linear regression, logistic and softmax regression, cost functions, overfitting and regularization, and gradient-based optimization; experience with programming; and familiarity with basic concepts in linear algebra and statistics.

- Syllabus & Introduction
- Hand notes Jan 14.

- Linear Regression, Logistic Regression, and Vectorization
- Gradient Descent algorithms
- An overview of gradient descent optimization algorithms, Sebastian Ruder, CoRR 2016
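As a companion to the readings above, here is a minimal sketch of batch gradient descent for least-squares linear regression (function and variable names are illustrative, not from the lecture code):

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=500):
    """Batch gradient descent for least-squares linear regression.

    Minimizes J(w) = (1/2m) * ||X w - y||^2 by repeatedly stepping
    in the direction of the negative gradient (1/m) * X^T (X w - y).
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / m   # gradient of the cost at w
        w -= lr * grad                 # parameter update
    return w

# Recover the true weights [2, -3] from noiseless synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0])
w = gradient_descent(X, y)
```

Ruder's overview surveys the many refinements of this basic update (momentum, Adagrad, RMSprop, Adam), all of which modify how `grad` is turned into a step.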

- Linear algebra and optimization in NumPy and PyTorch
- Hand notes Jan 28.
- Tutorials on NumPy and SciPy.
- Broadcasting explained.

- NumPy/SciPy examples and NumPy session on Jan 23.
- PyTorch examples and linear regression in Jupyter Notebook.
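The broadcasting rules covered above can be summarized in a short NumPy sketch (arrays are toy data for illustration): trailing dimensions are aligned right-to-left, and size-1 dimensions are stretched to match.

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)      # shape (4, 3)
col_mean = X.mean(axis=0)              # shape (3,)
col_std = X.std(axis=0)                # shape (3,)

# (4, 3) - (3,): the row vector broadcasts across all 4 rows,
# standardizing each column without an explicit loop.
Z = (X - col_mean) / col_std

# keepdims=True leaves a size-1 axis, so (4, 3) - (4, 1)
# broadcasts along the other dimension instead.
row_mean = X.mean(axis=1, keepdims=True)   # shape (4, 1)
X_centered_rows = X - row_mean             # shape (4, 3)
```

The same rules apply to PyTorch tensors, which is why vectorized NumPy habits transfer directly.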

- Feed-Forward Neural Networks and Backpropagation
- Andrej Karpathy: Yes you should understand backprop.
- Unsupervised Feature Learning with Autoencoders
- Introduction to Automatic Differentiation, invited lecture by Dr. David Juedes.
- PCA, PCA whitening, and ZCA whitening
- Convolutional Neural Networks
- Andrej Karpathy's notes on CS231n: Convolutional Neural Networks for Visual Recognition.
- UFLDL Tutorial at Stanford.
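In the spirit of Karpathy's "Yes you should understand backprop," here is a hand-rolled backward pass for a one-hidden-layer sigmoid network, checked against a finite-difference gradient (the network and data are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    """Forward pass of a one-hidden-layer network with sigmoid units."""
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)      # hidden activations
    a2 = sigmoid(a1 @ W2 + b2)     # output
    return a1, a2

def loss(params, X, y):
    _, a2 = forward(params, X)
    return 0.5 * np.mean((a2 - y) ** 2)

def backprop(params, X, y):
    """Gradients of the squared-error loss via the chain rule."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    a1, a2 = forward(params, X)
    d2 = (a2 - y) * a2 * (1 - a2) / m    # delta at the output layer
    d1 = (d2 @ W2.T) * a1 * (1 - a1)     # delta propagated back to the hidden layer
    return [X.T @ d1, d1.sum(axis=0), a1.T @ d2, d2.sum(axis=0)]

# Compare one analytic gradient entry against a finite difference.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
y = rng.random(size=(8, 1))
params = [rng.normal(size=(3, 4)), np.zeros(4),
          rng.normal(size=(4, 1)), np.zeros(1)]
grads = backprop(params, X, y)

eps = 1e-6
p = [q.copy() for q in params]
p[0][0, 0] += eps
num = (loss(p, X, y) - loss(params, X, y)) / eps
```

This numerical gradient check is exactly the debugging technique the UFLDL tutorial recommends before trusting any backprop implementation.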

- Word Embeddings
- Natural Language Processing (Almost) from Scratch, Collobert, Weston, Bottou, Karlen, Kavukcuoglu, and Kuksa, JMLR 2011.
- Distributed Representations of Words and Phrases and their Compositionality, Mikolov, Sutskever, Chen, Corrado, and Dean, NIPS 2013.
- Character-Aware Neural Language Models, Kim et al., AAAI 2016.
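The papers above all evaluate embeddings by nearest-neighbor lookups under cosine similarity; the mechanic is simple enough to sketch with a toy table (the vectors below are invented for illustration; real word2vec vectors would be loaded from a pretrained file):

```python
import numpy as np

# Toy embedding table; values are made up for illustration only.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.85, 0.75, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
    "pear":  np.array([0.15, 0.1, 0.85]),
}

def cosine(u, v):
    """Cosine similarity, the standard metric for comparing embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(word):
    """The most similar other word under cosine similarity."""
    return max((w for w in emb if w != word),
               key=lambda w: cosine(emb[word], emb[w]))
```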

- Recurrent Neural Networks
- Stanford CS 224N slides
- Hand notes on LSTM equations
- Supervised Sequence Labelling with Recurrent Neural Networks, Alex Graves, PhD Thesis 2012.
- Chapter 4.6: Forward and Backward Propagation equations for LSTMs.

- Multilayer LSTM for predicting sine: Model implementation, Notebook for training and evaluation, and the data
- Hand notes on LSTM model for time series prediction
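The forward-propagation equations from Graves's Chapter 4.6 reduce to a few lines per time step; below is a sketch of one step of a standard LSTM cell without peephole connections (weight names and the stacked-gate layout are a common convention, not taken from the lecture code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (standard formulation, no peepholes).

    W: input weights (4H, D); U: recurrent weights (4H, H);
    b: biases (4H,). The four gate pre-activations are stacked.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*H:1*H])          # input gate
    f = sigmoid(z[1*H:2*H])          # forget gate
    o = sigmoid(z[2*H:3*H])          # output gate
    g = np.tanh(z[3*H:4*H])          # candidate cell state
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

# Run the cell over a random length-10 sequence.
rng = np.random.default_rng(0)
D, H = 3, 5
W = rng.normal(size=(4*H, D))
U = rng.normal(size=(4*H, H))
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(10, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The additive update `c = f * c_prev + i * g` is the key design choice: it gives gradients a path through time that avoids the repeated squashing that plagues vanilla RNNs.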

- RNNs with Attention for Machine Translation
- Stanford CS 224N slides
- Hand notes on Copying mechanism

- From RNNs to Transformer
- ELMo: Deep contextualized word representations, Peters et al., NAACL 2018.
- Transformer: Attention is all you need, Vaswani et al., NIPS 2017.
- Hand notes on Vectorized attention, Self attention, and Encoder module
- The Annotated Transformer and ACL 2018 paper by Alexander Rush.
- The Illustrated Transformer by Jay Alammar.
- Stanford CS 224N slides
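The vectorized self-attention covered in the hand notes can be sketched in a few lines of NumPy, following Equation 1 of Vaswani et al. (the dimensions and weight names below are illustrative; a real Transformer adds multiple heads, masking, and learned projections per layer):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention:
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    with Q, K, V all projected from the same input sequence X.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (T, T) pairwise compatibilities
    A = softmax(scores, axis=-1)      # each row is a distribution over positions
    return A @ V, A

rng = np.random.default_rng(0)
T, D, Dk = 4, 6, 3                    # sequence length, model dim, head dim
X = rng.normal(size=(T, D))
out, A = self_attention(X, rng.normal(size=(D, Dk)),
                        rng.normal(size=(D, Dk)), rng.normal(size=(D, Dk)))
```

The `1/sqrt(d_k)` scaling keeps the dot products from pushing the softmax into regions with vanishing gradients, which is the detail the paper singles out.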

- GPT: Improving Language Understanding by Generative Pre-Training, Radford et al., OpenAI 2018.
- OpenAI Blog first entry on GPT and its follow-up.

- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., NAACL 2019.
- Jacob Devlin's presentation.
- Google AI Blog entry on BERT.
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, Lan et al., 2019.

- Deep Generative Models
- Generative Adversarial Networks @ Toronto.
- Generative Adversarial Nets, Goodfellow et al., NIPS 2014.
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Radford et al., ICLR 2016.
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Zhu et al., ICCV 2017.

- Autoregressive and Reversible Models @ Toronto.
- The Reversible Residual Network: Backpropagation Without Storing Activations, Gomez et al., NIPS 2017.

- Variational Auto-Encoders @ Toronto.
- Auto-Encoding Variational Bayes, Kingma and Welling, ICLR 2014
- Tutorial on Variational Autoencoders, Carl Doersch, CMU 2016
- VAE implementation in PyTorch, Agustinus Kristiadi's Blog, 2017
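The two pieces of a VAE loss that the readings above derive are easy to sketch directly; here is the reparameterization trick and the closed-form KL term for a diagonal Gaussian posterior (function names are illustrative, and a full model would add the encoder, decoder, and reconstruction loss):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, sigma^2) via the reparameterization trick:
    z = mu + sigma * eps with eps ~ N(0, I), so the sampling step
    is differentiable with respect to mu and log_var.
    """
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian:
    -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2).
    """
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

rng = np.random.default_rng(0)
mu, log_var = np.zeros(4), np.zeros(4)   # posterior equal to the prior
z = reparameterize(mu, log_var, rng)
```

When the posterior matches the prior exactly, the KL term is zero; any deviation of `mu` or `log_var` from the prior is penalized, which is what regularizes the latent space.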


- Assignment and code.
- Assignment and code.
- Assignment, code and data.
- Assignment, code and data.
- Assignment, code, word2vec Google News embeddings, and the Stanford Natural Language Inference (SNLI) dataset.
- Reasoning about entailment with neural attention, Rocktäschel et al., ICLR 2016.

- Tips for choosing a project topic:
- Project suggestions
- Project report guidelines

- Petersen et al.'s The Matrix Cookbook
- James H. Martin's Introduction to probabilities
- Jason Eisner's equestrian Introduction to probabilities
- Gilbert Strang's Introduction to Linear Algebra
- Strang's Video Lectures on Linear Algebra
- Inderjit Dhillon's Linear Algebra Background
- Convex Optimization, Stephen Boyd and Lieven Vandenberghe, Cambridge University Press 2004