CS 4900/5900: Machine Learning

Fall 2019

This course will give an overview of the main concepts, techniques, and algorithms underlying the theory and practice of machine learning. The course will cover the fundamental topics of classification, regression and clustering, and a number of corresponding learning models such as perceptrons, logistic regression, linear regression, Naive Bayes, nearest neighbors, and Support Vector Machines. The description of the formal properties of the algorithms will be supplemented with motivating applications in a wide range of areas including natural language processing, computer vision, bioinformatics, and music analysis.

The students are expected to be comfortable with programming and familiar with basic concepts in linear algebra and statistics. Relevant background material in linear algebra, probability theory and information theory will be made available during the course.

- Syllabus & Introduction
- Linear Regression and L2 Regularization
- Linear algebra and optimization in Python
- Python lecture
- NumPy tutorial
- SciPy tutorial
- NumPy/SciPy examples
- Matplotlib tutorial and examples:
- Visualization in 3D of a non-convex surface.
- Athens houses visualization code.
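This sketch illustrates the Linear Regression and L2 Regularization topic using the NumPy material listed above. It computes the closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy. It assumes a design matrix with no bias column and penalizes every weight. The name `ridge_fit` and the synthetic data are illustrative only and are not taken from the course materials.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: w = (X^T X + lam*I)^{-1} X^T y.
    Assumption: no bias column, and every weight is penalized."""
    d = X.shape[1]
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: y = 3*x1 - 2*x2 + small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 0.01 * rng.normal(size=100)

w_small = ridge_fit(X, y, lam=1e-6)   # almost ordinary least squares
w_large = ridge_fit(X, y, lam=1e3)    # strong regularization shrinks the weights
print(w_small, w_large)
```

Calling `np.linalg.solve` on the regularized normal equations avoids an explicit matrix inverse, which is usually the more numerically stable choice.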

- Gradient Descent Algorithms
- An overview of gradient descent optimization algorithms, Sebastian Ruder, CoRR 2016
- Animations of Gradient Descent Algorithms, Alec Radford, 2014
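This sketch connects the gradient descent readings to the regression topic. It runs plain batch gradient descent on the L2-regularized least-squares objective J(w) = (1/2n)||Xw - y||² + (λ/2)||w||², whose gradient is (1/n)Xᵀ(Xw - y) + λw. The learning rate, iteration count and function names are assumptions chosen for this toy problem. The momentum and adaptive variants surveyed by Ruder are not shown.

```python
import numpy as np

def ridge_gradient(w, X, y, lam):
    """Gradient of J(w) = (1/2n)||Xw - y||^2 + (lam/2)||w||^2."""
    n = X.shape[0]
    return X.T @ (X @ w - y) / n + lam * w

def gradient_descent(X, y, lam, lr=0.1, n_iters=500):
    """Fixed-step batch gradient descent starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w -= lr * ridge_gradient(w, X, y, lam)   # step against the gradient
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)
lam = 0.1

w_gd = gradient_descent(X, y, lam)
# Closed-form minimizer of the same objective, for comparison.
n, d = X.shape
w_exact = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
print(np.max(np.abs(w_gd - w_exact)))
```

The objective is strongly convex, so a small enough fixed step size converges to the same minimizer as the closed form.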

- Logistic Regression, Maximum Likelihood, Maximum Entropy
- Hand notes Sep 13 one, Sep 13 two.
- A Maximum Entropy Approach to Natural Language Processing, Adam Berger, Vincent Della Pietra and Stephen A. Della Pietra, Computational Linguistics, 1996
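This is a minimal sketch of maximum-likelihood logistic regression. It fits the model by gradient ascent on the average log-likelihood (1/n) Σ_i [y_i log σ(wᵀx_i) + (1 - y_i) log(1 - σ(wᵀx_i))], whose gradient is (1/n) Σ_i (y_i - σ(wᵀx_i)) x_i. It assumes labels in {0, 1}, a prepended bias feature and a fixed step size. No regularization is used, and the two-blob data set is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Prepend a column of ones so w[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, lr=0.5, n_iters=2000):
    """Maximize the mean log-likelihood of labels y in {0,1} by gradient ascent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        p = sigmoid(Xb @ w)                  # P(y = 1 | x) under the current w
        w += lr * Xb.T @ (y - p) / len(y)    # gradient of the mean log-likelihood
    return w

def predict(w, X):
    return (sigmoid(add_bias(X) @ w) >= 0.5).astype(int)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=-1.0, size=(100, 2)),
               rng.normal(loc=+1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w = fit_logistic(X, y)
accuracy = np.mean(predict(w, X) == y)
print(w, accuracy)
```

The two classes overlap, so the maximum-likelihood weights stay finite. On linearly separable data the unregularized weights would keep growing.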

- Fisher Linear Discriminant
- Perceptrons and Kernels
- Large Margin Classification Using the Perceptron Algorithm, Yoav Freund and Robert E. Schapire, Machine Learning 1999
- New ranking algorithms for parsing and tagging: Kernels over Discrete Structures, and the Voted Perceptron, Michael Collins and Nigel Duffy, ACL 2002
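This sketch illustrates the Perceptrons and Kernels topic with a kernelized (dual) perceptron. Instead of a weight vector, it stores a mistake count α_i for each training example and classifies with sign(Σ_j α_j y_j K(x_j, x)). This is the plain perceptron, not the voted variant from Freund and Schapire. The quadratic kernel K(x, z) = (1 + x·z)² and the four-point XOR data are assumptions, chosen because a linear perceptron cannot separate these points.

```python
import numpy as np

def poly_kernel(x, z, degree=2):
    return (1.0 + np.dot(x, z)) ** degree

def train_kernel_perceptron(X, y, kernel, max_epochs=100):
    """Dual perceptron: alpha[i] counts the mistakes made on example i."""
    n = len(y)
    alpha = np.zeros(n)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])  # Gram matrix
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # Current score f(x_i) = sum_j alpha_j y_j K(x_j, x_i)
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:   # a full pass with no errors: the training set is separated
            break
    return alpha

def predict(alpha, X_train, y_train, kernel, x):
    score = sum(a * yt * kernel(xt, x) for a, yt, xt in zip(alpha, y_train, X_train))
    return 1 if score > 0 else -1

# XOR-style labels: not linearly separable in the input space.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

alpha = train_kernel_perceptron(X, y, poly_kernel)
print([predict(alpha, X, y, poly_kernel, x) for x in X])  # → [1, 1, -1, -1]
```

The quadratic kernel implicitly adds the product feature x1·x2, which separates the XOR labels. The loop therefore stops after a pass with no mistakes.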

- Support Vector Machines
- Derivation of dual formulation of SVMs.
- A Tutorial on Support Vector Machines for Pattern Recognition, Christopher J. C. Burges, Data Mining and Knowledge Discovery 1998
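The lectures derive the dual SVM formulation. As a complementary and deliberately different technique, this sketch minimizes the primal soft-margin objective (λ/2)||w||² + (1/n) Σ_i max(0, 1 - y_i wᵀx_i) with the Pegasos stochastic subgradient method. It omits the bias term, so it assumes data that can be separated through the origin. The value of λ, the iteration count and the synthetic blobs are illustrative choices.

```python
import numpy as np

def pegasos(X, y, lam=0.1, n_iters=2000, seed=0):
    """Stochastic subgradient descent on the primal soft-margin SVM objective
    (lam/2)||w||^2 + mean(max(0, 1 - y_i w.x_i)), with no bias term."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iters + 1):
        i = rng.integers(len(y))          # pick one training example at random
        eta = 1.0 / (lam * t)             # decreasing step size
        if y[i] * (w @ X[i]) < 1:         # margin violated: the hinge term is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                             # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=2.0, scale=0.5, size=(50, 2)),
               rng.normal(loc=-2.0, scale=0.5, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w = pegasos(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print(w, accuracy)
```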

- Nearest Neighbor Methods
- Naive Bayes
- Naive Bayes and Logistic Regression, chapter in Tom Mitchell, Machine Learning, 2017
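Two of the classifiers above fit in a few lines of NumPy. This sketch shows k-nearest neighbors with Euclidean distance and majority vote. It also shows Gaussian Naive Bayes, a common variant for continuous features that models each feature as an independent normal distribution given the class. The value of k, the variance floor of 1e-9 and the two-cluster data are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

class GaussianNaiveBayes:
    """Features are treated as independent Gaussians given the class."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # A small constant keeps every variance strictly positive.
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, x):
        # log P(c) + sum_j log N(x_j; mean_cj, var_cj), maximized over classes c
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * self.vars)
                                + (x - self.means) ** 2 / self.vars, axis=1)
        return self.classes[np.argmax(np.log(self.priors) + log_lik)]

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=0.0, size=(50, 2)), rng.normal(loc=4.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
query = np.array([3.5, 4.2])
print(knn_predict(X, y, query, k=5), GaussianNaiveBayes().fit(X, y).predict(query))
```

k-NN keeps the whole training set and defers all work to prediction time. Naive Bayes compresses the training set into per-class means, variances and priors.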

- Clustering
- Clustering algorithms in scikit-learn.
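For the clustering topic, this is a minimal sketch of Lloyd's k-means algorithm. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The centroids are seeded k-means++ style, with probability proportional to squared distance from the centroids already chosen. The seeding details, stopping rule and three synthetic blobs are assumptions. The scikit-learn `KMeans` class linked above is the more complete option.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm with k-means++ style seeding."""
    rng = np.random.default_rng(seed)
    # Seeding: first centroid uniformly at random, the rest with probability
    # proportional to squared distance from the nearest centroid chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        C = np.array(centroids)
        d2 = np.min(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids)

    for _ in range(n_iters):
        # Assignment step: index of the closest centroid for every point.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: mean of the assigned points (an empty cluster keeps its centroid).
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

rng = np.random.default_rng(5)
centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])
centroids, labels = kmeans(X, k=3)
print(np.round(centroids, 1))
```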

- Decision Trees
- Classification And Regression Trees (CART), slides by Alexandra Chouldechova @ CMU.
- Decision trees in scikit-learn.
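A core step in CART-style tree growing is choosing the split that most reduces impurity. This sketch scores every midpoint threshold on a single numeric feature by the size-weighted Gini impurity 1 - Σ_k p_k² of the two children, and keeps the best one. A full tree would apply this step recursively over all features. The function names and toy data are assumptions for illustration.

```python
import numpy as np

def gini(y):
    """Gini impurity 1 - sum_k p_k^2 of a label array."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Threshold on one feature x minimizing the weighted Gini impurity of the children."""
    best_t, best_score = None, np.inf
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values.
    for t in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
t, score = best_split(x, y)
print(t, score)
```

On this toy data the threshold 3.5 yields two pure children, so its weighted impurity is zero.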

- Bias-Variance Decomposition, Bagging, and Boosting, slides by Tom Dietterich @ Oregon State.
- Ensemble methods in scikit-learn.
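Boosting can be illustrated with a minimal sketch of AdaBoost that uses one-dimensional threshold stumps as weak learners. Each round fits the stump with the lowest weighted error ε and gives it the vote α = ½ ln((1 - ε)/ε). It then re-weights the examples by exp(-α y h(x)), so that misclassified examples gain weight. The stump search, the number of rounds and the interval-shaped toy labels are assumptions for illustration.

```python
import numpy as np

def stump_predict(x, t, s):
    """Threshold stump: s if x > t, else -s (labels in {-1, +1})."""
    return np.where(x > t, s, -s)

def fit_stump(x, y, w):
    """Return (weighted error, threshold, sign) of the best stump under weights w."""
    best = (np.inf, None, None)
    values = np.unique(x)
    # Include a threshold below every value so constant classifiers are available.
    thresholds = np.concatenate([[values[0] - 1], (values[:-1] + values[1:]) / 2])
    for t in thresholds:
        for s in (1, -1):
            err = np.sum(w[stump_predict(x, t, s) != y])
            if err < best[0]:
                best = (err, t, s)
    return best

def adaboost(x, y, n_rounds=5):
    w = np.full(len(y), 1.0 / len(y))           # start with uniform example weights
    ensemble = []
    for _ in range(n_rounds):
        err, t, s = fit_stump(x, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)    # guard against log(0) and division by 0
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this stump
        w *= np.exp(-alpha * y * stump_predict(x, t, s))   # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, t, s))
    return ensemble

def predict(ensemble, x):
    return np.sign(sum(alpha * stump_predict(x, t, s) for alpha, t, s in ensemble))

# Labels are +1 only inside an interval, which no single stump can represent.
x = np.arange(10, dtype=float)
y = np.where((x >= 3) & (x <= 6), 1, -1)
ens = adaboost(x, y, n_rounds=5)
print(predict(ens, x))
```

A weighted vote of three stumps (a rising edge at 2.5, a falling edge at 6.5 and a constant -1) labels the interval correctly. On this data, five rounds are enough for the ensemble to reproduce `y`.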

- Reinforcement Learning
- Hand notes, Quintin: Sep 17.
- Hand notes, Razvan: one, two, three, cheat sheet.
- RL 1 (MDP) and RL 2 (RL) lectures from the Udacity ML course.
- First two lectures from the Deep RL Bootcamp.
- Part I from Richard Sutton's RL Tutorial at NIPS 2015.
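For the MDP material, this is a minimal sketch of value iteration. It repeats the Bellman optimality backup V(s) ← max_a [R(s, a) + γ Σ_{s'} P(s' | s, a) V(s')] until the values stop changing. The MDP itself is hypothetical: a four-state chain where each action moves left or right, and stepping into the absorbing state 3 earns reward 1. The discount γ = 0.9 and the tolerance are also illustrative choices.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[a, s, s2] = transition probability, R[a, s] = expected reward of a in s."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * P @ V      # Q[a, s] = R[a, s] + gamma * sum_s2 P[a, s, s2] V[s2]
        V_new = Q.max(axis=0)      # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # values and a greedy policy
        V = V_new

# Hypothetical chain MDP: states 0..3, action 0 = left, 1 = right.
n_states, LEFT, RIGHT = 4, 0, 1
P = np.zeros((2, n_states, n_states))
R = np.zeros((2, n_states))
for s in range(3):                  # states 0-2 are non-terminal
    P[LEFT, s, max(s - 1, 0)] = 1.0
    P[RIGHT, s, s + 1] = 1.0
P[:, 3, 3] = 1.0                    # state 3 is absorbing with zero reward
R[RIGHT, 2] = 1.0                   # reward only for stepping into state 3

V, policy = value_iteration(P, R)
print(V, policy)
```

The resulting values are 1, 0.9 and 0.81 for states 2, 1 and 0, which is the reward discounted by γ once per extra step. The greedy policy moves right in every non-terminal state.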

- Assignment 1
- Assignment 2
- Assignment 3
- Assignment 4
- Assignment 5
- Assignment 6
- Assignment 7
- Assignment 8
- Assignment 9

- James H. Martin's Introduction to probabilities
- Jason Eisner's equestrian Introduction to probabilities
- Gilbert Strang's Introduction to Linear Algebra
- Strang's Video Lectures on Linear Algebra
- Inderjit Dhillon's Linear Algebra Background
- Convex Optimization, Stephen Boyd and Lieven Vandenberghe, Cambridge University Press 2004
- Mike Brookes' Matrix Reference Manual
- Petersen et al.'s The Matrix Cookbook

- scikit-learn Machine Learning in Python
- Weka Data Mining Software in Java
- SVMlight Implementation of SVMs in C
- LIBSVM Implementation of SVMs in C++ and Java
- MALLET Java implementations of logistic regression, HMMs, linear chain CRFs, and other ML models.
- LibSVM applet demonstrating SVMs.
- k-Nearest Neighbor short animated video, by Antal van den Bosch