CS 6840: Natural Language Processing
Fall 2019
Time and Location: Tue, Thu 9:00 – 10:20am, ARC 159
Instructor: Razvan Bunescu
Office: Stocker 341
Office Hours: Tue, Thu 3:00 – 4:00pm, or by email appointment
Email: bunescu @ ohio edu
Recommended Supplementary Texts (PDFs available online):
Natural Language Processing, by Jacob Eisenstein. 2019.
Speech and Language Processing (3rd edition draft), by Daniel Jurafsky and James H. Martin. 2019.
Course description:
Natural Language Processing (NLP) is a branch of Artificial Intelligence concerned with developing computer systems that can process or generate natural language. Major applications of NLP include machine translation, sentiment analysis, speech recognition, information retrieval and web search, question answering, and information extraction. In this course, students will learn how to use modern machine learning (ML) techniques to solve fundamental NLP tasks, such as training vector-based representations of words and their meanings, document classification, syntactic parsing, language modeling, coreference resolution, entity linking, and semantic parsing.
Prerequisites:
Students are expected to be comfortable with programming in Python and to have basic knowledge of formal languages (regular and context-free grammars), linear algebra, probability theory, and statistics. Knowledge of deep learning will be very useful, though not strictly necessary, as long as the student is willing to learn: each ML model will be introduced in class, and relevant supplemental online materials will be provided throughout the course.
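To give a concrete sense of the expected background, below is a minimal sketch (not taken from the course materials; the toy data is made up) of batch gradient descent for logistic regression in NumPy, roughly the level of Python and linear algebra the course assumes:

    import numpy as np

    # Toy data: 4 examples with 2 features each, and binary labels.
    X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0]])
    y = np.array([0.0, 0.0, 1.0, 1.0])
    w, b, lr = np.zeros(2), 0.0, 0.1

    for _ in range(1000):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient of the average log loss
        b -= lr * np.mean(p - y)

    print(np.round(p))  # should recover the labels: [0. 0. 1. 1.]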
Lecture notes:
- Syllabus & Introduction
- Text Classification with Perceptron, SVMs, and Logistic Regression (a small perceptron sketch appears after this list)
- Gradient Descent Algorithms
- Linear algebra and optimization in Python
- HMMs and Part of Speech Tagging
- Markov Models, chapter 9 in Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, May 1999
- Part-of-Speech Tagging Guidelines for the Penn Treebank Project, Beatrice Santorini, Technical Report, 1990
- A Maximum Entropy Model for Part-Of-Speech Tagging, Adwait Ratnaparkhi, EMNLP, 1996
- CRFs and Named Entity Recognition
- Non-linear Classification, Neural Networks, and PyTorch
- Word Embeddings
- Natural Language Processing (Almost) from Scratch, Collobert, Weston, Bottou, Karlen, Kavukcuoglu, and Kuksa, JMLR 2011.
- Distributed Representations of Words and Phrases and their Compositionality, Mikolov, Sutskever, Chen, Corrado, and Dean, NIPS 2013.
- Recurrent Neural Networks for NLP
- RNNs with Attention for Machine Translation
- Convolutional Neural Networks for NLP
- Contextualized Word Embeddings and Pre-training for NLP
- ELMo: Deep contextualized word representations, Peters et al., NAACL 2018.
- Transformer: Attention Is All You Need, Vaswani et al., NIPS 2017.
- GPT: Improving Language Understanding by Generative Pre-Training, Radford et al., OpenAI 2018.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., NAACL 2019.
- Energy and Policy Considerations for Deep Learning in NLP, Strubell et al., ACL 2019.
- Syntax and Grammars
- Syntactic Parsing
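As referenced in the text classification entry above, the following is a minimal illustrative sketch of the perceptron algorithm for binary text classification with bag-of-words features (the toy documents, labels, and function names are hypothetical, not course code):

    from collections import Counter

    def featurize(text):
        # Bag-of-words: map a document to sparse feature counts.
        return Counter(text.lower().split())

    def train_perceptron(docs, labels, epochs=10):
        # Labels are +1 / -1; the weight vector is kept as a sparse Counter.
        w = Counter()
        for _ in range(epochs):
            for text, y in zip(docs, labels):
                x = featurize(text)
                score = sum(w[f] * v for f, v in x.items())
                if y * score <= 0:  # mistake-driven update
                    for f, v in x.items():
                        w[f] += y * v
        return w

    docs = ["good great fun", "bad awful boring", "great movie", "awful plot"]
    labels = [1, -1, 1, -1]
    w = train_perceptron(docs, labels)
    print(sum(w[f] * v for f, v in featurize("fun movie").items()) > 0)  # True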
Homework Assignments:
Final Project:
Online resources: