ITCS 4111/5111: Introduction to Natural Language Processing
Spring 2022
Time and Location: Tue, Thu 11:30am – 12:45pm, Atkins 126
Instructor: Razvan Bunescu
Office: Woodward 210F
Office Hours: Tue, Thu 4:00pm – 5:00pm, or by email appointment
Email: razvan.bunescu @ uncc.edu
Teaching Assistant: Joshua Melton
Office: Zoom
Office Hours: Mon, Wed 11:00am – 12:00pm, or by email appointment
Email: jmelto30 @ uncc.edu
Recommended Texts (PDFs available online):
Speech and Language Processing (3rd edition draft), by Daniel Jurafsky and James H. Martin. 2021.
Natural Language Processing, by Jacob Eisenstein. 2019.
Course description:
Natural Language Processing (NLP) is a branch of Artificial Intelligence concerned with developing computer systems that can process or generate natural language. This course introduces fundamental NLP tasks, including tokenization, word representations, text classification, syntactic and semantic parsing, and coreference resolution. Machine learning (ML) techniques will be applied to a number of NLP applications, such as sentiment classification, information extraction, and named entity linking. Overall, the aim of this course is to equip students with an array of tools and techniques that they can use to solve known NLP tasks as well as new types of NLP problems.
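To make the description concrete, below is a minimal sketch (illustrative only, not course material) of one application named above: sentiment classification with a bag-of-words Naive Bayes model in scikit-learn. The four training sentences and their labels are invented for illustration.

# Minimal sentiment classification sketch: tokenize text into a
# bag-of-words representation, then train a Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["a wonderful, moving film", "what a great performance",
               "dull and predictable", "a complete waste of time"]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative (toy labels)

vectorizer = CountVectorizer()                # tokenizes and counts words
X_train = vectorizer.fit_transform(train_texts)

classifier = MultinomialNB()
classifier.fit(X_train, train_labels)

X_test = vectorizer.transform(["a great, moving performance"])
print(classifier.predict(X_test))             # expected output: [1]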
Prerequisites:
Students are expected to be comfortable with programming in Python and with data structures and algorithms (ITSC 2214), and to have basic knowledge of linear algebra (MATH 2164), statistics, and formal languages (regular and context-free grammars). Knowledge of machine learning will be very useful, though not strictly necessary. Relevant background material will be made available on this website throughout the course.
Lecture notes:
- Syllabus & Introduction
- Python for programming, linear algebra, and visualization
- Tokenization: From text to sentences and tokens
- Regular expressions
- Text classification using Naive Bayes
- Logistic regression (a minimal gradient-descent sketch appears after this list)
- Hand notes from lectures on Feb 15 and 17: one, two, three, four, five, six, and GD example.
- Hand notes from lecture on Feb 22: one, two, three, four, five, and six
- Slides 1 to 13 from CS 4156 lecture on Intro to ML
- Slides 1 to 24 (LR) and slide 27 (LR + L2 regularization) from CS 4156 lecture on Logistic Regression
- Slides 1 to 21 from CS 4156 lecture on Gradient Descent
- Manual annotation for NLP
- brat rapid annotation tool
- INCEpTION semantic annotation tool
- doccano text annotation tool for humans
- Word meanings; Sparse vs. dense representations of words
- Hand notes from lecture on Mar 15 on ROC and PR curves: one, two, and three.
- Hand notes from lecture on Mar 22 on Word2vec: one and two.
- Hand notes from lecture on Mar 24: one and two.
- Slides from CS 6840 lecture on Word Embeddings.
- Sequence labeling for POS tagging and NE recognition
- Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs)
- N-grams and Neural models for Language Modeling and Sequence Processing
- Machine translation, Sequence-to-sequence models and Attention
- Hand notes from lecture on Apr 19 on (R)NNs: one and two.
- Transformer: Self-Attention Networks
- Energy and Policy Considerations for Deep Learning in NLP, Strubell et al., ACL 2019.
- Coreference resolution
- Syntax, generative grammars, and syntactic parsing
- Hand notes from lecture on May 3: one and two.
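As referenced in the logistic regression entry above, here is a minimal NumPy sketch of binary logistic regression trained with batch gradient descent on the L2-regularized negative log-likelihood, the setting covered in the logistic regression and gradient descent slides. The synthetic data and hyperparameters are invented for illustration; this is a sketch, not the lecture code.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 examples, 3 features (synthetic)
true_w = np.array([1.5, -2.0, 0.5])        # made-up "true" weights
y = (X @ true_w > 0).astype(float)         # binary labels from a linear rule

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(3)
lr, l2 = 0.1, 0.01                         # learning rate, L2 strength (illustrative)
for _ in range(500):
    p = sigmoid(X @ w)                     # predicted P(y = 1 | x)
    grad = X.T @ (p - y) / len(y) + l2 * w # gradient of regularized average NLL
    w -= lr * grad                         # one gradient descent step

print(np.round(w, 2))                      # should point in the direction of true_w

Note that the L2 term keeps the learned weights bounded; without it, linearly separable data like this would drive the weight norm toward infinity as training continues.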
Readings on the current state-of-the-art and challenges in AI:
Homework assignments [1,2]:
[1] The code for assignment 6 is based on an assignment from the CS224n course at Stanford on NLP with Deep Learning.
[2] The code for assignment 7 is based on an assignment from Greg Durrett's CS388 course at UT Austin on Natural Language Processing.
Final Project:
Background reading materials:
- Probability and statistics:
- Linear Algebra:
Tools and packages:
- Natural language processing:
- Machine learning: