ITCS 4111/5111: Introduction to Natural Language Processing
Spring 2022

Time and Location: Tue, Thu 11:30am – 12:45pm, Atkins 126

Instructor: Razvan Bunescu
Office: Woodward 210F
Office Hours: Tue, Thu 4:00pm – 5:00pm, or by email appointment
Email: razvan.bunescu @ uncc edu

Teaching Assistant: Joshua Melton
Office: Zoom
Office Hours: Mon, Wed 11:00am – 12:00pm, or by email appointment
Email: jmelto30 @ uncc edu

Recommended Texts (PDF available online):
  • Speech and Language Processing (3rd edition draft), by Daniel Jurafsky and James H. Martin. 2021.
  • Natural Language Processing, by Jacob Eisenstein. 2019.

    Course description:
    Natural Language Processing (NLP) is a branch of Artificial Intelligence concerned with developing computer systems that can process or generate natural language. This course will introduce fundamental tasks in NLP, including tokenization, word representations, text classification, syntactic and semantic parsing, and coreference resolution. Machine learning (ML) based techniques will be used in a number of NLP applications such as sentiment classification, information extraction, and named entity linking. Overall, the aim of this course is to equip students with an array of tools and techniques that they can use to solve known NLP tasks, as well as new types of NLP problems.

    Students are expected to be comfortable with programming in Python and with data structures and algorithms (ITSC 2214), and to have basic knowledge of linear algebra (MATH 2164), statistics, and formal languages (regular and context-free grammars). Knowledge of machine learning will be very useful, though not strictly necessary. Relevant background material will be made available on this website throughout the course.

    Lecture notes:
    1. Syllabus & Introduction
    2. Python for programming, linear algebra, and visualization
    3. Tokenization: From text to sentences and tokens
    4. Regular expressions
    5. Text classification using Naive Bayes
    6. Logistic regression
    7. Manual annotation for NLP
    8. Word meanings; Sparse vs. dense representations of words
    9. Sequence labeling for POS tagging and NE recognition
    10. Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs)
    11. N-grams and Neural models for Language Modeling and Sequence Processing
    12. Machine translation, Sequence-to-sequence models and Attention
    13. Transformer: Self-Attention Networks
    14. Energy and Policy Considerations for Deep Learning in NLP, Strubell et al., ACL 2019.
    15. Coreference resolution
    16. Syntax, generative grammars, and syntactic parsing

    Readings on the current state of the art and challenges in AI:
    Homework assignments [1, 2]:
    [1] The code for assignment 6 is based on an assignment from the CS224n course at Stanford on NLP with Deep Learning.
    [2] The code for assignment 7 is based on an assignment from Greg Durrett's CS388 course at UT Austin on Natural Language Processing.

    Final Project:
    Background reading materials:
    Tools and packages: