ITCS 4111/5111: Introduction to Natural Language Processing
Fall 2023

Time and Location: Tue, Thu 2:30 – 3:45pm, HHS 376

Instructor: Razvan Bunescu
  Office: Woodward 210F
  Office hours: Tue, Thu 4:00 – 5:00pm
  Email: rbunescu @ charlotte edu

TA: Sivani Josyula
  Office: Zoom & Burson 239B
  Office hours: Tue, Wed 10:00 – 11:00am
  Email: sjosyul2 @ charlotte edu

TA: Manogna Chennuru
  Office: Zoom & Burson 239B
  Office hours: Thu, Fri 10:00 – 11:00am
  Email: mchennu2 @ charlotte edu

Recommended Texts (PDF available online):
  • Speech and Language Processing (3rd edition draft), by Daniel Jurafsky and James H. Martin. 2023.
  • Natural Language Processing, by Jacob Eisenstein. 2019.

    Course description:
    Natural Language Processing (NLP) is a branch of Artificial Intelligence concerned with developing computer systems that can analyze or generate natural language. This course will introduce fundamental linguistic analysis tasks, including tokenization, word representations, text classification, syntactic and semantic parsing, and coreference resolution. Machine learning (ML) techniques will be introduced, ranging from Naive Bayes and logistic regression to Transformer-based language models, and applied to NLP tasks such as sentiment classification, information extraction, and question answering. Overall, the aim of this course is to equip students with an array of techniques and tools that they can use to solve known NLP tasks, as well as new types of NLP problems.
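As a small taste of the kind of task the course covers (not course material), the sketch below combines two topics from the description: whitespace tokenization and Naive Bayes sentiment classification with add-one smoothing, using only the Python standard library. The example documents and labels are made up for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real tokenizers do much more.
    return text.lower().split()

def train(docs):
    # docs: list of (text, label) pairs.
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab

def predict(text, label_counts, word_counts, vocab):
    total = sum(label_counts.values())
    best_label, best_logprob = None, float("-inf")
    for label, count in label_counts.items():
        logprob = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words get nonzero probability.
            logprob += math.log((word_counts[label][w] + 1) / denom)
        if logprob > best_logprob:
            best_label, best_logprob = label, logprob
    return best_label

# Toy training data (invented for this sketch).
docs = [("a great fun movie", "pos"), ("truly awful boring film", "neg"),
        ("great acting and fun", "pos"), ("boring and awful", "neg")]
model = train(docs)
print(predict("fun and great", *model))  # → pos
```

Logistic regression and neural models covered later in the course replace these hand-counted probabilities with learned weights, but the tokenize-then-score pipeline stays the same.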

    Students are expected to be comfortable with programming in Python and with data structures and algorithms (ITSC 2214), and to have basic knowledge of linear algebra (MATH 2164), statistics, and formal languages (regular and context-free grammars). Knowledge of machine learning will be very useful, though not strictly necessary. Relevant background material will be made available on this website throughout the course.

    Lecture notes:
    1. Syllabus & Introduction
    2. Python for programming, linear algebra, and visualization
    3. Tokenization: From text to sentences and tokens
    4. Regular expressions
    5. Strengths and Weaknesses of Language Models
    6. Application development using GPT and Llama-2 through the Chat completion API
    7. Text classification using Naive Bayes
    8. Logistic regression
    9. Biases vs. fairness and rationality in NLP models
    10. Manual annotation for NLP
    11. Word meanings; Sparse vs. dense representations of words
    12. N-grams and Neural models for Language Modeling and Sequence Processing
    13. Machine translation, Sequence-to-sequence models and Attention
    14. Transformer: Self-Attention Networks
    15. Language Models: Pretraining and Fine-tuning
    16. Language Models: Prompting, In-context Learning, Chain of Thought, Instruct Tuning, RLHF
    17. Coreference resolution
    18. Syntax, constituency parsing, dependency parsing

    Homework assignments:
    Note: The code for assignment 7 is based on an assignment from Stanford's CS224n course on NLP with Deep Learning.

    Final project:
    Background reading materials:
    Supplemental readings:
    Tools and packages: