ITCS 4101: Introduction to Natural Language Processing
Fall 2024


Time and Location: Tue, Thu 1:00 – 2:15pm, Woodward 140

                 Instructor                        TA
Name:            Razvan Bunescu                    Justin Smith
Office:          Woodward 410G                     Burson 239B
Office hours:    Tue, Thu 5:30 – 6:30pm            Wed, Fri 12:00 – 1:00pm
Email:           razvan.bunescu @ charlotte edu    jsmit840 @ charlotte edu

Recommended Texts (PDF available online):
  • Speech and Language Processing (3rd edition draft), by Daniel Jurafsky and James H. Martin. 2024.
  • Natural Language Processing, by Jacob Eisenstein. 2019.

    Course description:
    Natural Language Processing (NLP) is a subfield of Artificial Intelligence that focuses on the development and study of computer systems that process or generate natural language. This course will introduce fundamental linguistic analysis tasks, including tokenization, word representations, syntactic parsing, semantic parsing, and coreference resolution. Machine learning (ML) based techniques will be introduced, ranging from Naive Bayes and logistic regression to Transformer-based language models, and will be applied to a number of NLP applications such as sentiment classification, information extraction, and question answering. Overall, the aim of this course is to equip students with an array of techniques and tools that they can use to solve known NLP tasks, as well as new types of NLP problems.

    Prerequisites:
    Students are expected to be comfortable with programming in Python and with data structures and algorithms (ITSC 2214), and to have basic knowledge of linear algebra (MATH 2164) and statistics (STAT 2122). Knowledge of machine learning will be very useful, though not strictly necessary. Relevant background material will be made available on this website throughout the course.

    Lecture notes:
    1. Syllabus & Introduction
    2. Python for programming, linear algebra, and visualization
    3. Tokenization: From text to sentences and tokens
    4. Regular expressions
    5. Strengths and Weaknesses of Language Models
    6. Application development using LLMs through the Chat completion API
    7. Building LLM-powered applications with LangChain and AutoGen
    8. Text classification using Naive Bayes
    9. Logistic regression
    10. Biases vs. fairness and rationality in NLP models
    11. Manual annotation for NLP
    12. Word meanings; Sparse vs. dense representations of words
    13. N-grams and Neural models for Language Modeling and Sequence Processing
    14. Machine translation, Sequence-to-sequence models and Attention
    15. Transformers and Pretrained Language Models
    16. Large Language Models: Pretraining, Fine-tuning, In-context Learning, Chain of Thought, Instruct Tuning, RLHF

    Homework assignments:
    Note: The code for assignment 8 is based on an assignment from Stanford's CS224n course on NLP with Deep Learning.

    Background reading materials:
    Supplemental readings:
    Tools and packages: