ITCS 4101: Introduction to Natural Language Processing
Fall 2025
Time and Location: Tue, Thu 1:00 – 2:15pm, CHHS 380
Instructor: Razvan Bunescu
    Office: Woodward 410G
    Office hours: Tue, Thu 2:30 – 3:30pm
    Email: rbunescu @ charlotte edu
TA: Youssef Ait Alama
    Office: Woodward 412
    Office hours: Mon, Wed 12:00 – 1:00pm
    Email: yaitalam @ charlotte edu
Textbook (PDF available online):
Speech and Language Processing (3rd edition draft), by Daniel Jurafsky and James H. Martin. 2025.
Course description:
Natural Language Processing (NLP) is an area of Artificial Intelligence focused on developing computer systems that process or generate natural language. This course will first introduce fundamental linguistic analysis tasks, including tokenization, syntactic parsing, semantic parsing, and coreference resolution. We will then study vector-based representations of text, ranging from bag-of-words and TF-IDF to neural word and text embeddings. The course will survey the machine learning models and techniques underlying modern NLP, including attention and Transformer-based language models, which will be applied to a number of NLP tasks such as sentiment classification, information extraction, and question answering. In parallel, the course will introduce standard frameworks for developing workflows in which LLM-based agents connect with tools and communicate with other agents. Overall, the aim of this course is to equip students with an array of techniques and tools that they can use to solve known NLP tasks as well as new types of NLP problems.
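As a preview of the bag-of-words thread mentioned above, the short sketch below shows how a TF-IDF representation can feed a logistic regression sentiment classifier. This is only an illustration, not course code: the tiny set of reviews and labels is invented, and the sketch assumes scikit-learn is installed.

# Minimal sketch: TF-IDF bag-of-words features + logistic regression for sentiment.
# The toy reviews and labels below are invented purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = [
    "a delightful, well acted film",      # positive
    "boring plot and wooden dialogue",    # negative
    "I loved every minute of it",         # positive
    "a complete waste of two hours",      # negative
]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Turn each document into a sparse TF-IDF vector over word unigrams.
vectorizer = TfidfVectorizer(lowercase=True)
X_train = vectorizer.fit_transform(train_texts)

# Fit a logistic regression classifier on the TF-IDF features.
classifier = LogisticRegression()
classifier.fit(X_train, train_labels)

# Classify a new review using the same vectorizer; on this toy data the
# negative words ("wooden", "boring", "waste") should dominate.
X_test = vectorizer.transform(["a wooden, boring waste of a film"])
print(classifier.predict(X_test))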
Prerequisites:
Introduction to Machine Learning (ITCS 3156). Students are expected to be comfortable with programming in Python, data structures and algorithms (ITSC 2214), and basic machine learning techniques. Relevant background material will be made available on this website throughout the course.
Lecture notes:
- Syllabus & Introduction
- Python for programming, linear algebra, and visualization
- Tokenization: From text to sentences and tokens
- Chapter 2, sections on words, morphemes, rule-based tokenization, BPE, and corpora.
- Tokenization examples: PDF and notebook.
- Notes from lecture on Sep 2.
- Regular expressions
- Text classification using Logistic Regression
- Chapter 4, sections on sentiment analysis and evaluation measures.
- Chapter 5 on logistic regression from the textbook.
- LLMs: use scenarios, strengths and weaknesses
- LLMs: application development with the chat completion API (a minimal example sketch follows after this list)
- Notebooks with examples using GPT, Llama, and Gemini.
- LLMs: connecting applications with tools and external resources through MCP
- LLMs: developing and deploying multi-agent systems
- Word meanings; Sparse vs. dense representations of words
- N-grams and Neural models for Language Modeling and Sequence Processing
- Machine translation, Sequence-to-sequence models and Attention
- Transformers and Pretrained Language Models
- Chapter 9 in J & M on Deep Learning Architectures for Sequence Processing.
- Chapter 10 in J & M on Transformers and Pretrained Language Models.
- Tuning LLMs with SFT, DPO, and online RL
- Biases vs. fairness and rationality in NLP models
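For the lecture on application development with the chat completion API (referenced above), here is a minimal sketch of a single round trip through the OpenAI Python client. The model name and prompts are placeholders chosen for illustration, and the call assumes the openai package is installed and an OPENAI_API_KEY is set in the environment.

# Minimal sketch: one chat completion call with the OpenAI Python client.
# Assumes OPENAI_API_KEY is set in the environment; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; substitute any available chat model
    messages=[
        {"role": "system", "content": "You are a concise NLP teaching assistant."},
        {"role": "user", "content": "In one sentence, what does byte-pair encoding do?"},
    ],
)

print(response.choices[0].message.content)

The same message-list pattern carries over to the Llama and Gemini notebooks linked above; only the client library and model name change.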
Homework assignments:
Background reading materials:
- Python programming:
- Probability and statistics:
- Linear Algebra:
- Calculus:
Supplemental readings:
- Accurate methods for the statistics of surprise and coincidence, Ted Dunning, Computational Linguistics, 1993.
- Attention is all you need, Vaswani et al., NIPS 2017.
- Durably reducing conspiracy beliefs through dialogues with AI, Costello et al., Science, September 2024.
- AI can help humans find common ground in democratic deliberation, Tessler et al., Science, October 2024.
Tools and packages:
- Natural language processing:
- Machine learning: