ITCS/ITIS 6162/8162 (Hybrid)
Data Mining/ Knowledge Discovery in Databases - KDD
Prerequisites: ITCS6160, full graduate standing or consent of the department.
Textbook (not required): "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach,
Vipin Kumar, Addison Wesley.
Course Syllabus
Office Hours (August 21 - November 26, December 2-6)
If you have questions about any topic covered in class, please join me (Center City) or my TAs during our weekly office hours, held either in the KDD Lab (Woodward 402) or on ZOOM. There are no office hours on October 14-15 (Fall Break) and November 27-29 (Thanksgiving Break).
Office hours in the KDD Lab or ZOOM sessions are listed below:
GTA 1
Sai Dheeraj Lanka (e-mail: slanka1@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Monday, 4:00-6:00pm; Wednesday, 12:00-2:00pm
- Office Hours on ZOOM
Friday, 12:00-2:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/94488516161?pwd=83lvbek6Jb8CPQtAbbyXCS3rC3Iocv.1
GTA 2
Hardhika Reddy Gundlagutta (e-mail: hgundlag@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Monday, 2:00-4:00pm; Wednesday, 2:00-4:00pm
- Office Hours on ZOOM
Friday, 4:00-6:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/2120146314
GTA 3
Sathvika Patwari (e-mail: spatwar2@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Thursday, 2:00-4:00pm; Friday, 2:00-4:00pm
- Office Hours on ZOOM
Wednesday, 4:00-6:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/94643560044?pwd=rdiELLlndYMTe6PtcBDfmoR5UtrlED.1
Meeting ID - 946 4356 0044; Password - 840190
Zbigniew Ras
- Office Hours in Center City, Room 713 (every odd week, right after the lecture, beginning Sept. 3)
Tuesday: 2:30-4:00pm
- Office Hours on ZOOM (every even week)
Tuesday: 2:30-4:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/98630214552
Week 1 & 2 (August 20 - City) & (August 27 - asynchronously)
Learning objectives: Classification tree construction using entropy and Gini Index (see [2],[3]), association and representative rules discovery (see [4],[5]), classification rules discovery using LERS (see [6]), computing reducts (using discernibility matrix or heuristic strategy based on attribute selection technique), data discretization, classification rules construction using discernibility functions for dataset objects (see [7],[8],[9],[10]).
[1] Data Preprocessing
[2] Classification Trees, PDF
[3] Classification Trees (Video by L. Powell)
[4] Association Rules, PDF, Video Lecture Part I, Video Lecture Part II
[5] Association Rules (Video by L. Powell)
[6] LERS, PDF
[7] Granular Computing, PDF, Video Lecture
[8] Reducts and Discretization, PDF
[9] Reducts (Video by L. Powell)
[10] Discretization (Video by L. Powell)
Exercises/ Problems to solve - posted on August 23. Solutions
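The two impurity measures named in the learning objectives above (entropy and Gini Index) can be illustrated with a short sketch. This is only a minimal illustration using the standard textbook definitions, not the course's own material; the function names and the toy labels are assumptions made for the example.

```python
from collections import Counter
from math import log2


def class_probabilities(labels):
    """Relative frequency of each class among the given labels."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    return -sum(p * log2(p) for p in class_probabilities(labels))


def gini(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def weighted_impurity(partitions, impurity):
    """Impurity of a split: size-weighted average over its partitions."""
    total = sum(len(part) for part in partitions)
    return sum(len(part) / total * impurity(part) for part in partitions)


# Hypothetical toy data: decision labels before and after a candidate split.
labels = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]

# Information gain = impurity before the split minus impurity after it.
gain = entropy(labels) - weighted_impurity(split, entropy)
print("entropy:", entropy(labels))
print("gini:", gini(labels))
print("information gain of the split:", gain)
```

When building a classification tree, the attribute whose split gives the lowest weighted impurity (equivalently, the highest gain) is usually chosen at each node.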
Week 3 & 4 (Sept 3 - City) & (Sept 10 - asynchronously)
Learning objectives: Get familiar with problems and their solutions presented in [1]. If a problem is not entirely solved, complete the solution. Rules discovery from incomplete datasets using tolerance relation (see [3]), mining imbalanced data (see [4]), and SVM strategy (see [5]). Get familiar with at least two software packages: RSES (see [2]) and Orange or WEKA (see [6]).
[1] Sample Problems
[2] Rough Set Exploration System (RSES), RSES, RS Manual
[3] Mining Incomplete Data PDF, Video Lecture
[4] Mining Imbalanced Data
[5] Support Vector Machine PDF
[6] Bratko's ORANGE & WEKA
Exercises/ Problems to solve - posted on September 8. Solutions
Week 5 & 6 (Sept 17 - City) & (Sept 24 - asynchronously)
Learning objectives: Action rules construction methods DEAR 1, DEAR 2 (see [1]) and strategy based on action reducts (see [2]). Strategy Chase for revealing hidden values in datasets (see [3]). Get familiar with problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution.
[1] Action Rules and Meta-Actions PDF, Video Lecture
[2] Action Rules Extraction Using Action Reducts
[3] Chase Algorithms PDF, Video Lecture
[4] Sample Problems
[5] Query Answering & New Attributes
Exercises/ Problems to solve - posted on September 22. Solutions
Week 7 & 8 (Oct 1 - City) & (October 8 - asynchronously)
Learning objectives: Agglomerative and divisive clustering strategies (see [1],[2],[3],[4]). Get familiar with problems and their solutions presented in [5],[6]. If a problem is not entirely solved, complete the solution. Review sample problems presented in [7]. Four of them will be on the midterm exam.
[1] Clustering Methods PDF, Video Lecture
[2] Clustering (Video)
[3] TV Trees PDF
[4] Lance-Williams Algorithm (Video)
[5] Clustering - Sample problems with solutions
[6] Clustering - Sample problems
[7] Sample Problems (Midterm Exam)
Homework: Get familiar with solutions for Midterm Exam sample problems. If a problem is not solved, try to solve it.
Week 9 & 10 (Oct 22 - City) & (October 29 - asynchronously)
Learning objectives: Class group project assignment (see [1]) and software package for action rules discovery called LispMiner (see [2]) which you need to learn to complete the project. MIDTERM EXAM
[1] Project
[2] Lisp Miner, Video Lecture (by Sapna Pareek), Additional Information
Project Rubric (to be used for grading)
Midterm (in City 901): October 22, 11:30am-2:00pm
Week 11 & 12 (Nov 5 - City) & (Nov 12 - asynchronously)
Learning objectives: Data sanitization method against chase (see [1]), classifier evaluation strategies (see [2]), mining distributed data and big data (see [3]). Get familiar with problems and their solutions presented in [4]. If a problem is not entirely solved, complete the solution.
[1] Data Sanitization PDF, Example
[2] Evaluation Methods
[3] Distributed Data and Big Data
[4] Sample Problems
Exercises/ Problems to solve - posted on November 11.
Week 15 (Dec 3 - City)
Learning objectives: Review sample problems presented in [1]. Four of them will be on the final exam.
[1] Sample Problems (Final Exam)
FINAL EXAM
December 10 (Tuesday), City 901, 11:30am - 2:30pm
Final Exam Solutions
Project
Upload the project report and the dataset you created to Canvas, or email them to
Sai Dheeraj Lanka (e-mail: slanka1@charlotte.edu), Hardhika Reddy Gundlagutta (e-mail: hgundlag@charlotte.edu), and Sathvika Patwari (e-mail: spatwar2@charlotte.edu)
no later than Friday, December 6, 2024.
Project Rubric (to be used for grading)
Midterm (in City 901): October 22 (Tuesday), 11:30am-2:00pm
Final (in City 901): December 10 (Tuesday), 11:30am-2:30pm
Points: Midterm - 30 points, Final - 30 points, Project - 40 points
Grades: A [90-100], B [80-89], C [65-79].
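The point split and grade ranges above can be sketched as a small calculation. This is only an illustration, not an official grading tool: the function names are hypothetical, fractional totals are assumed to fall into the range whose lower bound they reach, and totals below 65 return no letter because the ranges above do not define one.

```python
# Maximum points per component, as listed in the syllabus.
MAX_POINTS = {"midterm": 30, "final": 30, "project": 40}

# Lower bounds of the grade ranges A [90-100], B [80-89], C [65-79].
GRADE_LOWER_BOUNDS = [("A", 90), ("B", 80), ("C", 65)]


def total_points(midterm, final, project):
    """Sum the three components after checking each against its maximum."""
    scores = {"midterm": midterm, "final": final, "project": project}
    for part, earned in scores.items():
        if not 0 <= earned <= MAX_POINTS[part]:
            raise ValueError(f"{part} must be between 0 and {MAX_POINTS[part]}")
    return midterm + final + project


def letter_grade(total):
    """Map a point total to a letter; None when no range is listed for it."""
    for letter, lower_bound in GRADE_LOWER_BOUNDS:
        if total >= lower_bound:
            return letter
    return None  # assumption: the syllabus lists no grade below 65


# Hypothetical student: 27 on the midterm, 25 on the final, 38 on the project.
total = total_points(27, 25, 38)
print(total, letter_grade(total))
```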
Instructor:   Zbigniew W. Ras
Office: Woodward Hall 430C, City (Dubois Center) Room 713
Telephone: 704-687-8574
e-mail: ras@uncc.edu
Office Hours on ZOOM: https://charlotte-edu.zoom.us/j/98630214552
Tuesday (every even week): 2:30-4:00pm
(If nobody shows up by 3:00pm, I will leave the ZOOM meeting.)
GTA 1: Sai Dheeraj Lanka
Office: Woodward Hall 402 (KDD Lab)
e-mail: slanka1@charlotte.edu
GTA 2: Hardhika Reddy Gundlagutta
Office: Woodward Hall 402 (KDD Lab)
e-mail: hgundlag@charlotte.edu
GTA 3: Sathvika Patwari
Office: Woodward Hall 402 (KDD Lab)
e-mail: spatwar2@charlotte.edu
Additional Documents