ITCS/ITIS 6162/8162 (Hybrid)

Data Mining/ Knowledge Discovery in Databases - KDD


Prerequisites: ITCS6160, full graduate standing or consent of the department.
Textbook (not required): "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison Wesley.


Course Syllabus



Office Hours (August 21 - November 26, December 2-6)

If you have questions concerning any topic covered in the class, please join me (Center City) or my TAs during our office hours, scheduled every week either in the KDD Lab (Woodward 402) or on ZOOM. No office hours on October 14-15 (Fall Break) and November 27-29 (Thanksgiving Break).

Office hours in the KDD Lab or ZOOM sessions are listed below:

GTA 1
Sai Dheeraj Lanka (e-mail: slanka1@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Monday, 4:00-6:00pm; Wednesday, 12:00-2:00pm
- Office Hours on ZOOM
Friday, 12:00-2:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/94488516161?pwd=83lvbek6Jb8CPQtAbbyXCS3rC3Iocv.1

GTA 2
Hardhika Reddy Gundlagutta (e-mail: hgundlag@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Monday, 2:00-4:00pm; Wednesday, 2:00-4:00pm
- Office Hours on ZOOM
Friday, 4:00-6:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/2120146314

GTA 3
Sathvika Patwari (e-mail: spatwar2@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Thursday, 2:00-4:00pm; Friday, 2:00-4:00pm
- Office Hours on ZOOM
Wednesday, 4:00-6:00pm
ZOOM LINK:
https://charlotte-edu.zoom.us/j/94643560044?pwd=rdiELLlndYMTe6PtcBDfmoR5UtrlED.1
Meeting ID - 946 4356 0044; Password - 840190

Zbigniew Ras
- Office Hours in Center City, Room 713 (every odd week, right after lecture, beginning Sept. 3)
Tuesday: 2:30-4:00pm
- Office Hours on ZOOM (every even week)
Tuesday: 2:30-4:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/98630214552



Week 1 & 2 (August 20 - City) & (August 27 - asynchronously)
Learning objectives: Classification tree construction using entropy and Gini Index (see [2],[3]), association and representative rules discovery (see [4],[5]), classification rules discovery using LERS (see [6]), computing reducts (using a discernibility matrix or a heuristic strategy based on an attribute selection technique), data discretization, classification rules construction using discernibility functions for dataset objects (see [7],[8],[9],[10]).
[1] Data Preprocessing
[2] Classification Trees, PDF
[3] Classification Trees (Video by L. Powell)
[4] Association Rules, PDF, Video Lecture Part I, Video Lecture Part II
[5] Association Rules (Video by L. Powell)
[6] LERS, PDF
[7] Granular Computing, PDF, Video Lecture
[8] Reducts and Discretization, PDF
[9] Reducts (Video by L. Powell)
[10] Discretization (Video by L. Powell)
Exercises/ Problems to solve - posted on August 23. Solutions

Week 3 & 4 (Sept 3 - City) & (Sept 10 - asynchronously)
Learning objectives: Get familiar with problems and their solutions presented in [1]. If a problem is not entirely solved, complete the solution. Rules discovery from incomplete datasets using tolerance relation (see [3]), mining imbalanced data (see [4]), and SVM strategy (see [5]). Get familiar with at least two software packages: RSES (see [2]) and either Orange or WEKA (see [6]).
[1] Sample Problems
[2] Rough Set Exploration System (RSES) , RSES, RS Manual
[3] Mining Incomplete Data PDF, Video Lecture
[4] Mining Imbalanced Data
[5] Support Vector Machine PDF
[6] Bratko's ORANGE & WEKA
Exercises/ Problems to solve - posted on September 8. Solutions

Week 5 & 6 (Sept 17 - City) & (Sept 24 - asynchronously)
Learning objectives: Action rules construction methods DEAR 1, DEAR 2 (see [1]) and strategy based on action reducts (see [2]). Strategy Chase for revealing hidden values in datasets [3]. Get familiar with problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution.
[1] Action Rules and Meta-Actions PDF, Video Lecture
[2] Action Rules Extraction Using Action Reducts
[3] Chase Algorithms PDF, Video Lecture
[4] Sample Problems
[5] Query Answering & New Attributes
Exercises/ Problems to solve - posted on September 22. Solutions

Week 7 & 8 (Oct 1 - City) & (October 8 - asynchronously)
Learning objectives: Agglomerative and divisive clustering strategies (see [1],[2],[3],[4]). Get familiar with problems and their solutions presented in [5],[6]. If a problem is not entirely solved, complete the solution. Review sample problems presented in [7]. Four of them will be on the midterm exam.
[1] Clustering Methods PDF, Video Lecture
[2] Clustering (Video)
[3] TV Trees PDF
[4] Lance-Williams Algorithm (Video)
[5] Clustering - Sample problems with solutions
[6] Clustering - Sample problems
[7] Sample Problems (Midterm Exam)
Homework: Get familiar with solutions for Midterm Exam sample problems. If a problem is not solved, try to solve it.

Week 9 & 10 (Oct 22 - City) & (October 29 - asynchronously)
Learning objectives: Class group project assignment (see [1]) and software package for action rules discovery called LispMiner (see [2]) which you need to learn to complete the project. MIDTERM EXAM
[1] Project
[2] Lisp Miner, Video Lecture (by Sapna Pareek), Additional Information
Project Rubric (to be used for grading)

Midterm (in City 901): October 22, 11:30am-2:00pm

Week 11 & 12 (Nov 5 - City) & (Nov 12 - asynchronously)
Learning objectives: Data sanitization method against chase (see [1]), classifiers evaluation strategies (see [2]), mining distributed data and big data (see [3]). Get familiar with problems and their solutions presented in [4]. If a problem is not entirely solved, finish the solution.
[1] Data Sanitization PDF Example
[2] Evaluation Methods
[3] Distributed Data and Big Data
[4] Sample Problems
Exercises/ Problems to solve - posted on November 11.

Week 13 & 14 (Nov 19 - City) & (Nov 26 - asynchronously)
Learning objectives: Applying KDD methods to fine art evaluation (see [1]), to improving human health (see [2]), and to business (see [3]).
[1] Art Analytics (Paintings)
[2] Health Analytics
      VIDEO (by Zbyszek Ras)
[3] Business Analytics
[4] Action Rules Discovery by Vertical Data Partitioning
Exercises/ Problems to solve - posted on November 25.

Week 15 (Dec 3 - City)
Learning objectives: Review sample problems presented in [1]. Four of them will be on the final exam.
[1] Sample Problems (Final Exam)


FINAL EXAM

December 10 (Tuesday), City 901, 11:30am - 2:30pm


Final Exam Solutions


Project
Upload the project report and the dataset you created to Canvas or email them to
Sai Dheeraj Lanka (e-mail: slanka1@charlotte.edu), Hardhika Reddy Gundlagutta (e-mail: hgundlag@charlotte.edu), and Sathvika Patwari (e-mail: spatwar2@charlotte.edu)
no later than Friday, December 6, 2024
Project Rubric (to be used for grading)


Midterm (in City 901): October 22 (Tuesday), 11:30am-2:00pm
Final (in City 901): December 10 (Tuesday), 11:30am-2:30pm

Points: Midterm - 30 points, Final - 30 points, Project - 40 points
Grades: A [90-100], B [80-89], C [65-79].


Instructor:       Zbigniew W. Ras
Office: Woodward Hall 430C, City (Dubois Center) Room 713
Telephone: 704-687-8574
e-mail: ras@uncc.edu
Office Hours on ZOOM: Link
https://charlotte-edu.zoom.us/j/98630214552
Tuesday (every even week): 2:30-4:00pm
(If nobody shows up by 3:00pm, I will leave the zoom meeting)


GTA 1:       Sai Dheeraj Lanka

Office: Woodward Hall 402 (KDD Lab)
e-mail: slanka1@charlotte.edu


GTA 2:       Hardhika Reddy Gundlagutta

Office: Woodward Hall 402 (KDD Lab)
e-mail: hgundlag@charlotte.edu


GTA 3:       Sathvika Patwari

Office: Woodward Hall 402 (KDD Lab)
e-mail: spatwar2@charlotte.edu


Additional Documents

[1] KDD Software

[2] Statistics for Data Science & Machine Learning

[3] Lisp Miner (+ Action Rules Discovery Module) (by Jan Rauch)

[4] Lisp Miner Manual (by Jan Rauch's Student)

[5] SCARI: Action Rules Discovery Package

[6] Repository of large datasets

[7] LERS vs ERID

[8] Extracting Rules from Incomplete Table

[9] Lance & Williams Distance