ITCS/ITIS 6162/8162 (asynchronous)

Knowledge Discovery in Databases - KDD


Prerequisites: ITCS 6160, full graduate standing, or consent of the department.
Textbook (not required): "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison-Wesley.


Course Syllabus



Class Office Hours

If you have questions concerning any topic covered in PPT/Video Lectures/Sample Problems,
please join me or my TAs during office hours scheduled on ZOOM every week.
All ZOOM sessions are listed below:

Yuehua Duan at https://uncc.zoom.us/j/91258655476
Monday, Wednesday, Friday: 9:00-11:00am

Rishab Semlani at https://uncc.zoom.us/j/95898300120
Tuesday, Wednesday, Thursday: 3:00-5:00pm

Zbigniew Ras at https://uncc.zoom.us/j/92716093575
Tuesday & Thursday (till February 18), 11:30am-1:00pm
February 22, 12:00-1:00pm; February 24 & March 1, 12:30-2:00pm
(If nobody shows up during the first 30 minutes, I will leave the zoom meeting)



Week 1 (January 10-14)
Learning objectives: Classification tree construction using entropy and Gini Index (see [2],[3]), association and representative rules discovery (see [4],[5]), classification rules discovery using LERS (see [6]), computing reducts (using a discernibility matrix or a heuristic strategy based on an attribute selection technique), data discretization, classification rules construction using discernibility functions for dataset objects (see [7],[8],[9],[10]).
[1] Data Preprocessing
[2] Classification Trees, PDF
[3] Classification Trees (Video by L. Powell)
[4] Association Rules, PDF, Video Lecture Part I, Video Lecture Part II
[5] Association Rules (Video by L. Powell)
[6] LERS, PDF
[7] Granular Computing, PDF, Video Lecture
[8] Reducts and Discretization, PDF
[9] Reducts (Video by L. Powell)
[10] Discretization (Video by L. Powell)

Week 2 (January 17-21)
Learning objectives: Get familiar with the problems and their solutions presented in [1]. If a problem is not entirely solved, complete the solution. Rules discovery from incomplete datasets using a tolerance relation (see [3]) and the SVM strategy (see [4]). Get familiar with at least two software packages: RSES (see [2]), Orange or WEKA (see [5]).
[1] Sample Problems
[2] Rough Set Exploration System (RSES), RSES, RS Manual
[3] Mining Incomplete Data PDF, Video Lecture
[4] Support Vector Machine PDF
[5] Bratko's ORANGE & WEKA

Week 3 (January 24-28)
Learning objectives: Action rules construction methods DEAR 1 and DEAR 2 (see [1]) and a strategy based on action reducts (see [2]). The Chase strategy for revealing hidden values in datasets (see [3]). Get familiar with the problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution.
[1] Action Rules and Meta-Actions PDF, Video Lecture
[2] Action Rules Extraction Using Action Reducts
[3] Chase Algorithms PDF, Video Lecture
[4] Sample Problems
[5] Query Answering & New Attributes

Week 4 (January 31 - February 4)
Learning objectives: Agglomerative and divisive clustering strategies (see [1],[2],[3]). Get familiar with problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution. Review sample problems presented in [6]. Four of them will be on the midterm exam.
[1] Clustering Methods PDF, Video Lecture
[2] Clustering (Video)
[3] TV Trees PDF
[4] Clustering - Sample problems with solutions
[5] Clustering - Sample problems
[6] Sample Problems (Midterm Exam)

Week 5 (February 7-11)
Learning objectives: Class group project assignment (see [1]) and LISp-Miner, a software package for action rules discovery (see [2]), which you need to learn to complete the project. MIDTERM EXAM.
[1] Project
[2] Lisp Miner, Video Lecture (by Sapna Pareek)

Midterm (on Canvas): February 11 (Friday), 4:00-6:30pm

Week 6 (February 14-18)
Learning objectives: Data sanitization method against Chase (see [2]), classifier evaluation strategies (see [3]), mining distributed data and big data (see [4]). Get familiar with the problems and their solutions presented in [5]. If a problem is not entirely solved, complete the solution.
[1] Solutions to Midterm Exam
[2] Data Sanitization PDF Example
[3] Evaluation Methods
[4] Distributed Data and Big Data
[5] Sample Problems

Week 7 (February 21-25)
Learning objectives: Applying KDD methods to fine art evaluation (see [2]) and improving human health (see [1]). Review sample problems presented in [3]. Four of them will be on the final exam.
[1] Health Analytics, Procedure Graph, Video (by Zbig Ras)
[2] Art Analytics (Paintings), Video (by L. Powell)
[3] Sample Problems (Final Exam)


FINAL EXAM on CANVAS

March 1 (Tuesday), 4:00 - 6:30pm


Project
Project and LISp-Miner
Upload the project report and the dataset you created to Canvas, or email them to
Rishab Semlani at rsemlani@uncc.edu and Yuehua Duan at yduan2@uncc.edu,
no later than Sunday, February 27, 2022.
Project Rubric (to be used for grading)


Midterm (on Canvas): February 11 (Friday), 4:00-6:30pm
Final (on Canvas): March 1 (Tuesday), 4:00-6:30pm
Points: Midterm - 30 points, Final - 30 points, Project - 40 points

Grades: A [90-100], B [80-89], C [65-79].


Instructor:       Zbigniew W. Ras
Office: Woodward Hall 430C
Telephone: 704-687-8574
e-mail: ras@uncc.edu
Office Hours on ZOOM: https://uncc.zoom.us/j/92716093575
Tuesday & Thursday, 11:30am-1:00pm
(If nobody shows up by noon, I will leave the zoom meeting)


GTA I:       Yuehua Duan

Office: Woodward Hall 402 (KDD Lab)
e-mail: yduan2@uncc.edu
Office Hours on ZOOM: https://uncc.zoom.us/j/91258655476
Monday, Wednesday, Friday: 9:00-11:00am


GTA II:       Rishab Semlani

Office: Woodward Hall 402 (KDD Lab)
e-mail: rsemlani@uncc.edu
Office Hours on ZOOM: https://uncc.zoom.us/j/95898300120
Tuesday, Wednesday, Thursday: 3:00-5:00pm


Additional Documents

[1] KDD Software

[2] Lisp Miner (+ Action Rules Discovery Module) (by Jan Rauch)

[3] Lisp Miner Manual (by Jan Rauch's Student)

[4] SCARI: Action Rules Discovery Package

[5] Rough Sets

[6] Repository of large datasets

[7] LERS vs ERID

[8] Extracting Rules from Incomplete Table

[9] Lance & Williams Distance

[10] Sample Problems for Midterm Exam