DSBA/ITCS 6162/8162

Knowledge Discovery in Databases - KDD


Prerequisites: ITCS 6160, full graduate standing or consent of the department.
Textbook (not required): "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison-Wesley.


Course Syllabus



Lectures (Class Videos available on LINK)

August 24
[1] Data Preprocessing
[2] Classification Trees, PDF
[3] Classification Trees (Video by L. Powell)
[4] Association Rules, PDF
[5] Association Rules (Video by L. Powell)

August 31
[1] LERS, PDF
[2] Granular Computing, PDF
[3] Reducts and Discretization, PDF
[4] Reducts (Video by L. Powell)
[5] Discretization (Video by L. Powell)

September 7
[1] Sample Problems
[2] Rough Set Exploration System (RSES)

September 14
[1] Mining Incomplete Data PDF
[2] Support Vector Machine PDF
[3] Bratko's ORANGE or WEKA
[4] Sample Problems

September 21
[1] Action Rules and Meta-Actions PDF
[2] Action Rules Extraction Using Action Reducts

September 28
[1] Chase Algorithms PDF
[2] Data Sanitization PDF
[3] Exercise

October 5
[1] Clustering Methods PDF
[2] Clustering (Video)
[3] TV Trees PDF
[4] Clustering - Sample problems with solutions

October 19
[1] Clustering - Sample problems
[2] Sample Problems (Midterm Exam), Part I

October 26
1:00-2:30pm (Lecture by Zbyszek Ras)
[1] Sample Problems (Midterm Exam), Part II & Project

2:45-3:45pm (Lecture by Sapna Pareek)
[2] LISp-Miner

November 2
Midterm Exam (on Canvas)

November 9
[1] Solutions to Midterm Exam
[2] Project Question & Answer Session
[3] Data Sanitization PDF Example

November 16
[1] Evaluation Methods
[2] Distributed Data and Big Data
[3] Sample Problems

November 23 (NO CLASS: Asynchronous online learning)
KDD Application Areas
(PPT presentations and videos you should become familiar with)
[1] Health Analytics    Procedure Graph
      VIDEO
[2] Art Analytics (Paintings)
      VIDEO (by L. Powell)

November 30
KDD Application Areas
[3] Business Analytics
[4] Art Analytics (Music)

December 7
[2] Sample Problems (Final Exam)


December 14
Final Exam (on Canvas); 11:00am - 1:30pm


Project
Project and LISp-Miner
Upload the project report and the dataset you created to Canvas, or send them to
Sapna Pareek at spareek@uncc.edu,
no later than Monday, December 6, 2021.
Project Rubric


Midterm (on Canvas): November 2
Final (on Canvas): December 14, 11:00am - 1:30pm
Points: Midterm - 30 points, Final - 30 points, Project - 40 points

Grades: A [90-100], B [80-89], C [65-79].


Class Location - https://uncc.zoom.us/j/94554407742
Meeting Time: Tuesday, 1:00-3:45pm



Instructor: Zbigniew W. Ras

Office: Woodward Hall 430C
Telephone: 704-687-8574
e-mail: ras@uncc.edu
Office Hours on Zoom: https://uncc.zoom.us/j/98763508031
Tuesday: 11:30am-12:45pm


GTA I: Sapna Pareek

Office: Woodward Hall 402 (KDD Lab)
e-mail: spareek@uncc.edu
Office Hours on Zoom: https://uncc.zoom.us/j/3315691724
Monday: 5:00-6:30pm, Thursday: 2:30-4:00pm


GTA II: Rishab Semlani

Office: Woodward Hall 402 (KDD Lab)
e-mail: rsemlani@uncc.edu
Office Hours on Zoom: https://uncc.zoom.us/j/92404241568?pwd=cjZ5MmE2YTlveGYvQUg4dGN4R1B2QT09
Wednesday: 11:00am-12:30pm, Thursday: 11:00am-12:30pm


[1] LISp-Miner (by Jan Rauch)

[2] LISp-Miner Manual (by Jan Rauch's student)

[3] Rough Sets

[4] Software for data mining

[5] Repository of large datasets

[6] LERS vs ERID

[7] Extracting Rules from Incomplete Table

[8] Lance & Williams Distance

[9] Sample Problems for Midterm Exam