ITCS/ITIS 6162/8162 (asynchronous)

Knowledge Discovery in Databases - KDD


Prerequisites: ITCS 6160, full graduate standing or consent of the department.
Textbook (not required): "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison Wesley.


Course Syllabus



Office Hours (August 21 - November 21, November 27 - December 6)

If you have questions concerning any topic covered in PPT Presentations/Video Lectures/Sample Problems posted on this website,
please join me or my TAs during our office hours scheduled either on ZOOM or in the KDD Lab (Woodward 402) every week.

All ZOOM sessions and meetings in the KDD Lab are listed below:

GTA 1
Susmitha Dalli (e-mail: sdalli@uncc.edu)
- Office Hours on ZOOM
Tuesday (every week) & Friday (every odd week): 10:00am - 12:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/7532707324
- Office Hours in the KDD Lab (Woodward 402)
Friday (every even week): 10:00am - 12:00pm
No office hours on November 22-24 (Thanksgiving Break)

GTA 2
Varsha Malladi (e-mail: smallad4@uncc.edu)
- Office Hours on ZOOM
Wednesday (every week): 1:00pm - 3:00pm; Thursday (every odd week): 10:00am - 12:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/5525061807
- Office Hours in the KDD Lab (Woodward 402)
Thursday (every even week): 10:00am - 12:00pm
No office hours on November 22-24 (Thanksgiving Break)

GTA 3
Ujwala Mallela (e-mail: umallela@uncc.edu)
- Office Hours on ZOOM
Monday (every week) & Wednesday (every odd week): 10:00am - 12:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/7847498260
- Office Hours in the KDD Lab (Woodward 402)
Wednesday (every even week): 10:00am - 12:00pm
No office hours on November 22-24 (Thanksgiving Break)

Zbigniew Ras
- Office Hours ZOOM Link
https://charlotte-edu.zoom.us/j/92579556226
Thursday: 3:00-5:00pm
(If no one shows up by 3:30pm, I will leave the ZOOM meeting)
No office hours on November 23 (Thanksgiving Break)



Week 1 (odd) & 2 (August 21 - September 3)
Learning objectives: Classification tree construction using entropy and the Gini index (see [2],[3]), association and representative rules discovery (see [4],[5]), classification rules discovery using LERS (see [6]), computing reducts (using a discernibility matrix or a heuristic strategy based on an attribute selection technique), data discretization, and classification rules construction using discernibility functions for dataset objects (see [7],[8],[9],[10]).
[1] Data Preprocessing
[2] Classification Trees, PDF
[3] Classification Trees (Video by L. Powell)
[4] Association Rules, PDF, Video Lecture Part I, Video Lecture Part II
[5] Association Rules (Video by L. Powell)
[6] LERS, PDF
[7] Granular Computing, PDF, Video Lecture
[8] Reducts and Discretization, PDF
[9] Reducts (Video by L. Powell)
[10] Discretization (Video by L. Powell)
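
As a quick orientation for the entropy and Gini index objective above, here is a minimal Python sketch of how the two impurity measures can be computed for a list of class labels, and how a candidate split can be scored by its weighted impurity. The function names and the toy labels are illustrative assumptions, not taken from the course materials in [2] or [3]; those remain the reference for the exact definitions used in this course.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity(groups, impurity):
    """Impurity of a split: size-weighted average over its non-empty groups."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * impurity(g) for g in groups if g)

# Toy data (hypothetical): four objects, two classes.
labels = ["yes", "yes", "no", "no"]
# A candidate split that separates the two classes perfectly.
split = [["yes", "yes"], ["no", "no"]]

# Information gain = entropy before the split minus weighted entropy after it.
info_gain = entropy(labels) - weighted_impurity(split, entropy)
```

An attribute whose split yields the largest information gain (or the lowest weighted Gini index) would typically be chosen as the next node of the tree.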

Week 3 (odd) & 4 (September 4 - 17)
Learning objectives: Get familiar with problems and their solutions presented in [1]. If a problem is not entirely solved, complete the solution. Rules discovery from incomplete datasets using tolerance relation (see [3]) and SVM strategy (see [4]). Get familiar with at least two of the following software packages: RSES (see [2]), Orange, or WEKA (see [5]).
[1] Sample Problems
[2] Rough Set Exploration System (RSES), RSES, RS Manual
[3] Mining Incomplete Data PDF, Video Lecture
[4] Support Vector Machine PDF
[5] Bratko's ORANGE & WEKA

Week 5 (odd) & 6 (September 18 - October 1)
Learning objectives: Action rules construction methods DEAR 1, DEAR 2 (see [1]) and strategy based on action reducts (see [2]). The Chase strategy for revealing hidden values in datasets (see [3]). Get familiar with problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution.
[1] Action Rules and Meta-Actions PDF, Video Lecture
[2] Action Rules Extraction Using Action Reducts
[3] Chase Algorithms PDF, Video Lecture
[4] Sample Problems
[5] Query Answering & New Attributes

Week 7 & 8 (October 2 - 15)
Learning objectives: Agglomerative and divisive clustering strategies (see [1],[2],[3]). Get familiar with problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution. Review sample problems presented in [6]. Four of them will be on the midterm exam.
[1] Clustering Methods PDF, Video Lecture
[2] Clustering (Video)
[3] TV Trees PDF
[4] Clustering - Sample problems with solutions
[5] Clustering - Sample problems
[6] Sample Problems (Midterm Exam)

Week 9 & 10 (October 16 - 29)
Learning objectives: Class group project assignment (see [1]) and LISp-Miner, a software package for action rules discovery (see [2]), which you need to learn to complete the project. MIDTERM EXAM
[1] Project
[2] Lisp Miner, Video Lecture (by Sapna Pareek)

Midterm (on Canvas): October 20 (Friday), 4:00-6:30pm

Week 11 & 12 (October 30 - November 12)
Learning objectives: Data sanitization method against chase (see [1]), classifier evaluation strategies (see [2]), mining distributed data and big data (see [3]). Get familiar with problems and their solutions presented in [4]. If a problem is not entirely solved, complete the solution.
[1] Data Sanitization PDF Example
[2] Evaluation Methods
[3] Distributed Data and Big Data
[4] Sample Problems

Week 13 & 14 (November 13 - 29)
Learning objectives: Applying KDD methods to fine art evaluation (see [1]) and to improving human health (see [2]).
[1] Art Analytics (Paintings)
      VIDEO (by L. Powell)
[2] Health Analytics    Procedure Graph
      VIDEO (by Zbig Ras)

Week 15 (November 30 - December 6)
Learning objectives: Review sample problems presented in [1]. Four of them will be on the final exam.
[1] Sample Problems (Final Exam)


FINAL EXAM on CANVAS

December 8 (Friday), 4:00 - 6:30pm


Final Exam Solutions


Project
Project and LISp-Miner
Upload the project report and the dataset you created to Canvas or email them to
Susmitha Dalli [sdalli@uncc.edu], Varsha Malladi [smallad4@uncc.edu], and Ujwala Mallela [umallela@uncc.edu]
no later than December 11 (Monday), 2023
Project Rubric (to be used for grading)


Midterm (on Canvas): October 20 (Friday), 4:00-6:30pm
Final (on Canvas): December 8 (Friday), 4:00 - 6:30pm

Points: Midterm - 30 points, Final - 30 points, Project - 40 points
Grades: A [90-100], B [80-89], C [65-79].


Instructor:       Zbigniew W. Ras
Office: Woodward Hall 430C
Telephone: 704-687-8574
e-mail: ras@uncc.edu
Office Hours on ZOOM: https://charlotte-edu.zoom.us/j/92579556226
Thursday: 3:00-5:00pm
(If nobody shows up by 3:30pm, I will leave the ZOOM meeting)


GTA 1:       Susmitha Dalli

Office: Woodward Hall 402 (KDD Lab)
e-mail: sdalli@uncc.edu


GTA 2:       Varsha Malladi

Office: Woodward Hall 402 (KDD Lab)
e-mail: smallad4@uncc.edu


GTA 3:       Ujwala Mallela

Office: Woodward Hall 402 (KDD Lab)
e-mail: umallela@uncc.edu


Additional Documents

[1] KDD Software

[2] Lisp Miner (+ Action Rules Discovery Module) (by Jan Rauch)

[3] Lisp Miner Manual (by Jan Rauch's Student)

[4] SCARI: Action Rules Discovery Package

[5] Rough Sets

[6] Repository of large datasets

[7] LERS vs ERID

[8] Extracting Rules from Incomplete Table

[9] Lance & Williams Distance

[10] Sample Problems for Midterm Exam