ITCS 6162 / ITCS 8162
DSBA/ITIS/ITCS 6162/8162 (asynchronous)
Knowledge Discovery in Databases - KDD
Prerequisites: ITCS6160, full graduate standing or consent of the department.
Textbook (not required): "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach,
Vipin Kumar, Addison Wesley.
Course Syllabus
Class Office Hours (from January 10 till May 2)
If you have questions concerning any topic covered in PPT Presentations/Video Lectures/Sample Problems posted on this website,
please join me or my TAs during our office hours scheduled on ZOOM every week.
All ZOOM sessions are listed below:
Aileen Benedict
Office Hours ZOOM Link
https://charlotte-edu.zoom.us/j/5965926700
Wednesday & Thursday: 11:00am - 1:00pm
Nikhita Somanchi
Office Hours ZOOM Link
https://charlotte-edu.zoom.us/j/95591109081
Monday & Wednesday: 2:00-4:00pm
Zbigniew Ras
Office Hours ZOOM Link
https://charlotte-edu.zoom.us/j/93349737999
Tuesday: 3:00-5:00pm
(If no one shows up by 3:30pm, I will leave the zoom meeting)
Week 1 & 2 (January 9 - 20)
Learning objectives: Classification tree construction using entropy and Gini Index (see [2],[3]), association and representative rules discovery (see [4],[5]), classification rules discovery using LERS (see [6]),
computing reducts (using discernibility matrix or heuristic strategy based on attribute selection technique), data discretization, classification rules construction
using discernibility functions for dataset objects (see [7],[8],[9],[10]).
[1] Data Preprocessing
[2] Classification Trees, PDF
[3] Classification Trees (Video by L. Powell)
[4] Association Rules, PDF, Video Lecture Part I, Video Lecture Part II
[5] Association Rules (Video by L. Powell)
[6] LERS, PDF
[7] Granular Computing, PDF, Video Lecture
[8] Reducts and Discretization, PDF
[9] Reducts (Video by L. Powell)
[10] Discretization (Video by L. Powell)
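As a quick illustration of the entropy and Gini Index measures named in the learning objectives above, here is a minimal Python sketch. It is not taken from the course materials: the function names and the toy labels are hypothetical, and it only shows how the two impurity measures score one candidate split of a small labeled set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(groups, measure):
    """Size-weighted average impurity of the groups produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * measure(g) for g in groups)


# Hypothetical toy data: 4 "yes" and 4 "no" objects, split into two branches.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no"]

# Reduction in impurity achieved by the split under each measure.
info_gain = entropy(parent) - weighted_impurity([left, right], entropy)
gini_drop = gini(parent) - weighted_impurity([left, right], gini)

print(f"information gain: {info_gain:.4f}")
print(f"Gini reduction:   {gini_drop:.4f}")
```

A tree-building procedure would compute such a score for every candidate attribute and split on the one with the largest reduction; the lectures in [2] and [3] remain the reference for the exact procedure used in this course.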
Week 3 & 4 (January 23 - February 3)
Learning objectives: Get familiar with problems and their solutions presented in [1]. If a problem is not entirely solved, complete the solution. Rules discovery from incomplete datasets using tolerance relation (see [3]) and SVM strategy (see [4]). Get familiar with at least two of the following software packages: RSES (see [2]), Orange, or WEKA (see [5]).
[1] Sample Problems
[2] Rough Set Exploration System (RSES), RSES, RS Manual
[3] Mining Incomplete Data PDF, Video Lecture
[4] Support Vector Machine PDF
[5] Bratko's ORANGE & WEKA
Week 5 & 6 (February 6 - 17)
Learning objectives: Action rules construction methods DEAR 1, DEAR 2 (see [1]) and strategy based on action reducts (see [2]). The Chase strategy for revealing hidden values in datasets (see [3]). Get familiar with problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution.
[1] Action Rules and Meta-Actions PDF, Video Lecture
[2] Action Rules Extraction Using Action Reducts
[3] Chase Algorithms PDF, Video Lecture
[4] Sample Problems
[5] Query Answering & New Attributes
Week 7 & 8 (February 20 - 24; March 6 - 10)
Learning objectives: Agglomerative and divisive clustering strategies (see [1],[2],[3]). Get familiar with problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution. Review sample problems presented in [6]. Four of them will be on the midterm exam.
[1] Clustering Methods PDF, Video Lecture
[2] Clustering (Video)
[3] TV Trees PDF
[4] Clustering - Sample problems with solutions
[5] Clustering - Sample problems
[6] Sample Problems (Midterm Exam)
Week 9 & 10 (March 13 - 24)
Learning objectives: Class group project assignment (see [1]) and LISp-Miner, the software package for action rules discovery that you need to learn to complete the project (see [2]). MIDTERM EXAM
[1] Project
[2] LISp-Miner, Video Lecture (by Sapna Pareek)
Midterm (on Canvas): March 24 (Friday), 4:00-6:30pm
Week 11 & 12 (March 27 - April 7)
Learning objectives: Data sanitization method against chase (see [1]), classifiers evaluation strategies (see [2]), mining distributed data and big data (see [3]). Get familiar with problems and their solutions presented in [4]. If a problem is not entirely solved, finish the solution.
[1] Data Sanitization PDF, Example
[2] Evaluation Methods
[3] Distributed Data and Big Data
[4] Sample Problems
Week 13 & 14 (April 10 - 21)
Learning objectives: Applying KDD methods to fine art evaluation (see [1]) and to improve human health (see [2]).
[1] Art Analytics (Paintings), VIDEO (by L. Powell)
[2] Health Analytics, Procedure Graph, VIDEO (by Zbig Ras)
Week 15 (April 24 - 28)
Learning objectives: Review sample problems presented in [1]. Four of them will be on the final exam.
[1] Sample Problems (Final Exam)
FINAL EXAM on CANVAS
May 5 (Friday), 4:00 - 6:30pm
Final Exam Solutions
Project
Project and LISp-Miner
Upload the project report and the dataset you created to Canvas, or email them to
Aileen Benedict at [abenedi3@uncc.edu] and to Nikhita Somanchi at [nsomanch@uncc.edu],
no later than Monday, May 8, 2023.
Project Rubric (to be used for grading)
Midterm (on Canvas): March 24 (Friday), 4:00-6:30pm
Final (on Canvas): May 5 (Friday), 4:00 - 6:30pm
Points: Midterm - 30 points, Final - 30 points, Project - 40 points
Grades: A [90-100], B [80-89], C [65-79].
Instructor:   Zbigniew W. Ras
Office: Woodward Hall 430C
Telephone: 704-687-8574
e-mail: ras@uncc.edu
Office Hours on ZOOM:
https://charlotte-edu.zoom.us/j/93349737999
Tuesday: 3:00-5:00pm
(If nobody shows up by 3:30pm, I will leave the zoom meeting)
GTA I: Aileen Benedict
Office: Woodward Hall 402 (KDD Lab)
e-mail: abenedi3@uncc.edu
Office Hours on ZOOM:
https://charlotte-edu.zoom.us/j/5965926700
When: Wednesday, Thursday: 11:00am - 1:00pm
GTA II: Nikhita Somanchi
Office: Woodward Hall 402 (KDD Lab)
e-mail: nsomanch@uncc.edu
Office Hours on ZOOM: https://charlotte-edu.zoom.us/j/95591109081
When: Monday, Wednesday: 2:00-4:00pm
Additional Documents