ITCS/ITIS 6162/8162 (Hybrid)
Data Mining/ Knowledge Discovery in Databases - KDD
Prerequisites: ITCS 6160, full graduate standing or consent of the department.
Textbook (not required): "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach,
Vipin Kumar, Addison-Wesley.
Course Syllabus
Office Hours (August 20 - December 5)
If you have questions concerning any topic covered in the class, please join us during the office hours held every week, either in Woodward Hall or on ZOOM. No office hours on September 1, October 9-10 (Fall Break), and November 26-28 (Thanksgiving Break).
Office hours are posted below:
GTA 1
Samsritha Chowdary Mandhadapu (e-mail: smandhad@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Monday, 9:00-11:00am; Wednesday, 10:00am-12:00pm
- Office Hours on ZOOM
Friday, 2:00-4:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/99459499567?pwd=1yoYdLGrxLkY4wQKzCbULZdigRxf3Z.1
GTA 2
Vamsi Suhas Sadhu (e-mail: vsadhu1@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Tuesday, 12:30-2:30pm; Thursday, 2:00-4:00pm
- Office Hours on ZOOM
Friday, 10:00am-12:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/91784463209?pwd=OK7hkbKWS6x7lMb4XvwZ40EfPyrrTW.1
Zbigniew Ras
Office Hours on Monday, 12:00-2:00pm
- In Woodward 430C (Sept 8, 22, 29, Oct 13, Nov 10, 24 - right before lectures).
- On ZOOM (August 25, Sept 15, Oct 6, 20, Nov 3, 17, Dec 1)
ZOOM LINK: https://charlotte-edu.zoom.us/j/96952435470
Week 1 & 2 (August 18 - Cameron 101) & (August 25 - asynchronously)
Learning objectives: Classification tree construction using entropy and the Gini index (see [2],[3]), association and representative rules discovery (see [4],[5]), classification rules discovery using LERS (see [6]), computing reducts (using a discernibility matrix or a heuristic strategy based on an attribute selection technique), data discretization, and classification rules construction using discernibility functions for dataset objects (see [7],[8],[9],[10]).
[1] Data Preprocessing
[2] Classification Trees, PDF
[3] Classification Trees (Video by L. Powell)
[4] Association Rules, PDF, Video Lecture Part I, Video Lecture Part II
[5] Association Rules (Video by L. Powell)
[6] LERS, PDF
[7] Granular Computing, PDF, Video Lecture
[8] Reducts and Discretization, PDF
[9] Reducts (Video by L. Powell)
[10] Discretization (Video by L. Powell)
Exercises/ Problems to solve. Solutions
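As a small illustration of the entropy and Gini index measures named in the Week 1 & 2 learning objectives, here is a minimal Python sketch. The function names and the toy label list are assumptions made for illustration only; they are not taken from the course materials, which may use different notation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(groups, measure):
    """Weighted impurity of a candidate split (a list of label lists)."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * measure(g) for g in groups)


# Hypothetical toy data: two objects in each of two decision classes.
labels = ["yes", "yes", "no", "no"]
print(entropy(labels))  # 1.0
print(gini(labels))     # 0.5
# A split that separates the classes perfectly has zero weighted impurity.
print(split_impurity([["yes", "yes"], ["no", "no"]], gini))
```

When building a classification tree, the attribute whose split gives the lowest weighted impurity (equivalently, the largest reduction from the parent node) is typically chosen first.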
Week 3 & 4 (Sept 8 - Cameron 101) & (Sept 15 - asynchronously)
Learning objectives: Get familiar with problems and their solutions presented in [1]. If a problem is not entirely solved, complete the solution. Rules discovery from incomplete datasets using tolerance relation (see [3]), mining imbalanced data (see [4]), and SVM strategy (see [5]). Get familiar with at least two software packages: RSES (see [2]) and either Orange or WEKA (see [6]).
[1] Sample Problems
[2] Rough Set Exploration System (RSES), RSES, RS Manual
[3] Mining Incomplete Data PDF, Video Lecture
[4] Mining Imbalanced Data
[5] Support Vector Machine PDF
[6] Bratko's ORANGE & WEKA
Exercises/ Problems to solve.
Solutions
Week 5 & 6 (Sept 22 - Cameron 101) & (Sept 29 - Cameron 101)
Learning objectives: Action rules construction methods DEAR 1, DEAR 2 (see [1]) and strategy based on action reducts (see [2]). Strategy Chase for revealing hidden values in datasets [3]. Get familiar with problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution.
[1] Action Rules and Meta-Actions PDF, Video Lecture
[2] Action Rules Extraction Using Action Reducts
[3] Chase Algorithms PDF, Video Lecture
[4] Sample Problems
[5] Query Answering & New Attributes
Exercises/ Problems to solve.
Solutions
Week 7 & 8 (October 6 - asynchronously) & (October 13 - Cameron 101)
Learning objectives: Agglomerative and divisive clustering strategies (see [1],[2],[3],[4]). Get familiar with problems and their solutions presented in [5],[6]. If a problem is not entirely solved, complete the solution. Review sample problems presented in [7]. Four of them will be on the midterm exam.
[1] Clustering Methods PDF, Video Lecture
[2] Clustering (Video)
[3] TV Trees PDF
[4] Lance-Williams Algorithm (Video)
[5] Clustering - Sample problems with solutions
[6] Clustering - Sample problems
[7] Sample Problems (Midterm Exam)
Homework: Get familiar with solutions for Midterm Exam sample problems. If a problem is not solved, try to solve it.
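To make the agglomerative strategy and the Lance-Williams update from the Week 7 & 8 objectives concrete, here is a minimal Python sketch. It assumes the standard Lance-Williams recurrence d(k, i∪j) = a_i*d(k,i) + a_j*d(k,j) + b*d(i,j) + g*|d(k,i) - d(k,j)| with the usual single-link and complete-link coefficients. The function names, the cluster-id scheme, and the toy distance matrix are illustrative assumptions, not the formulation used in the lecture materials.

```python
def lance_williams(d_ki, d_kj, d_ij, alpha_i, alpha_j, beta, gamma):
    """Distance from cluster k to the cluster obtained by merging i and j."""
    return alpha_i * d_ki + alpha_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


# Standard coefficient choices: single link reduces to min, complete link to max.
SINGLE = dict(alpha_i=0.5, alpha_j=0.5, beta=0.0, gamma=-0.5)
COMPLETE = dict(alpha_i=0.5, alpha_j=0.5, beta=0.0, gamma=0.5)


def agglomerate(dist, coeffs):
    """Merge clusters bottom-up; dist is a symmetric matrix (list of lists).

    Returns a list of (members, merge_distance) pairs in merge order.
    """
    n = len(dist)
    active = {i: (i,) for i in range(n)}  # cluster id -> member objects
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}

    def get(a, b):
        return d[(a, b) if a < b else (b, a)]

    merges, next_id = [], n
    while len(active) > 1:
        # Pick the closest pair of currently active clusters.
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda p: d[p])
        d_ij = d[(i, j)]
        # Update distances from every other cluster to the merged one.
        for k in active:
            if k not in (i, j):
                d[(k, next_id)] = lance_williams(get(k, i), get(k, j), d_ij, **coeffs)
        members = active.pop(i) + active.pop(j)
        active[next_id] = members
        merges.append((members, d_ij))
        next_id += 1
    return merges


# Hypothetical toy data: three points on a line at positions 0, 1 and 5.
toy = [[0, 1, 5],
       [1, 0, 4],
       [5, 4, 0]]
print(agglomerate(toy, SINGLE))
print(agglomerate(toy, COMPLETE))
```

Both runs merge objects 0 and 1 first; they differ only in the distance at which the third object joins, which is the effect of the sign of the last coefficient.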
Week 9 & 10 (Oct 20 - asynchronously) & (October 27 - Cameron 101)
Learning objectives: Class group project assignment (see [1]) and LispMiner, the software package for action rules discovery (see [2]), which you need to learn to complete the project. MIDTERM EXAM
[1] Project
[2] Lisp Miner, Video Lecture (by Sapna Pareek), Additional Information
Project Rubric (to be used for grading)
Midterm (in Cameron 101): October 27, 2:30-5:15pm
Week 11 & 12 (Nov 3 - asynchronously) & (Nov 10 - Cameron 101)
Learning objectives: Data sanitization method against chase (see [1]), classifiers evaluation strategies (see [2]), mining distributed data and big data (see [3]). Get familiar with problems and their solutions presented in [4]. If a problem is not entirely solved, finish the solution.
[1] Data Sanitization PDF, Example
[2] Evaluation Methods
[3] Distributed Data and Big Data
[4] Sample Problems
Exercises/ Problems to solve.
Week 15 (Dec 1 - asynchronously)
Learning objectives: Review sample problems presented in [1]. Four of them will be on the final exam.
[1] Sample Problems (Final Exam)
FINAL EXAM
December 8 (Monday), Cameron 101, 2:00 - 4:30pm
Project
Upload the project report and the dataset you created to Canvas, or email them to
Samsritha Chowdary Mandhadapu (e-mail: smandhad@charlotte.edu) and Vamsi Suhas Sadhu (e-mail: vsadhu1@charlotte.edu),
no later than November 28 (Friday).
Project Rubric (to be used for grading)
Midterm (in Cameron 101): October 27 (Monday), 2:30-5:15pm
Final (in Cameron 101): December 8 (Monday), 2:00-4:30pm
Points: Midterm - 30 points, Final - 30 points, Project - 40 points
Grades: A [90-100], B [80-89], C [65-79].
Instructor:   Zbigniew W. Ras
Office Hours in Woodward Hall 430C
Sept 8, 22, 29, Oct 13, Nov 10, 24 (12:00-2:00pm)
e-mail: ras@uncc.edu
Office Hours on ZOOM: https://charlotte-edu.zoom.us/j/96952435470
August 25, Sept 15, Oct 6, 20, Nov 3, 17, Dec 1 (12:00-2:00pm)
If nobody shows up by 12:30pm, I will leave the ZOOM meeting.
GTA 1:
  Samsritha Chowdary Mandhadapu
Office: Woodward Hall 402 (KDD Lab)
e-mail: smandhad@charlotte.edu
GTA 2:
  Vamsi Suhas Sadhu
Office: Woodward Hall 402 (KDD Lab)
e-mail: vsadhu1@charlotte.edu