ITCS/ITIS 6162/8162 (Hybrid)

Data Mining / Knowledge Discovery in Databases (KDD)

Location: Dubois Center 901
Time: Tuesdays, 1:00-3:45pm (10-minute break at 2:15)


Prerequisites: ITCS6160, full graduate standing, or consent of the department.

Textbook (not required):

"Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison-Wesley.

Student Learning Outcomes:

(1) Understanding data mining concepts
(2) Applying various algorithms for data preparation and analysis
(3) Designing data mining applications and evaluating model performance
(4) Applying these skills to solve real-world problems

Course Syllabus

Attendance Policy:

Attendance is mandatory. If you miss n classes, where n ≥ 2, n-2 points will be subtracted from your total number of points.
An excused absence must be accompanied by official documentation clearly stating that you were physically unable to attend the class.


Office Hours (January 19 - April 24)

If you have questions about any topic covered in class, please come to the weekly office hours, held either in person (Center City, Room 713, or the KDD Lab, Woodward 402) or on ZOOM. There are no office hours on March 9-13 (Spring Break).

Office hours are posted below:

..................................
GTA 1
Samsritha Chowdary Mandhadapu (e-mail: smandhad@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Tuesday: 9:00-11:00am
Thursday: 9:00-11:00am

- Office Hours on ZOOM
Friday: 10:00am-12:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/99459499567?pwd=1yoYdLGrxLkY4wQKzCbULZdigRxf3Z.1

..................................
GTA 2
Akhila Bollampally (e-mail: abollamp@charlotte.edu)
- Office Hours in the KDD Lab (Woodward 402)
Wednesday: 9:00-11:00am
Thursday: 11:00am-1:00pm

- Office Hours on ZOOM
Friday: 12:00-2:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/j/96398362835

..................................
Zbigniew Ras
Office Hours on Tuesday, 4:00-6:00pm
- In Center City, Room 713: Jan 27, Feb 10, 24, March 17, 31, April 14 (right after lectures)
- On ZOOM: Jan 20, Feb 3, 17, March 3, 24, April 7, 21
ZOOM LINK: https://charlotte-edu.zoom.us/j/95616993967
If nobody shows up by 4:30pm, I will leave the ZOOM meeting.



Week 1 & 2 (January 13 - Center City 901) & (January 20 - asynchronously)
Learning objectives: Classification tree construction using entropy and the Gini index (see [2],[3]), association and representative rules discovery (see [4],[5]), classification rules discovery using LERS (see [6]), computing reducts (using a discernibility matrix or a heuristic strategy based on attribute selection), data discretization, and classification rules construction using discernibility functions for dataset objects (see [7],[8],[9],[10]).
[1] Data Preprocessing
[2] Classification Trees, PDF
[3] Classification Trees (Video by L. Powell)
[4] Association Rules, PDF, Video Lecture Part I, Video Lecture Part II
[5] Association Rules (Video by L. Powell)
[6] LERS, PDF
[7] Granular Computing, PDF, Video Lecture
[8] Reducts and Discretization, PDF
[9] Reducts (Video by L. Powell)
[10] Discretization (Video by L. Powell)
Exercises/ Problems to solve. Solutions
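The week 1-2 objectives name entropy and the Gini index as the split measures for classification tree construction. A minimal Python sketch of both measures, plus the impurity decrease used to compare candidate splits, follows; the function names and the toy labels are hypothetical, not taken from the course materials.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(parent, children, impurity):
    """Impurity decrease when `parent` is partitioned into `children`.

    Each child's impurity is weighted by its share of the parent's objects.
    """
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy data: six class labels split by some binary attribute.
parent = ["yes", "yes", "yes", "no", "no", "no"]
children = [["yes", "yes", "yes", "no"], ["no", "no"]]
print("entropy gain:", split_gain(parent, children, entropy))
print("gini gain:", split_gain(parent, children, gini))
```

At each node, the tree-building step evaluates every candidate attribute this way and splits on the one with the largest gain; a pure node has impurity 0 under both measures.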

Week 3 & 4 (January 27 - Center City 901) & (February 3 - asynchronously)
Learning objectives: Get familiar with the problems and their solutions presented in [1]. If a problem is not entirely solved, complete the solution. Rules discovery from incomplete datasets using the tolerance relation (see [3]), mining imbalanced data (see [4]), and the SVM strategy (see [5]). Get familiar with at least two software packages: RSES (see [2]) and Orange or WEKA (see [6]).
[1] Sample Problems
[2] Rough Set Exploration System (RSES), RSES, RS Manual
[3] Mining Incomplete Data PDF, Video Lecture
[4] Mining Imbalanced Data
[5] Support Vector Machine PDF
[6] Bratko's ORANGE, Orange Sample & WEKA
Exercises/ Problems to solve. Solutions

Week 5 & 6 (February 10 - Center City 901) & (February 17 - asynchronously)
Learning objectives: Action rules construction methods DEAR 1 and DEAR 2 (see [1]) and the strategy based on action reducts (see [2]). The Chase strategy for revealing hidden values in datasets (see [3]). Get familiar with the problems and their solutions presented in [4],[5]. If a problem is not entirely solved, complete the solution.
[1] Action Rules and Meta-Actions PDF, Video Lecture
[2] Action Rules Extraction Using Action Reducts
[3] Chase Algorithms PDF, Video Lecture
[4] Sample Problems
[5] Query Answering & New Attributes
Exercises/ Problems to solve. Solutions

Week 7 & 8 (February 24 - Center City 901) & (March 3 - asynchronously)
Learning objectives: Agglomerative and divisive clustering strategies (see [1],[2],[3],[4]). Get familiar with the problems and their solutions presented in [5],[6]. If a problem is not entirely solved, complete the solution. Review the sample problems presented in [7]. Four of them will be on the midterm exam.
[1] Clustering Methods PDF, Video Lecture
[2] Clustering (Video)
[3] TV Trees PDF
[4] Lance-Williams Algorithm (Video)
[5] Clustering - Sample problems with solutions
[6] Clustering - Sample problems
[7] Sample Problems (Midterm Exam)
Homework: Get familiar with the solutions for the Midterm Exam sample problems. If a problem is not solved, try to solve it.
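The week 7-8 readings include the Lance-Williams algorithm ([4]), which lets agglomerative clustering compute the distance from any cluster k to a newly merged cluster i+j from the old distances alone: d(k, i+j) = a_i*d(k,i) + a_j*d(k,j) + beta*d(i,j) + gamma*|d(k,i) - d(k,j)|. A minimal Python sketch with the standard coefficients for single, complete, and average linkage follows; the function name, the `method` strings, and the sample distances are hypothetical.

```python
def lance_williams(d_ki, d_kj, d_ij, n_i, n_j, method="single"):
    """Distance from cluster k to the cluster formed by merging i and j.

    d(k, i+j) = a_i*d(k,i) + a_j*d(k,j) + beta*d(i,j) + gamma*|d(k,i) - d(k,j)|
    n_i and n_j are the sizes of clusters i and j (used only by average linkage).
    """
    if method == "single":      # reduces to min(d_ki, d_kj)
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, -0.5
    elif method == "complete":  # reduces to max(d_ki, d_kj)
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.5
    elif method == "average":   # size-weighted mean of the two distances (UPGMA)
        total = n_i + n_j
        a_i, a_j, beta, gamma = n_i / total, n_j / total, 0.0, 0.0
    else:
        raise ValueError(f"unknown linkage method: {method}")
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


# Hypothetical distances: cluster k is 2.0 from cluster i (1 object)
# and 6.0 from cluster j (3 objects); i and j are 3.0 apart.
for m in ("single", "complete", "average"):
    print(m, lance_williams(2.0, 6.0, 3.0, n_i=1, n_j=3, method=m))
```

Single and complete linkage reduce to the minimum and maximum of the two old distances, which is a quick sanity check when working the clustering sample problems by hand.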

Week 9 & 10 (March 17 - Center City 901) & (March 24 - asynchronously)
Learning objectives: Group project assignment (see [1]) and Lisp Miner, a software package for action rules discovery (see [2]), which you need to learn to complete the project. MIDTERM EXAM
[1] Project
[2] Lisp Miner, Video Lecture (by Vamsi Suhas Sadhu), Additional Information
Project Rubric (to be used for grading)

Midterm (in Center City 901): March 17, 1:00-3:30pm

Week 11 & 12 (March 31 - Center City 901) & (April 7 - asynchronously)
Learning objectives: Classifier evaluation strategies (see [1]), mining big data (see [2]), and mining distributed data (see [3]). Get familiar with the problems and their solutions presented in [4]. If a problem is not entirely solved, finish the solution.
[1] Evaluation Methods
[2] Mining Big Data
[3] Mining Distributed Data
[4] Sample Problems
Exercises/ Problems to solve.

Week 13 & 14 (April 14 - Center City 901) & (April 21 - asynchronously)
Learning objectives: Review the sample problems presented in [1] (four of them will be on the final exam). KDD applications to fine art (see [2]), healthcare (see [3]), and business (see [4]).
[1] Sample Problems (Final Exam)

[2] Art Analytics (Paintings)
[3] Health Analytics
      VIDEO (by Zbyszek Ras)
[4] Business Analytics

FINAL EXAM

April 28 (Tuesday), Center City 901, ???????


Project
Upload the project report and the dataset you created to Canvas, or email them to
Samsritha Chowdary Mandhadapu (e-mail: smandhad@charlotte.edu) and Akhila Bollampally (e-mail: abollamp@charlotte.edu),
no later than April 21 (Tuesday).
Project Rubric (to be used for grading)


Midterm (in Center City 901): March 17 (Tuesday), 1:00-3:30pm
Final (in Center City 901): April 28 (Tuesday), ???????

Points: Midterm - 30 points, Final - 30 points, Project - 40 points
Attendance Policy: If you miss n classes, where n ≥ 2, n-2 points will be subtracted from your total number of points.

Grades: A [90-100], B [80-89], C [65-79].
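The point totals, attendance deduction, and grade bands above combine into a simple calculation. A minimal Python sketch follows; the function names are hypothetical, and two behaviors are assumptions not stated in the syllabus: each band's lower bound is treated as a cutoff (so a total such as 89.5 maps to B), and totals below 65 are reported only as "below C".

```python
def attendance_penalty(missed):
    """Points subtracted for n missed classes: n - 2 when n >= 2, otherwise 0."""
    return max(0, missed - 2)


def total_points(midterm, final, project, missed):
    """Course total: midterm (max 30) + final (max 30) + project (max 40) - penalty."""
    if not (0 <= midterm <= 30 and 0 <= final <= 30 and 0 <= project <= 40):
        raise ValueError("score outside the allowed range")
    return midterm + final + project - attendance_penalty(missed)


def letter_grade(points):
    """Map a total to a grade band (assumption: lower bounds act as cutoffs)."""
    if points >= 90:
        return "A"
    if points >= 80:
        return "B"
    if points >= 65:
        return "C"
    return "below C"  # assumption: the syllabus does not list bands below 65


# Hypothetical student: 27 + 25 + 38 = 90 points, 3 missed classes.
points = total_points(27, 25, 38, missed=3)
print(points, letter_grade(points))
```

For example, 27 + 25 + 38 = 90 points with three missed classes gives 90 - 1 = 89 points, a B, so in this case the attendance deduction alone moves the grade down a band.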


Instructor:       Zbigniew W. Ras

Office: Woodward Hall 430C
e-mail: ras@charlotte.edu


GTA 1:       Samsritha Chowdary Mandhadapu

Office: Woodward Hall 402 (KDD Lab)
e-mail: smandhad@charlotte.edu


GTA 2:       Akhila Bollampally

Office: Woodward Hall 402 (KDD Lab)
e-mail: abollamp@charlotte.edu


[1] KDD Software

[2] Statistics for Data Science & Machine Learning

[3] Lisp Miner (+ Action Rules Discovery Module) (by Jan Rauch)

[4] Lisp Miner Manual (by Jan Rauch's student)

[5] SCARI: Action Rules Discovery Package

[6] Repository of large datasets

[7] LERS vs ERID

[8] Extracting Rules from Incomplete Table

[9] Lance & Williams Distance