ITCS / DSBA 6190 Cloud Computing for Data Analysis - Spring 2022 - Sec 091 -  Friday 5:30-8:15pm - WOOD 106

Instructor: Dr. Angelina A Tzacheva, Department of Computer Science, College of Computing and Informatics,
EMail: , Office Hours: Tuesday 3:00 pm - 5:00 pm via WebEx link:

Join Zoom Meeting

Meeting ID: 879 9594 6159
Passcode: 059885

SkypeID:    angelina.tzacheva

Teaching Assistants:
1. Shruthi Vasalamarri, EMail: ,
 Office Hours: Monday 11am - 12:30pm & Tuesday 11am - 12:30pm via Zoom links :

  OfficeHoursDay1: Monday 11am - 12:30pm


   OfficeHoursDay2:  Tuesday 11am - 12:30pm


   SkypeID: svasalam

2. Bhargava Ram Bonala , EMail: ,
 Office Hours: Wednesday 10am - 11:30am & Thursday 10am - 11:30am via Zoom links :

Wednesday 10am - 11:30am


Thursday 10am - 11:30am



3. Amulya Cheyala , EMail: ,
 Office Hours: Friday 10am - 11:30am & Saturday 10am - 11:30am via Zoom links :

Friday 10am - 11:30am


   OfficeHoursDay2:  Saturday 10am - 11:30am



Prerequisites: ITCS 2114 Algorithms & Data Structures. Familiarity with Java (or Python / Scala), Unix, data structures and algorithms, linear algebra, probability and statistics. Good programming skills and a solid mathematical background.

Review Documents
Probability Reminders
Introduction to Proof Techniques
Linear Algebra

Textbooks:
1. Mining of Massive Datasets, 2nd Edition, by Jure Leskovec, Anand Rajaraman, and Jeff Ullman, Cambridge University Press, 2014, ISBN: 9781107077232
2. Hadoop: The Definitive Guide, 4th Edition, by Tom White, O’Reilly Media, 2015, ISBN: 9781491901632
3. Learning Spark Lightning-Fast Big Data Analysis, by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia, O’Reilly Media, 2015, ISBN: 9781449358624
4. Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008, ISBN: 9780521865715

Course Outline:
- Distributed Computing and Cloud
- Hadoop, MapReduce
- Pig, Hive, Spark
- Information Retrieval, Indexes, Scores
- Web Search, Page Rank
- Data Mining Algorithms
- Rules, Clustering, Classification
- Social Network Analysis

Student Learning Outcomes:
1. Recognize and Define Cloud Platforms and Tools
2. Deploy and Analyze Data using Cloud Tools
3. Demonstrate Programming Skills for Cloud Platforms

Instructional Method: 
This course takes a case and project approach, complemented by lectures and group activities. Active Learning Activities and a Flipped Classroom approach will be used once per week.
Lecture Notes, Videos, and Reading Assignments are posted in the syllabus table below, as well as on Canvas. Please download and read each lecture material, and view each Video on the specified day.
All material by date is listed, including preparation for the exams with sample questions. The exams are open-book / open-notes. The textbooks are necessary, as exam questions are based on lecture notes AND on the text.

Credit Hours: This is a 3 credit hour course.
This course is designed to require about 10 hours per week - for readings, exams, exercises, video cases, and group project work.
The material is technical and requires dedicated time to comprehend. To complete the course successfully, please do not plan on cramming all lectures the day before the exam. Designate 3 hours every lecture day for reading the given lecture and book chapter. Designate an additional 4 hours per week for Exercises, VideoCase assignments, and Group meetings / activities.
Exercises are assigned after each chapter. The Exercises are due on Canvas on the dates they are assigned. Exercises are *not accepted* through e-mail. Late Exercises are not accepted.

The final course grade is determined on the following weights:
Exercises   20%
VideoCases 10%
GroupActivities 14%
Midterm Exam   15%
Group Project   16%
Final Exam   15%
Attendance   10%

Grading scale:
A   90% - 100%
B   80% - 89%
C   70% - 79%
D   60% - 69%
F   less than 60%
X   academic dishonesty
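
As an illustration of how the weights and grading scale above combine, here is a minimal Python sketch. The function names and the assumption that each category is scored as a percentage from 0 to 100 are illustrative only; how borderline totals (for example 89.5%) are rounded is not stated in the syllabus, so no rounding is applied here.

```python
# Weights copied from the syllabus table above; they sum to 100.
WEIGHTS = {
    "Exercises": 20,
    "VideoCases": 10,
    "GroupActivities": 14,
    "Midterm Exam": 15,
    "Group Project": 16,
    "Final Exam": 15,
    "Attendance": 10,
}


def course_total(scores):
    """Weighted average of per-category percentages (each 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


def letter_grade(total):
    """Map a course total to a letter using the scale above.

    Assumption: the total is compared without rounding.
    The X grade (academic dishonesty) is not computed from scores.
    """
    if total >= 90:
        return "A"
    if total >= 80:
        return "B"
    if total >= 70:
        return "C"
    if total >= 60:
        return "D"
    return "F"


if __name__ == "__main__":
    example = {name: 85 for name in WEIGHTS}  # hypothetical scores
    total = course_total(example)
    print(total, letter_grade(total))
```

For example, a student scoring 85% in every category has a course total of 85%, which falls in the B range.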

Grading Inquiries:
Grades for all Exercises, Exams, and the Project are posted on Canvas shortly after the assignments are due. Students are expected to check their grades on Canvas and e-mail the TA and Instructor immediately if they notice any issues. Students who have questions or concerns about their final CourseTotal grade are expected to e-mail the TA and Instructor at least 1 week before letter grades are assigned in the registration system. The letter grade due date is listed on the University Calendar at the end. Once the letter grades are assigned and rolled into the registration system, we are unable to change them.

Academic Integrity and Honesty:
Students are required to read and abide by the Code of Student Academic Integrity available from the Dean of Students Office. This code forbids cheating, fabrication or falsification of information, multiple submissions of academic work, plagiarism (including viewing others' work without instructor permission), abuse of academic materials, and complicity in academic dishonesty. Violations of the Code of Student Academic Integrity, including plagiarism, result in disciplinary action as provided by the Code.

We are concerned with a positive learning experience. This course strives to create an inclusive academic climate in which the dignity of all individuals is respected and maintained. We value diversity that is beneficial to both employers and society at large. Students are encouraged to actively and appropriately share their views in class discussions.

Inclement Weather:
University Policy states the University is open unless the Chancellor announces that the University is closed. In the event of inclement weather, check your e-mail. The instructor will post a message through e-mail. The instructor will use their best judgment as to whether class should be held.

We are committed to access to education. If you have a disability and need academic accommodations, please provide a letter of accommodation from Disability Services early in the semester. For more information on accommodations, contact the Office of Disability Services or visit their office.

The University policy on Course Withdrawal allows students only a limited number of opportunities to withdraw from courses. Course withdrawal may have financial and academic consequences. If a student is concerned about his / her ability to succeed in this course, it is important to make an appointment to speak with the instructor as soon as possible.

Syllabus Revision:
The instructor may modify the class schedule and syllabus during the course of the semester. For example, additional educational videos will be posted every week. The same changes will appear on Canvas. Students are responsible for refreshing their syllabus once per week.

E-Mail Communication:
Students are responsible for *all* announcements made in class and on the class online resources. Students should check the online class resources throughout the semester. The Instructor and Teaching Assistants send occasional e-mails with important information. We send this information to the student's university email address.

Class Expectation:
By attending class beyond the first week, students agree to follow the framework and rules related to this course as described above.


Jan 14
Preview of course syllabus      |     Find your Group - members here   for the  Group Project
Project Assignment Description     |     Ph.D. Project Description        

Introduction to Data Mining ,  MapReduce and the New Software Stack     |     Design of Good MR Algorithms

Read    Chapter 1. from  Book 1. MiningOfMassiveDatasets   
Read    Chapter 2. from  Book 1. MiningOfMassiveDatasets


Exercise:    01.  DISTINCT operator using Map-Reduce     
//to turn in:  save solution in a text file and upload  to Canvas

video: L01_01_IntroDataMining_MapReduce_Models
video: L01_03_ClusterArchitecture_DistributedFileSystem
video: L01_04_MapReduce_WordCountExample
video: L01_05_MapReduce_RelationalJoin_MatrixMultiplication
video: L01_06_DataFlowSystems_BulkSynchronousSystems
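
The word-count example named in the videos above can be sketched in plain Python to show the map, shuffle, and reduce phases. This is a single-machine illustration, not Hadoop code; the function names are hypothetical, and a real job would run the map and reduce phases in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return (key, sum(values))


def word_count(lines):
    """Run the three phases in sequence on an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat"]))
```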
Jan 21
Introduction Continued: Hadoop, HDFS, MapReduce, HIVE   |   Cloud Tools Overview   |  Basic HDFS Commands

Read    Chapter 1. from Book 2. HadoopTheDefinitiveGuide
Read    Chapter 2. from Book 2. HadoopTheDefinitiveGuide

video: L02_01_Hadoop_DistributedFileSystem
video: L02_02_HDFS_NameNode_DataNode
video: L02_03_HDFS_Pipelining_Rebalancer_UI
video: L02_04_HDFS_UserInterfaceCommands_BasicFeatures
video: L02_05_HDFS_FSNamespace_Replication
video: L02_06_HDFS_Protocol_Failure_Integrity
video: L02_07_HDFS_Staging_Pipelining_Interface

VideoCase 02.  HadoopHDFS

Hadoop Distributed File System - HDFS   |    Developing a MapReduce Application  

Read     Chapter 3. from Book 2. HadoopTheDefinitiveGuide
Read     Chapter 6. from Book 2. HadoopTheDefinitiveGuide
Read     Chapter 7. from Book 2. HadoopTheDefinitiveGuide

video: L03_01_MapReduceOverview
video: L03_02_MapReduceTools

GroupActivity_01 : Exercise : SetUp Single Node Hadoop Environment and Log In to DSBA Hadoop Cluster
Cloudera Installation - Windows     |     Cloudera Installation - MAC
Single Node Hadoop Environment Setup Simple Commands Task1     |
Instructions for logging in to the cluster Simple Commands Task2     |     Instructions for logging in to the AWS EMR cluster Simple Commands Task2
// to turn in:  save the text from your Command Window into a Text file  and upload  to  Canvas
// one group member submits this Exercise for the whole group

video: AWS-EMR_Cluster_Setup
Jan 28
Group 1 Moderator

MapReduce Types , Formats , and  Features     |     Alternate Slides     |     Alternate Slides 02     |     Alternate_Slides_03 |                  Alternate_Slides_04 |                 Alternate Slides 05    |
 Alternate Slides 06   |                  Alternate Slides 07   |                 Alternate Slides 08  

Read    Chapter 8. from Book 2. HadoopTheDefinitiveGuide
Read    Chapter 9. from Book 2. HadoopTheDefinitiveGuide

video: L20_01_MapReduce_Types     |     video: L20_01_01_MapReduce_Types    |  
  video: L20_01_02_MapReduce_Types   |   video: L20_01_03_MapReduce_Types_and_Features
video: L20_02_MapReduce_InputBasics     |     video: L20_02_01_MapReduce_InputBasics  |
  video: L20_02_02_MapReduce_Input_Basics   |    video: L20_02_03_MapReduce_InputBasics_Formarts
video: L20_03_MapReduce_OutputFormats     |     video: L20_03_01_MapReduce_OutputFormats   | 
 video: L20_03_02_MapReduce_OutputFormat
video: L20_04_MapReduce_Counters     |     video: L20_04_01_MapReduce_Counters_Sorting  | 
video: L20_04_02_MapReduce_Counters
video: L20_05_MapReduce_Sorting   |   video: L20_05_01_MapReduce_Sorting_Shuffling
video: L20_06_MapReduce_Joins     |     video: L20_06_01_MapReduce_Joins
video: L20_07_MapReduce_SideDataDistribution     |     video: L20_07_01_MapReduce_SideDataDistribution   |   video: L20_07_01_Side_Data_Distribution
video: L20_08_MapReduce_TypesFormatsFeatures_Questions     |    
video: L20_08_01_MapReduce_TypesFormatsFeatures_Questions 
video: L20_09_Mapreduce_Library_Classes

Exercise:     02.   Example MapReduce program In Class Activity
Exercise:     02.   Example MapReduce program without Cloudera
Exercise:     02.   Example MapReduce program using AWS

//to turn in:   save your output file ( in a text file ) and upload to Canvas

video: Exercise02_ExampleMapReduce_01
video: Exercise02_ExampleMapReduce_02
video: Exercise02_ExampleMapReduce_03
video: Exercise02_ExampleMapReduce_04
video: Exercise02_ExampleMapReduce_05_AWS
video: Exercise_02_Example_MapReduceProgram_Cloudera_DEMO_04
video: Exercise_02_Example_MapReduceProgram_usingAWS_DEMO_04
video: Exercise_02_Example_MapReduceProgram_Cloudera_DEMO_05
video: Exercise_02_Example_MapReduceProgram_usingAWS_DEMO_05
video: Exercise_02_Example_MapReduceProgram_usingAWS_DEMO_06
video: Exercise_02_Example_MapReduceProgram_Cloudera_DEMO_06
video: Exercise_02_Example_MapReduceProgram_Cloudera_DEMO_07
video: Exercise_02_Example_MapReduceProgram_usingAWS_DEMO_07
video: Exercise_02_Example_MapReduceProgram_Cloudera_DEMO_08
video: Exercise_02_Example_MapReduceProgram_usingAWS_DEMO_08

video: AWS-EMR_Cluster_Setup
video: ExampleMapReduce_WordCount_using_AWS

Cloud Tools Overview Continued:  Pig , Hive , HBase , Storm

video: L04 01 Pig    
video: L04 02 Hive   
video: L04 03 HBase   
video: L04 04 Storm

video: Example_MapReduce_WordCount

VideoCase 03. CloudTools_Pig_Hive_HBase
Feb 04

Activities :
Group 2 Moderator

Pig     |     Hive     |     HBase     |    Zookeeper     |     Alternate Slides     |     Alternate Slides 02   |
   Alternate Slides 03       |            Alternate_Slides_04            |              Alternate_Slides_05 |
 Alternate_Slides_06 |             Alternate Slides 07     |              Alternate Slides 08

Read    Chapter 16. from Book 2. HadoopTheDefinitiveGuide
Read    Chapter 17. from Book 2. HadoopTheDefinitiveGuide
Read    Chapter 20. from Book 2. HadoopTheDefinitiveGuide

video: Zookeeper
video: Hive_HBase_PIG
video: Apache_Hive

video: L21_01_Pig_DataTypes_DataFlow_LogicPlan     |  
video: L21_01_01_Pig_DataTypes_DataFlow_LogicPlan   |   L21_01_02_Pig_DataTypes_DataFlow_LogicPlan
video: L21_02_HBase_ColumnOrientation_ArchitecturalComponents     | 
video: L21_02_01_HBase_ColumnOrientation_ArchitecturalComponents   |
video: L21_02_02_HBase_Architecture_Mechanism   |
video: L21_02_03_HBase_Features_Applications
video: L21_03_Zookeper_Service_Usage   | 
video: L21_03_01_Zookepeer_Service_Architecture_DataModel
video: L21_04_Hive_HiveQLQueryLanguage_MetaStore   | 
video: L21_04_01_Hive_Features_Architecture   |    
video: L21_05_Hive_CommandsExample
video: L21_06_Exercise03_HiveProgram_Explanation    |   
video: L21_06_01_Excercis03_HiveProgram_Explanation     |
video: L21_07_Pig_Hive_HBase_Zookeeper_Questions
Video: L21_08_Pig_Architecture_Components_Elements

Exercise:    03.   Hive program
Exercise:    03.   HiveProgram_using_AWS

video: Exercise03_HiveProgram_DEMO_01
video: Exercise03_HiveProgram_DEMO_02
video: Exercise03_HiveProgram_DEMO_03
video: Exercise03_HiveProgram_DEMO_04
video: Exercise03_HiveProgram_DEMO_05
video: Exercise03_HiveProgram_AWS_DEMO
video: Exercise03_HiveProgram_AWS_DEMO_04
video: Exercise03_HiveProgram_AWS_DEMO_05
video: Exercise03_HiveProgram_AWS_DEMO_06
video: Exercise03_HiveProgram_AWS_DEMO_07

Intro to Spark ,  Programming with RDDs ,  Running on a Cluster ,  Spark SQL and MLlib , Spark Streaming

Intro to Spark ( continued )

Read    Chapter 1. from Book 3. LearningSpark
Read    Chapter 2. from Book 3. LearningSpark
Read    Chapter 3. from Book 3. LearningSpark

video: L05 01 IntroToSpark LimitationsOfMapReduce
video: L05 02 SparkComutingEngine ResilientDistributedDatasetsRDDs
video: L05 03 SparkBenefitsForUser GeneralPlatform
video: L05 04 Spark MLlib GraphX Streaming SQL
video: L05 05 Spark SoftwareStack RunTimeArchitecture ProgrammingRDDs
video: L05 06 Spark Continued RunTimeArchitecture ProgrammingRDDs DataAnalysisExample

video: IntroductionToSpark

VideoCase 04. Spark
Feb 11
Activities :
Group 3 Moderator

Downloading Spark , Getting Started , Simple Spark Applications , Scala  and Python Example Programs             |     Alternate Slides          |          Alternate Slides_02     |     Alternate_Slides_03 |      Alternate_Slides_04 |
Alternate_Slides_05      |      Alternate_Slides_06     |      Alternate_Slides_07     |      Alternate_Slides_08

Intro to Scala

Read    Chapter 5. from Book 3. LearningSpark
Read    Chapter 9. from Book 3. LearningSpark
Read    Chapter 11. from Book 3. LearningSpark

video: L10_01_SparkOverviewDownloadingGettingStarted          |    
video: L10_01_02_SparkOverviewDownloadingGettingStarted          |    
video: L10_01_03_SparkOverviewDownloadingGettingStarted   |   video: L10_01_02_SparkDownloadingGettingStarted_Continued   |   video: L10_01_03_SparkDownloadingGettingStarted_Continued   |
Video: L10_01_04_SparkDownloadingGettingStarted_Continued
Video: L10_01_05_SparkDownloadingGettingStarted_Continued
video: L10_02_SimpleSparkApplications          |          video: L10_02_02_SimpleSparkApplications
video: L10_03_SimpleSparkApplicationsContinued          |         
video: L10_03_02_SimpleSparkApplicationsContinued_PySpark   |   L10_03_03_SimpleSparkApplicationsContinued_PySpark
video: L10_04_IntroToScala          |          video: L10_04_02_IntroToScala
video: L10_05_IntroToScalaContinued   |   Video: L10_05_01_IntroToScalaContinued   |   Video: L10_05_02_IntroToScalaContinued   |    Video: L10_05_02_Spark_Python_Java_Scala_CodeExamples  
video: L10_06_Spark_Phyton_Java_Scala_CodeExamples
video: L10_07_IntroToSpark_Questions          |          video: L10_07_02_IntroToSpark_Questions

Exercise:    04.   SparkSQL
Exercise:    04.   SparkSQL using AWS

video: Exercise04_SparkSQLProgram_DEMO
video: Exercise04_SparkSQLProgram_DEMO_01
video: Exercise04_SparkSQLProgram_DEMO_02
video: Exercise04_SparkSQLProgram_DEMO_03
video: Exercise04_SparkSQLProgram_DEMO_04
video: Exercise04_SparkSQLProgram_AWS_DEMO

Finding Similar Items
Locality-Sensitive Hashing Sect. 3.1-3.4    |    Locality Sensitive Hashing II Sect. 3.5-3.8   

Read    Chapter 3. from Book 1. MiningOfMassiveDatasets

video: L06_01_FindingSimilarItems
video: L06_02_Shingles
video: L06_03_Minhashing
video: L06_04_MinhashingContinued
video: L06_05_LocalitySensitiveHashing LSH
video: L06_06_LSH CustomerRecordsWebDocuments
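
As a small illustration of the shingling step that precedes minhashing, the sketch below builds character k-shingles and compares two documents by Jaccard similarity. The function names, the choice of character shingles, and the convention that two empty sets have similarity 1.0 are assumptions made for this example.

```python
def shingles(text, k=3):
    """Return the set of all length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A intersect B| / |A union B| of two sets.

    Assumption: two empty sets are treated as identical (1.0).
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    s1 = shingles("abcd", k=2)   # {"ab", "bc", "cd"}
    s2 = shingles("bcde", k=2)   # {"bc", "cd", "de"}
    print(jaccard(s1, s2))       # 2 shared shingles out of 4 distinct
```

Minhashing and locality-sensitive hashing approximate this similarity without comparing the full shingle sets of every pair of documents.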

VideoCase 05. FindingSimilarItems

Feb 18
VideoCase 06. PageRank

Activities :
Group 4 Moderator

Boolean Retrieval       |      Term Vocabulary and Posting Lists       |      Web Search Basics       |       Alternate Slides          |          Alternate_Slides_02     |     Alternate_Slides_03    |     Alternate_Slides_04 |      Alternate_Slides_05
|      Alternate_Slides_06      |     Alternate_Slides_07     |     Alternate_Slides_08

Read     Chapter 1. from Book 4. InformationRetrieval
Read     Chapter 2. from Book 4. InformationRetrieval
Read     Chapter 19. from Book 4. Information Retrieval

video: L11_01_InformationRetreival_InvertedIndex_01     |  
video: L11_01_01_InformationRetreival_InvertedIndex   |   video: L11_01_02_InformationRetreival_InvertedIndex_01   |   video: L11_01_03_InformationRetreival_InvertedIndex_01   |
video: L11_02_BooleanQueries_RankedRetreival     |   
video: L11_02_02_BooleanQueries_RankedRetreival     |  
video: L11_02_03_BooleanQueries_RankedRetreival   |  
video: L11_02_04_BooleanQueries_RankedRetrieval     | 
video: L11_03_DocumentDelineationCharacterSequenceDecoding_03   |   video:L11_03_02_DocumentDelineationCharacterSequenceDecoding
video: L11_04_VocabularyOfText_Tokenization_04     |    
video: L11_04_01_VocabularyOfText_Tokenization  | 
video: L11_04_02_VocabularyOfText_Tokenization    |  
video: L11_04_03_VocabularyOfText_Tokenization     |  
video: L11_04_04_VocabularyOfText_Tokenization    |    
video: L11_04_05_VocabularyOfText_Tokenization
video: L11_05_PostingListIntersectionSkipPointer_PhraseQueries_05  | video:L11_05_02_PostingListIntersectionSkipPointer_PhraseQueries
video: L11_06_WebSearchBasics_06     |     video: L11_06_01_WebSearchBasics     |                                          
video: L11_06_02_WebSearchBasics_PageRank   |   video: L11_06_03_WebSearchBasics_PageRank   |    
video: L11_06_04_WebSearchBasics    |     video: L11_06_05_WebSearchBasics   | 
video: L11_06_06_WebSearchBasics_PageRank
video: L11_07_WebAdvertising   |   video: L11_07_WebearchBasics
video: L11_08_BooleanRetreivalWebSearchBasics_Questions     |    
video: L11_08_01_BooleanRetreivalWebSearchBasics_Questions  | video:L11_08_02_BooleanRetreivalWebSearchBasics_Questions

Link Analysis I - PageRank - Sect. 5.1-5.2      |      Link Analysis II - Link Spam , HITS - Sect. 5.3 - 5.5

Read    Chapter 5. from Book 1. MiningOfMassiveDatasets

Video: L08_01_PageRank_RandomWalkers_TransitionMatrix
Video: L08_02_PageRank_DeadEnds_SpiderTraps
Video: L08_03_PageRank_TopicSpecific_TeleportSets
Video: L08_04_PageRank_HITS_HubsAuthorities
Video: L08_05_WebSpam_SEO_TermSpamming_LinkSpamming
Video: L08_06_LinkSpam_SpamFarm_TrustRank_SpamMass
Video: L08_07_MultiplicationOfHugeVectorAndMatrix_BlocksOfStochasticMatrix

Video: PageRank
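
The random-walk view of PageRank with teleporting can be sketched as a short power iteration. This is a single-machine illustration under stated assumptions, not the MapReduce formulation used in the exercise: the graph is a dictionary in which every link target also appears as a key, `beta` is the probability of following a link, and dead ends spread their rank evenly over all pages.

```python
def pagerank(links, beta=0.85, iterations=50):
    """Power iteration for PageRank with uniform teleporting.

    links: dict mapping each page to a list of pages it links to.
    Assumption: every link target is also a key of `links`.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the teleport share first.
        new_rank = {node: (1.0 - beta) / n for node in nodes}
        for node in nodes:
            out = links[node]
            if out:
                share = beta * rank[node] / len(out)
                for target in out:
                    new_rank[target] += share
            else:
                # Dead end: distribute its rank evenly over all pages.
                share = beta * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # hypothetical graph
    for page, score in pagerank(graph).items():
        print(page, round(score, 4))
```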

Exercise:   05   PageRank

Video: PageRank Using MapReduce
Video: Exercise05_PageRankProgram_DEMO_01
Video: Exercise05_PageRankProgram_DEMO_02
Video: Exercise05_PageRankProgram_DEMO_03
Video: Exercise05_PageRankProgram_DEMO_04
Video: Exercise05_PageRankProgram_DEMO_05
Video: Exercise05_PageRankProgram_AWS_DEMO
Video: Exercise05_PageRankProgram_AWS_DEMO_04
Video: Exercise05_PageRankProgram_AWS_DEMO_05
Video: Exercise05_PageRankProgram_DEMO_05_InClass
Video: Exercise05_PageRankProgram_AWS_DEMO_06
Video: Exercise05_PageRankProgram_AWS_DEMO_06_Presentation
Feb 25
No Class Today -
Watch Lecture Video , Read PowerPoints , Read Book Chapter , and do the Exercise / VideoCase below

VideoCase 07. AssociationRulesMarketBasketAnalysis

Data - Types, Quality, Pre-processing, Similarity Measures
Alternate_Slides_01      |      Alternate_Slides_02           |               Alternate_Slides_03

Mathematical Background Review - Intro To Set Theory

Association Rule Mining - Agrawal (Apriori) method (frequent item-sets)

video: L03_01IntroToSetTheorySetsElementsEmtpySetUniversalSet
video: L03_02IntroToSetTheoryIntersectionUnionComplementSetDifference
video: L03_03AssociationRulesIntroAprioriAgrawalMethod

Activities :
Group 9 Moderator

video:  L25_01_Data_Mining_Density_Based_Clustering  
video:  L25_02_Data_Mining_EuclideanDistance_CosineSimilarity   
video:  L25_03_Data_PreProcessing_FeatureCreation_Selection_FSS_Techniques
video:  L25_04_Data_Mining_DataQuality_Noise_Outliers   
video:  L25_05_Data_Mining_Sampling_CurseOfDimensionality 
video:  L25_06_Data_Mining_Similarity_Dissimialrity_DistanceMeasures  
video:  L25_07_Data_Mining_Data_TypesOfAttributes
video:  L25_08_Data_Mining_Types_and_Characterstics_of_Data

GroupActivity_02 :  Exercise :    Download RSES Software | Calculate Rules and Classify Data
// to turn in: save Project file ( .rses file ) and upload the  .rses file   to  Canvas
// one group member submits this Exercise for the whole group

Exercise19_Chapter02_Part1_Similarity_Measures  (Extra Credit)
Exercise19_Chapter02_Part2_Similarity_Measures_JAVA (Extra Credit)
Exercise19_Chapter02_Part3_Similarity_Measures_SPARK (Extra Credit)


Video : Similarity_Measures_Exercise19_Chapter02_JAVA_DEMO_01
Video : Similarity_Measures_Exercise19_Chapter02_JAVA_DEMO_02
Video : Similarity_Measures_Exercise19_Chapter02_JAVA_DEMO_03
Video: Similarity_Measures_Exercise19_Chapter02_JAVA_DEMO_04
Video :Similarity_Measures_Exercise19_Chapter02_AWS_DEMO_01
Video :Similarity_Measures_Exercise19_Chapter02_AWS_DEMO_02
Video :Similarity_Measures_Exercise19_Chapter02_SPARK_DEMO_01
Video :Similarity_Measures_Exercise19_Chapter02_SPARK_DEMO_02
Video: Similarity_Measures_Exercise19_Chapter02_SPARK_DEMO_03

Preparing for Midterm Exam          |         Sample Questions           |           AnswerKey
Mar 04
Midterm Exam
- access exam on Canvas
- exam starts at the time of the class
- allowed time for exam is: 3:00 hours
Mar 11
Spring Break - No Class
Mar 18
Activities :
Group 5 Moderator

Frequent Itemsets , Market Basket , Association Rules , Apriori , Other Algorithms        
Alternate Slides    |     Alternate_Slides_02    |  Alternate_Slides_03   |   Alternate_Slides_04    |  
Alternate_Slides_05     |    Alternate_Slides_06     |       Alternate_Slides_07

Agrawal (Apriori) method (frequent item-sets) Example

Read    Chapter 6. from Book 1. MiningOfMassiveDatasets

video: L04_01SupportAndConfidence_AssociationRules
video: L04_02AprioriEample_FrequentItemsets
video: L04_03AprioriExample_AssociationRules
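
Support and confidence, the two measures named in the videos above, can be computed directly from a list of transactions. The sketch below is a minimal illustration; the function names and the sample baskets are hypothetical, and `confidence` assumes the left-hand side occurs in at least one transaction.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs).

    Assumption: lhs appears in at least one transaction.
    """
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


if __name__ == "__main__":
    baskets = [  # hypothetical market-basket data
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
    ]
    print(support({"bread", "milk"}, baskets))
    print(confidence({"bread"}, {"milk"}, baskets))
```

Apriori uses the fact that every subset of a frequent itemset is also frequent, so candidate itemsets of size k only need to be built from frequent itemsets of size k-1.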

video: L22_01_MarketBasketAnalysis_Support_FrequentItemSets    |    
video: L22_01_01MarketBasketAnalysis_Support_FrequentItemSets   |   L22_01_02_MarketBasketAnalysis_Support_FrequentItemsets
video: L22_02_AssociationRules_Intro    |    video: L22_02_01AssociationRules_Apriori   |  
video: L22_02_02_AprioriExamples_FrequentItemsets
video: L22_03_AssociationRules_Apriori_Algorithm   |    
video:  L22_03_01_AprioriExample_AssociationRules    | 
video:  L22_03_02_AprioriExample_AssociationRules
video: L22_04_AssociationRules_PCY_ParkChen_Algorithm   |  
video:  L22_04_01_AssociationRules_PCY_ParkChen_Algorithm
video: L22_05_AssociationRules_Simple_SON_Toivonen_Algorithms  | 
video: L22_05_01_AssociationRules_Simple_SON_Toivonen_Algorithms    |   
video: L22_05_02_AssociationRules_Simple_SON_Toivonen_Algorithms
video: L22_06_FrequentItemsets_MarketBasket_AssociationRules_Questions  |  
video: L22_07_Frequent_Pattern_Growth_Strategy 

Exercise:     06    Association Rule Mining  ( ex.  2. (a) (b)   Chapter 6 DataMiningBook )
Exercise:    07_Part1   AprioriAssociationRules   (  ex.  6.   Chapter 6 DataMiningBook )     |    
Exercise:    07   Part2   AssociationRules_Spark     |
Exercise:    07   Part2   AssociationRules_Spark_AWS

Association Rules Example Code_01     |     video: Exercise07_AssociationRulesProgram_DEMO_01
Association Rules Example Code_02     |     video: Exercise07_AssociationRulesProgram_DEMO_02
Association Rules Example Code_03     |     video: Exercise07_AssociationRulesProgram_DEMO_03

video: Exercise07_SparkAssociationRules_AWS_DEMO
video: Exercise07_SparkAssociatonRules_AWS_DEMO_04
video: Exercise07_SparkAssociatonRules_AWS_DEMO_05
video: Exercise07_SparkAssociatonRules_AWS_DEMO_06
video: Exercise07_SparkAssociatonRules_AWS_DEMO_07
Association Rules  Extraction using Spark - in UNCC DSBA-HADOOP cluster
Association Rules  Extraction using Spark - in AWS EMR cluster

Frequent Pattern Growth Strategy (FP-Tree)

video: L05_01FrequentPatternTree_FPTree01
video: L05_02FPTree02
video: L05_03FPTree03
video: L05_04MiningTheFPTree01
video: L05_05MiningTheFPTree02

GroupActivity_03_Part1 : Exercise :   build the FP-Tree  using the transactions from table 6.24 ( in exercise 8  chapter 6 DataMiningBook )
GroupActivity_03_Part2 : FP-Growth Program in Spark - using DSBA-Hadoop Cluster   or
GroupActivity_03_Part2 : FP-Growth Program in Spark - using AWS-EMR Cluster
// to turn in: save the answer in a Text file  and upload the file   to  Canvas
// one group member submits this Exercise for the whole group

Mar 25
No Class Today -
Watch Lecture Video , Read PowerPoints , Read Book Chapter , and do the Exercise / VideoCase below

Decision rules - LERS (certain and possible rules)

video: L06_01LERSIntroduction
video: L06_02LERSExampleFirstLoop
video: L06 03LERSExampleCertainPossibleRules
video: L06 04LERSExampleSecondLoop
video: L06 05LERSExampleThirdLoopEnd

Activities :
Group 10 Moderator

Part1: Exercise : 08_Java_LERS     |     Part2: Exercise : 08_Hadoop_LERS     |      Part3: Exercise : 08_Spark_LERS
Part2: Exercise : 08_Hadoop_LERS_using_AWS     |      Part3: Exercise : 08_Spark_LERS_using_AWS

- calculate rules using data from the lecture above
// to turn in:   take a screen shot of your runtime environment showing the rules | upload the screen shot to Canvas

Spark_LERS_01_Alternate Slides         |      Spark_LERS_02_Alternate Slides02

LERS Example Program

video: Exercise08_Hadoop_LERS_DEMO     |     Exercise08_HadoopLERS_AWS_DEMO

video: Exercise08_SparkLERS_DEMO
video: Exercise08_SparkLERS_AWS_DEMO
video: Exercise08_SparkLERS_AWS_DEMO_02
video: Exercise08_SparkLERS_AWS_DEMO_03
video: Exercise08_Hadoop LERS AWS_DEMO_04
 video: Exercise08_JAVA_LERS_AWS_DEMO_04
Video: Exercise08_JAVA_LERS_AWS_DEMO_05
 video: Exercise08_SPARK_LERS_AWS_DEMO_04
Video: Exercise08_SPARK_LERS_AWS_DEMO_05

VideoCase 08. MiningActionablePatterns

Action Rules Intro           |           Action Rules 1         |          Action Rules 2

Action Rules 3

Action Rule Discovery Example

video: L07  01ActionRulesIntroduction
Video: L07_01_01_ActionRulesIntroduction
video: L07_02ActionRulesIntroSupportConfidence
video: L07_03ActionRulesExample
Video: L07_03_01_ActionRulesExample
Video: L07_04_Action_Rules_Discovery_LERS
Video: L07_05_Interestingness_Measures
Video: L07_06_LERS
Video: L07_07_LERSExample
Video: L07_08_Knowledge_Discovery_In_Databases_and_Applications

GroupActivity_04 : Exercise : download the Action Rules software and the MapReduce Action Rules software, and run on Hadoop:

GroupActivity_04_Part2_ActionRules_in_MapReduce_MRRandomForest_ActionRules_01   (or)   GroupActivity_04_Part2_ActionRules_in_MapReduce_MRRandomForest_ActionRules_01_AWS

GroupActivity_04_Part3_ActionRules_in_MapReduce_MRApriori_ActionRules_01   (or) 

GroupActivity_04_Part4_ActionRules_in_Spark   (or)   

Example Programs:
ActionRulesJavaExample        |       SparkActionRulesExample       |      MR-RandomForest_ActionRulesExample_01   |  MR-RandomForest_AND_Apriori_ActionRules_Example_02       |        MR-Apriori_ActionRules_Example_01      |  MR-Apriori_ActionRules_Example_02       |            MR-Actions_OntologyExample      
// use the CarEvaluation dataset and the Mammographic Mass dataset - replicate 1024 times. Create a table showing the number of mappers you had and the time it took to run for each dataset. Upload your table and your output files to Canvas
// one group member submits this Exercise for the whole group

video: GroupActivity04_Part2_MRRandomForestActionRules_01_DEMO     |     video:GroupActivity04_Part2_MRRandomForestActionRules_01_AWS_DEMO
video : MR-RandomForest_AND_MR-Apriori_ActionRules_Example_02
video: GroupActivity04_Part3_MRAprioriActionRules_01_DEMO  |  video:GroupActivity04_Part3_MRAprioriActionRules_01_AWS_DEMO
video: GroupActivity04_part4_SparkAction_DEMO     |     video: GroupActivity04_part4_SparkAction_AWS_DEMO |
video: GroupActivity04_part4_SparkAction_AWS_DEMO_02
video: GroupActivity04_part4_SparkAction_AWS_DEMO_03

UCI Machine Learning Repository

Apr 01

Activities :
Group 6 Moderator

Recommender Systems 01 , Content Based , Collaborative Filtering     |     Alternate_Slides_01     |     Alternate_Slides_02     |     Alternate_Slides_03     |     Alternate_Slides_04     |     Alternate_Slides_05     |     Alternate_Slides_06     |     Alternate_Slides_07     |     Alternate_Slides_08

Read    Chapter 9. from Book 1. MiningOfMassiveDatasets

video: L12_01_RecommenderSystems_ContentBased_CollaborativeFiltering     |     video:L12_01_01_RecommenderSystems_Intro    |    
video:L12_01_02_RecommenderSystems_Intro |

video: L12_01_03_RecommenderSystems_ContentBased_CollaborativeFiltering    |
video: L12_01_04_RecommenderSystems_ContentBased_CollaborativeFiltering    |
video: L12_02_ItemProfiles_DocumentCollections   
video: L12_03_ItemProfiles_UserProfiles_RecommendingItemsToUsers     |     video:L12_03_01_ItemProfiles_UserProfiles_RecommendingItemsToUsers |
video:L12_03_02_ItemProfiles_UserProfiles_RecommendingItemsToUsers   | 
video: L12_03_03_ItemProfiles_UserProfiles_RecommendingItemsToUsers
video: L12_04_01_DecisionTreeExampleCode_DEMO_02
video: L12_05_RecommenderSystems_CollaborativeFiltering
video: L12_06_RecommenderSystems_ContentBasedFiltering  | 
video: L12_06_01_RecommenderSystems_ContentBased
video: L12_07_ClusteringUsersAndItems
video: L12_08_RecommenderSystems_Questions
video: L12_11_Applications_of_Recommendation_Systems   |  
video: L12_12_Classification_Algorithm_Measuring_Similarity

Decision Tree Example Code - Windows Version       |     Decision Tree Example Code - Linux Version     |     DecisionTree_ExampleCode_02

video: Exercise_02_DecisionTree_Program_Code_DecisionTree_Java_DEMO
video: L12_04_DecisionTreeExampleCode_DEMO     |    
video: L12_04_DecisionTree_ExampleCode_DEMO_02     |    
video: L12_04_DecisionTree_ExampleCode_DEMO_03    |  
video: L12_04_DecisionTree_ExampleCode_DEMO_04  

Decision Trees - Discovery     |     Decision Tree, RandomForest

video: L09_01DecisionTreesIntroduction
video: L09_02DecisionTreesIntroExamples
video: L09_03DecisionTreesEntropyInformationGain
video: L09_04DecisionTree_RandomForest
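
Entropy and information gain, the attribute-selection measures named in the videos above, can be sketched as follows. The function names and the representation of rows as dictionaries are assumptions made for this illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute.

    rows: list of dicts mapping attribute name to value.
    labels: class label for each row, in the same order.
    """
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
            {"outlook": "rainy"}, {"outlook": "rainy"}]  # hypothetical data
    labels = ["yes", "yes", "no", "no"]
    print(entropy(labels), information_gain(rows, labels, "outlook"))
```

A decision-tree learner such as ID3 picks, at each node, the attribute with the largest information gain.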

VideoCase 09. DecisionTrees

System ID3 Example          |         Mathematical Background Review - Logarithm

Exercise 09: DecisionTree -
Exercise 09.1: ( ex. 2. Chapter 4 DataMiningBook ) and
Exercise 09.2: ( ex. 3. Chapter 4 DataMiningBook )

video: L10_01System_ID3_Example_Entropy
video: L10_02System_ID3_Example_Entropy02
video: L10_03System_ID3_Example_AtributeSelection

GroupActivity_08.   Spark MLlib DecisionTree  program
GroupActivity_08.   Spark MLlib DecisionTree  program_AWS

video: GroupActivity08_SparkDecisionTree_DEMO
video: GroupActivity08_SparkDecisionTree_DEMO_02
video: GroupActivity08_SparkDecisionTree_AWS_DEMO
video: GroupActivity08_SparkDecisionTree_AWS_DEMO_04
video: GroupActivity08_SparkDecisionTree_AWS_DEMO_05
video: GroupActivity08_SparkDecisionTree_AWS_DEMO_06
video: GroupActivity08_DecisionTree_DEMO_06
video: GroupActivity08_DecisionTree_DEMO_07

Apr 08
Clustering , Hierarchical , Agglomerative , Point-Assignment , Measures of Goodness , BFR , CURE

Read    Chapter 7. from Book 1. MiningOfMassiveDatasets

Cluster Analysis - Basic Concepts and Algorithms       |    Cluster Analysis - Basic Concepts and Algorithms_01
Partitioning Clustering - K-Means Example     |     Alternate_Slides_01     |     Alternate_Slides_02     |     Alternate_Slides_03

video: L14_01ClusterAnalysisAlgorithm
video: L14_02ClusterAnalysisIntroPlottingOfObjects
video: L14_03ClusterAnalysisPreProcessingCharacteristicsOfData
video: L14_04ClusterAnalysisTypesOfClusters
video: L14_05PartitioningClusteringKMeans
video: L14_05_01PartitioningClusteringKMeans
video: L14_05_02_PartitioningClusteringKMeans_Advantages_Disadvantages
video: L14_06PartitioningClusteringKMeansContinued
video: L14_06_01PartitioningClusteringKMeansContinued
video: L14_07_PartitioningClustering_KMeans_Non Euclidean
video: L14_08_PartitioningClusteringKMeans_Strategies
video: L14_09_ClusteringOfStreams_Parallelism
video: L14_10_Hierarchial_Clustering

video: L15_01KMeansExampleProblemPart1
video: L15_02KMeansExampleProblemPart2
video: L15_03KMeansExampleProblemPart3
video: L15_04_KMeans_Example   

Activities :
Group 11 Moderator

GroupActivity_05 : Exercise:   K-Means Clustering on Hadoop (extra credit)   and
GroupActivity_05 : Exercise:   K-Means Clustering on Spark MLlib     |     GroupActivity_05 : Exercise:   K-Means Clustering on Spark MLlib_AWS

video: GroupActivity05_Spark_kMeansClustering_AWS_DEMO_01
video: GroupActivity05_KMeansClustering_AWS_DEMO_02
video: GroupActivity05_KMeansClustering_AWS_DEMO_06

Apr 15

Analysis of Social Networks
Analysis of Massive Graphs I       |     Analysis of Massive Graphs II

Read     Chapter 10. from book 1. MiningOfMassiveDatasets
Video: Detecting Communities as Clusters

video: L18_01ExampleGraph_CommunityDetectionInGraphs_AdjacencyMatrix
video: L18_02Community_AffiliationGraphModel_AGM
video: L18_03MaximumLiklihoodEstimation_MLE_forAGM
video: L18_04AGM_BigClaim_CommunityMembershipStrengthsExample
video: L18_05AGM_BigClaim_FindingMembershipStrengthMatrixF
video: L18_06Thrawling_FrequentItemSets_SmallBipartiteGraphs

Activities :
Group 7 Moderator

Computational Advertising     |     Comparison between MapReduce and bulk-synchronous systems     |     Alternate Slides     |     Alternate_Slides_02     |     Alternate_Slides_03     |     Alternate_Slides_04     |     Alternate_Slides_05     |     Alternate_Slides_06     |     Alternate_Slides_07

Read     Chapter 8. from book 1. MiningOfMassiveDatasets

video: L19_01_ComputationalAdviertisement_GreedyAlgorithm_CompetitiveRatio     |     video: L19_01_01_ComputationalAdviertisement_Introduction     |     video: L19_01_02_ComputationalAdviertisement_Introduction
video: L19_02_ComputationalAdvertising_MatchingProblem     |     video: L19_02_01_ComputationalAdvertising_CompetitiveRatio     |     video: L19_02_01_01_ComputationalAdvertising_CompetitiveRatio     |     video: L19_02_01_02_ComputationalAdvertising_CompetitiveRatio
video: L19_02_ComputationalAdvertisement_GreedyAlgorithm_CompetitiveRatio
video: L19_02_02_SearchAdvertisement_AddWords
video: L19_02_02_01_SearchAdvertisement_AddWords
video: L19_02_02_02_SearchAdvertisement_AddWords
video: L19_02_02_03_SearchAdvertisement_AddWords
video: L19_03_SearchAdvertisement_GreedyAlgorithm_BalanceAlgorithm     |     video: L19_03_01_SearchAdvertisement_GreedyAlgorithm     |     video: L19_03_02_SearchAdvertisement_BalanceAlgorithm
video: L19_03_02_SearchAdvertisement_GreedyAlgorithm
video: L19_03_03_SearchAdvertisement_BalanceAlgorithm
video: L19_04_MatchingBids_SearchQueries
video: L19_04_01_MatchingBids_SearchQueries
video: L19_04_02 MatchingBids_SearchQueries
video: L19_05_MapReduce_BulkSynchronous_SolutionsToGraphModel     |     video: L19_05_01_MapReduce_SolutionsToGraphModel     |     video: L19_05_01_01_MapReduce_SolutionsToGraphModel
video: L19_05_01_02_MapReduce_SolutionsToGraphModel
video: L19_05_02_MapReduce_BulkSynchronous_SolutionsToGraphModel
video: L19_05_02_01_MapReduce_BulkSynchronous_SolutionsToGraphModel
video: L19_06_ComputationalAdviertisement_Questions     |     video: L19_06_01_ComputationalAdviertisement_Questions

video: Google_AdWords

GroupActivity 07 : Exercise : Graph Analysis in Spark GraphX     |     GroupActivity 07 : Exercise : Graph Analysis in Spark GraphX_AWS

video: L19_07_Example_GraphAnalysisInSparkGraphX_Code_DEMO     |     video: L19_07_01_Example_GraphAnalysisInSparkGraphX_Code_DEMO_02
Example_GraphAnalysisSparkGraphX_Code_01       |     Example_GraphAnalysisSparkGraphX_Code_02

video: GroupActivity07_SparkAnalysis_SparkGraphX_DEMO
video: GroupActivity07_SparkAnalysis_SparkGraphX_AWS_DEMO
video: GroupActivity07_SparkAnalysis_SparkGraphX_DEMO_02
video: GroupActivity07_SparkAnalysis_SparkGraphX_DEMO_03
video: GroupActivity07_SparkAnalysis_SparkGraphX_DEMO_04
video: GroupActivity07_SparkAnalysis_SparkGraphX_DEMO_05
video: GroupActivity07_SparkAnalysis_SparkGraphX_DEMO_06

Apr 22
Clustering Techniques (Continued)       |      Hierarchical Clustering - Single Link Example

video: L16_01HierarchicalCustering
video: L16_02HierarchicalClusteringAgglomerativeProximityMatrix
video: L16_03HierarchicalCusteringInterClusterDistances

video: L17_01HierarchicalClusteringSingleLinkExamplePart1
video: L17_02HierarchicalClusteringSingleLinkExamplePart2
video: L17_03HierarchicalClusteringSingleLinkExamplePart3
video: L17_04HierarchicalClusteringSingleLinkExamplePart4

Exercise 10: HierarchicalClustering ( ex. 16. Chapter 8 DataMiningBook )

GroupActivity_06 : Exercise:   download the WEKA software and the ORANGE software, then run clustering, association rules discovery, and a decision tree  ( use one of the datasets of your choice that are pre-loaded in RSES )
// take a screenshot of your project, and upload it to Canvas
// one group member submits this Exercise for the whole group

video: Exercise_02_DecisionTree_OrangeDataMiningSoftware_DEMO

Apr 29
No Class Today -
Watch Lecture Video , Read PowerPoints , Read Book Chapter , and do the Exercise / VideoCase below

Activities :
Group 12 Moderator

VideoCase 10. TextClassification

Naive Bayes Classification - Intro     |     Alternate Slides
Text Classification, Naive Bayes      |     Alternate_Slides

Read     Chapter 13. from book 4. Information Retrieval

video: L23_01_NaiveBayesClassifierIntro_ProbabilityBasics_01

video: L23_01_NaiveBayesClassifierIntro_03
video: L23_03_NaiveBayes_Training_TestPhase_01
video: L23_05_NaiveBayes_ZeroFrequencyProblem
video: L23_06_NaiveBayes_ClassificationTypes

video: Implementation of Naive Bayes Classification in Spark AWS_DEMO
video: Implementation of Naive Bayes Classification in Spark AWS_DEMO_01
video: Implementation of Naive Bayes Classification in Spark AWS_DEMO_02
video: Implementation of Naive Bayes Classification in Spark AWS_DEMO_03_01
video: Implementation of Naive Bayes Classification in Spark AWS_DEMO_03_02
video: Implementation_of_Naive_Bayes_Classification_in_Spark_AWS_DEMO_05

Activities :
Group 8 Moderator - PhD Students
Vector space classification     |     Alternate_Slides     |     Alternative Slides 01     |     Alternative Slides 02     |     Alternate_Slides_03     |     Alternative_Slides 05     |     Alternative_Slide_06

PhD Project Presentation

Read     Chapter 14. from book 4. Information Retrieval

video: L24_VectorSpace_Classification_Introduction_Classification_Types_02
video: L24_01_VectorSpaceClassification_Introduction
video: L24_01_VectorSpaceClassification_Introduction_01
video: L24_01_VectorSpaceClassification_Introduction_02
video: L24_01_VectorSpaceClassification_Introduction_03
video: L24_01_VectorSpaceClassification_Introduction_04
video: L24_01_VectorSpaceClassification_Introduction_Classification_Types_01
video: L24_01_VectorSpaceClassification_Application_01
video: L24_01_VectorSpace_Classification_Application_03
video: L24_02_VectorSpaceClassification_Rocchio
video: L24_02_VectorSpaceClassification_Rocchio_01
video: L24_02_VectorSpaceClassification_Rocchio_02
video: L24_03_VectorSpaceClassification_kNN
video: L24_03_VectorSpaceClassification_kNN_01
video: L24_03_VectorSpaceClassification_kNN_02
video: L24_03_VectorSpace_Classification_kNN_03
video: L24_04_VectorSpace_Classification_LinearClassifiers
video: L24_04_VectorSpace_Classification_LinearClassifiers_01
video: L24_04_VectorSpace_Classification_LinearClassifiers_02     |     video: L24_04_VectorSpace_Classification_LinearClassifiers_03
video: L24_04_VectorSpaceClassification_LinearvsNonLinearClassifiers_02
video: L24_04_VectorSpaceClassification_LinearvsNonLinearClassifiers_03
video: L24_04_VectorSpaceClassification_LinearvsNonLinearClassifiers_04
video: L24_05_VectorSpace_Classification_MultipleClasses
video: L24_05_VectorSpace_Classification_MultipleClasses_01
video: L24_05_VectorSpace_Classification_MultipleClasses_02
video: L24_05_VectorSpace_Classification_MultipleClasses_03
video: L24_06_VectorSpaceClassification_Questions
video: L24_07_VectorSpace_Classification_NonLinearSVM

Exercise11_Support_Vector_Machine_Spark_AWS (Extra Credit Exercise)


Preparing for Final Exam          |         Sample Questions           |           AnswerKey
May 06 Final Exam
- access exam on Canvas
- exam starts at 5:30 PM and runs until 8:30 PM
- allowed time for exam is: 3:00 hours

Syllabus Copyright 2015-2025 Angelina A Tzacheva.
No reuse or reproduction without permission.