Cloud Computing for Data Analysis
Group Project Description:
--------------
This is a Group Project. Locate your Group Members on Canvas.*

Part1 ::: Organisation of Group Work
---
1.1. One student assumes the ROLE of Project Leader. Any student who feels comfortable leading can assume this ROLE, and does the following :::

1.2. Set up a Doodle Poll at https://doodle.com/create for Group Members to enter their available times / best times for Group Meetings.
     SET UP the Doodle Poll as follows :::
     Title: “Group Meeting” - Continue
     DAYS: Every Day - Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
     TIMES: create 2-hour blocks (example: 9:00am-11:00am, 10:00am-12:00pm), starting at 9:00am and ending at 10:00pm, on each Day
     Complete it for 7 Days (1 full week). Use the upcoming week, starting from Monday.
     See EXAMPLE Doodle Poll here: https://doodle.com/poll/e7cp3yehpradfuqd?utm_source=poll&utm_medium=link

1.3. Decide how to divide the subject material among the Group Members. Divide the material so that each student has 5 PowerPoint slides to create.

1.4. E-mail ALL Group Members (and Cc: the Teaching Assistants):
     “
     Dear Group # ,
     My name is: … , I assume the Project Leader Role for our course # .
     Please fill out your availability for Group Meetings at Doodle Poll --->>> give link to Doodle Poll here <<<---
     Meetings are held Online via Zoom - WEEKLY (every week) - for 2 hours.
     Group Members are required to RESPOND and Attend / JOIN the Meetings.
     For your Individual Contribution, you are assigned to create 5 PowerPoint slides on the following Subject:
     GroupMemberName | Subject
     GroupMemberName | Subject
     GroupMemberName | Subject
     ... list All Members
     Thank you,
     ”

1.5. The Project Leader decides the best time for the Group Meeting by choosing, from the Doodle Poll, the time during which the MAJORITY of Group Members are available.
     All Group Members are REQUIRED to JOIN and attend a WEEKLY Meeting for 2 hours.
     Please be flexible and FILL OUT as many time slots as possible in the Doodle Poll, to help find the Best Time for Everyone to Meet.

1.6. The Project Leader e-mails a Meeting Reminder every WEEK, 3 days before the Meeting time.
     For example, if the Meeting is on Wednesday at 6:00pm-8:00pm, then the Project Leader e-mails ALL Group Members on Monday by 6:00pm the following:
     “
     Dear Group # ,
     REMINDER - we have a Group Meeting on --->>> give Day and Time here <<<--- (example: Wednesday at 6:00pm-8:00pm), via Zoom
     Link :
     password :
     Thank you,
     ”

Part2 ::: Individual Contribution Submission
---
2.0. Each student creates 5 PowerPoint slides on their assigned subject (as shown below). The Project Leader decides how to divide the subject material among the Group Members.
     Each student submits on Canvas, individually, their 5 PowerPoint slides and a VIDEO.

2.1. Record a .mp4 video explaining your 5 PowerPoint slides. The target length of the video is 5 minutes (maximum 10 minutes).

2.2. Record yourself talking about your PowerPoint slides, including ASKING 1 QUESTION at the end of your slides. After that, PROVIDE THE ANSWER to your question, and record that too.
     The video should have Animations, that is: include moving objects in the video, or draw lines and circles, or use the mouse pointer.
     The video should contain sound: record your voice reading the PowerPoint text and explaining the concepts.
     Examples of good videos with Animations:
     _2.1. https://www.youtube.com/watch?v=ZMBTLuVJtLM - What is the world wide web Twila Camp
     _2.2. https://www.youtube.com/watch?v=sb7ywQDxgFs - Chapter2 1 2 4 01PhysicalLayer 02
     Example of a poor video (not acceptable):
     _2.3. https://www.youtube.com/watch?v=A-uDY29YPkU - Ch5.3_5.5_06PacketScheduling_02

2.3. Proper NAMING of submission files
     _*. NAME your PowerPoint file as ::: Group#_SubjectOfPowerPoint.ppt
         for example ::: Group03_TelephoneSystem_GSM.ppt
     _*. NAME your VIDEO file as ::: Group#_SubjectOfVideo.mp4
         for example ::: Group03_TelephoneSystem_GSM.mp4

2.4. *Note: The student who works on Writing Code and records the Code DEMO Video is exempt from Creating PowerPoints.

--- for FACE-to-FACE Class (In-Person Class) ::: ---

2.5. Present your PowerPoints, Video, and Implementation Demo to the class on your assigned Group Moderator Date (shown on the syllabus). The presentation should not take more than 15 minutes altogether.
2.6. Each student presents his / her 5 PowerPoint slides, speaks for 1 to 2 minutes maximum, and prepares 1 question for the audience based on his / her PowerPoint slides.
2.7. Answer questions. Each audience group in the class asks the Presenters 1 question.
2.8. Bring CANDY / Sweets (ex. chocolates (Snickers, Mars, M&M's, etc.), cookies, cupcakes, doughnuts) for the audience. Each group will give you a score from 0 to 10 for your Presentation.

Part3 ::: Group Submission Instructions
---
3.1. Total number of PowerPoint slides for the entire group = number of group members * 5
     For example, if there are 9 members in the group, then the total number of slides in the PowerPoint = 9 * 5 = 45
     The Project Leader submits one file on Canvas for the entire group.
3.2. Implement your assigned algorithm (as shown below). Use JAVA (or Scala) as the programming language. Create a User Interface.
3.3. One student runs a Demonstration of the code before the class, and explains the purpose of the code, what inputs it takes, and what outputs it produces.
3.4. Code DEMO Video Recording Instructions :::
     (This video should explain all the steps followed in executing the program and obtaining the results.)
     Record a .mp4 Video Demonstrating the Code:
     _*0. One student records the DEMO of the Code, and explains HOW to RUN the Code:
     _*1. State the purpose of this code.
     _*2. Give the Command for Running the Code.
     _*3. Specify any Parameters required for Running the Code.
     _*4. Give the location of the input Data file (if any).
          *What is the Input Data File - show the Data - and explain what it means.
     _*5. Explain the LOGIC of the Code.
          *What is the Output Data file produced - show the Output - and explain what it means.
     _*6. Show the Output files, and the Location of the Output Files.
     _*7. OPEN the Output files, and explain the MEANING of the Output.
     _*8. Create a README.txt file documenting steps _*1 to _*7 above, and submit the README.txt file along with your Code on Canvas.
     _*9. Show how to remove the AWS cluster after usage.
     _*10. Make sure the DEMO Video has SOUND - your VOICE is recorded.
     *Name the video file with the SUBJECT that it covers, for example: Group01_02_Exercise_ExampleMapReduceProgram_DEMO.mp4

3.5. Submit the project files (PowerPoint file, VideoLink, and SourceCode) to Canvas. Due: 3 days prior to your assigned Group Moderator Date presentation date (shown on the syllabus).
3.6. Present PowerPoints, Video, and Implementation Demo to the class on your assigned Group Moderator Date (shown on the syllabus).
3.7. Each student presents 5 PowerPoint slides, speaks for 3 to 5 minutes maximum, and prepares 1 question for the audience based on his / her PowerPoint slides.
3.8. Answer questions. Each audience group asks the Presenters 1 question.
3.9. Bring CANDY / Sweets (ex. chocolates (Snickers, Mars, M&M's, etc.), cookies, cupcakes, doughnuts) for the audience.
     Each audience group gives the presenters a score from 0 to 10.

Part4 ::: RATE my Group Members
---
4.1. Go to the GoogleForm link ::: https://docs.google.com/forms/d/1Uvixq7AL1g-m2zGAu5-jma0GonkWjHcSvTbwX8wLD9o
     and complete 1 Form for each Group Member. Write comments about each Group Member's Contribution to the Project Work.

Group 1
---------
Presentation Subject: MapReduce Types, Formats, and Features
Chapter 8. from Book 2. HadoopTheDefinitiveGuide
Chapter 9. from Book 2. HadoopTheDefinitiveGuide
implement: Run the Example MapReduce Program as described in:
1. http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/02_Exercise_ExampleMapReduceProgram.txt
2. http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/ExampleMapReduce_ModifiedInstructions.docx
3. http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/InstructionsForDSBAHadoopCluster.txt
4. https://webpages.uncc.edu/aatzache/ITCS6190/Exercises/02_ExampleMapReduceProgram_WithoutCloudera.txt
5. https://webpages.uncc.edu/aatzache/ITCS6190/Exercises/02_ExampleMapReduceProgram_UsingAWS.txt

Group 2
---------
Presentation Subject: Pig | Hive | HBase | Zookeeper
Chapter 16. from Book 2. HadoopTheDefinitiveGuide
Chapter 17. from Book 2. HadoopTheDefinitiveGuide
Chapter 20. from Book 2. HadoopTheDefinitiveGuide
implement: HIVE program as described in:
http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/03_Exercise_Hive.txt

Group 3
---------
Presentation Subject: Downloading Spark, Getting Started, Simple Spark Applications, Scala and Python Example Programs | Intro to Scala
Chapter 5. from Book 3. LearningSpark
Chapter 9. from Book 3. LearningSpark
Chapter 11. from Book 3. LearningSpark
implement: Spark SQL program as described in:
http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/04_Exercise_SparkSQL.txt

Group 4
---------
Presentation Subject: Boolean Retrieval | Term Vocabulary and Posting Lists | Web Search Basics
Chapter 1. from Book 4. InformationRetrieval
Chapter 2. from Book 4. InformationRetrieval
Chapter 19. from Book 4. InformationRetrieval
implement: PageRank program as described in:
http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/05_Exercise_PageRank.txt

Group 5
---------
Presentation Subject: Frequent Itemsets, Market Basket, Association Rules, Apriori, Other Algorithms
Read Chapter 6. from Book 1. MiningOfMassiveDatasets
implement: AssociationRulesMining as described in:
http://webpages.uncc.edu/aatzache/ITCS6162/PowerPoints/AgrawalExample.doc
// Write a program which implements the algorithm from this exercise. Use the same data from the exercise as input, and check that your output matches the results of the exercise. The user should be asked to provide the minimum support threshold before the program starts.
http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/07_Exercise_Part2_SparkAssociationRules_AWS.txt

Group 6
---------
Presentation Subject: Recommender Systems 01, Content Based, Collaborative Filtering
Chapter 9. from Book 1. MiningOfMassiveDatasets
implement: DecisionTreeSystemID3 as described in:
http://webpages.uncc.edu/aatzache/ITCS6162/PowerPoints/ID3Example.doc
Example Code:
http://webpages.uncc.edu/aatzache/ITCS6190/Project/DecisionTree/DecisionTree.zip
http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/GroupActivity08_Spark_MLlib_AWS.txt
Find instructions to set up the project in:
http://webpages.uncc.edu/aatzache/ITCS6190/Project/DecisionTree/README.txt
// Write a program which implements the algorithm from this exercise. Use the same data from the exercise as input, and check that your output matches the results of the exercise. The user should be asked to provide the minimum tree depth threshold value before the program starts.

Group 7
---------
Presentation Subject: Computational Advertising | Comparison between MapReduce and bulk-synchronous systems
Chapter 8. from Book 1. MiningOfMassiveDatasets
implement: Graph Analysis in Spark GraphX as described in:
http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/SparkGraphX/Exercise_SparkGraphX.txt

Group 8
---------
Presentation Subject: Support Vector Machine
https://webpages.uncc.edu/aatzache/ITCS6190/Project/DM_04_4.9_Chap4_SVM.ppt
implement: Support Vector Machine - Classification using the given data in the project folder.
https://webpages.uncc.edu/aatzache/ITCS6190/Project/SVMInstructions.txt
https://webpages.uncc.edu/aatzache/ITCS6190/Project/SVM.zip

Group 9
---------
Presentation Subject: Chapter 2 Data from Data Mining Book
http://webpages.uncc.edu/aatzache/ITCS6162/PowerPoints/chapter2_Data.ppt
Implement Exercise 19, Chapter 2, in Java and in Spark. Make sure your code produces the correct result as given in the Exercise 19 solution.
Implementation Links:
1) Java Code
http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/Exercise19_Chapter02_SimilarityUsingVectors_JAVA.zip
2) Scala Code
http://webpages.uncc.edu/aatzache/ITCS6190/Exercises/Exercise19_Chapter02_SimilaritiesUsingVectors.zip
The program calculates the distance / similarity between animals. It shows how similar one animal is to another.

Group 10
---------
Presentation Subject: Decision Rules (LERS), Action Rule Discovery
Read the PowerPoints on your research subject: rough sets and Action Rules
http://webpages.uncc.edu/aatzache/ITCS6162/PowerPoints/LERS.doc
http://webpages.uncc.edu/aatzache/ITCS6162/PowerPoints/ActionRules_Simple.ppt
http://webpages.uncc.edu/aatzache/ITCS6162/PowerPoints/ActionRuleDiscoveryExample.doc
Implement: LERS algorithm as described in:
https://webpages.uncc.edu/aatzache/ITCS6190/Exercises/08_Exercise_SparkLERS_AWS.txt
and Action Rules as described in:
https://webpages.uncc.edu/aatzache/ITCS6190/Exercises/GroupActivity04_ActionRules_Part4_Spark_AWS.txt

Group 11
---------
Presentation Subject: Clustering
http://webpages.uncc.edu/aatzache/ITCS6162/PowerPoints/KMeansExample.doc
Chapter 7. from Book 1. MiningOfMassiveDatasets
Implement: K-Means Clustering as described in:
https://webpages.uncc.edu/aatzache/ITCS6190/Exercises/GroupActivity05_KMeansSparkMLib_AWS.txt

Group 12
---------
Presentation Subject: Classification
http://webpages.uncc.edu/aatzache/ITCS6190/PowerPoints/IR/IR_13_NaiveBayesClassification_Intro.ppt
http://webpages.uncc.edu/aatzache/ITCS6190/PowerPoints/IR/IR_13_TextClassification_NaiveBayes.pptx
Chapter 13. from Book 4. InformationRetrieval
Implement: Naive Bayes Classification in Spark
Dataset: Car Evaluation and Mammographic dataset
https://webpages.uncc.edu/aatzache/ITCS6190/Exercises/Group_act_12_Naive_Bayes.txt

* Note: This is a Group Project. On Canvas, locate your Group Members and obtain their e-mails.
This project requires that every student checks his / her UNCC e-mail account and communicates with his / her group-mates. Contact your group-mates as soon as possible. Be sure to talk to them, meet with them, e-mail, telephone, use Facebook, or use any other means of communication you like.
If a student is reported by his / her group-mates as non-responsive or not participating in the group activities, the student will receive a grade of 0 for this project.
If a student is not present (misses the class) on the assigned presentation date, the student will receive a grade of 0 for this project.