Cloud Computing for Data Analysis - Final Exam - Practice Questions - Answer Key

1. What is Information Retrieval?
Answer: A. Finding documents of an unstructured nature (text) that satisfy an information need from within large collections
Related to: slides 1-6 in PowerPoint S_Boolean_Retrieval.pptx

2. Which of the following statements are example tasks for which MapReduce is suitable?
i. Ranking of Web pages by importance
ii. Searches in friends networks at social networking sites
iii. Counting sales and grouping them by city or region
iv. Counting words in a text file
Answer: D. i., ii., iii., and iv.
Related to: slide 4 in PowerPoint S_Boolean_Retrieval.pptx

3. Which of the statements below is incorrect in relation to PageRank?
Answer: D. PageRank of a given page is easy to manipulate through a technique called link spam
Related to: pages 163-166 in Chapter 5 of Book 1 (MiningOfMassiveDatasets)

4. What is Data?
Answer: A. A collection of data objects and their attributes
Related to: slides 2-3 in PowerPoint chapter2_Data.ppt

5. Which of the following are examples of a Nominal attribute type?
i. Eye color
ii. ID numbers
iii. Calendar dates
iv. Temperature
Answer: A. i. and ii.
Related to: slide 7 in PowerPoint chapter2_Data.ppt

6. Which of the statements below is incorrect in relation to Association Rules?
Answer: D. An itemset is considered frequent if it satisfies a minimum utility threshold (an itemset is frequent when it meets a minimum support threshold, not a utility threshold)
Related to: slides 2, 5, 6, and 7 in PowerPoint AssociationRules_Agrawal.pptx

7. In a set of transactions, the frequent itemsets {a1, a4} (support 3) and {a2, a3} (support 3) were extracted. If the support of {a1} is 3 and the support of {a2} is 4, which of the following are Association Rules that can be extracted from these frequent itemsets?
i. a1 -> a4 (3, 3/3 = 100%)
ii. a4 -> a1 (3, 3/4 = 75%)
iii. a4 -> a1 (3, 3/3 = 100%)
iv. a2 -> a3 (3, 3/4 = 75%)
v. a3 -> a2 (3, 3/4 = 75%)
Answer: A. i., ii., and iv.
Related to: pages 7-9 in Word document AgrawalExample.doc

8. If we are looking for Association Rules AR(3, 70%), that is, minimum support 3 and minimum confidence 70%, which of the following rules satisfy the minimum support and confidence thresholds?
Answer: B. a1 -> a4 (3, 3/3 = 100%); a4 -> a1 (3, 3/4 = 75%)
Related to: page 9 in Word document AgrawalExample.doc

9. What are the steps to build a Frequent Pattern Tree (FP-Tree)?
i. Scan the transaction database and find the frequent single itemsets
ii. Sort them by support in ascending order
iii. Create the root of the FP-Tree
iv. For each transaction, if the tree has a child whose name corresponds to the name of the current item, increment its count
v. Else, create a new node, let its count be 1, and create a link to it from its parent
Answer: E. i., iii., iv., and v. (step ii is excluded because frequent items are sorted by support in descending order)
Related to: pages 1 and 2 in Word document FrequentPatternGrowth_FPTree.doc
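Counting words in a text file (task iv in question 2) is the standard illustration of why MapReduce fits such tasks. The sketch below mimics the map, shuffle, and reduce phases in a single Python process. It does not use Hadoop or any real MapReduce framework, and the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every whitespace-separated word
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework would
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: sum the partial counts for one word
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the cat ran"]))
```

The same map/group/aggregate shape covers task iii: the map step would emit (city, sale amount) instead of (word, 1).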
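The rule notation in questions 7 and 8 is (support, confidence), where confidence(X -> Y) = sup(X ∪ Y) / sup(X). The sketch below computes that arithmetic and checks an AR(min_sup, min_conf) threshold. The question does not state sup({a4}) = 4; it is an assumption inferred from option ii (3/4 = 75%). sup({a3}) is not given, so a3 -> a2 is left out. The names SUPPORT, rule, and satisfies are hypothetical.

```python
# Support counts from question 7. ASSUMPTION: sup({a4}) = 4 is inferred from
# option ii (3/4 = 75%); sup({a3}) is not given, so it is not listed.
SUPPORT: dict[frozenset[str], int] = {
    frozenset({"a1"}): 3,
    frozenset({"a2"}): 4,
    frozenset({"a4"}): 4,
    frozenset({"a1", "a4"}): 3,
    frozenset({"a2", "a3"}): 3,
}


def rule(antecedent: str, consequent: str) -> tuple[int, int, float]:
    """Return (sup(X u Y), sup(X), confidence) for the rule X -> Y."""
    both = SUPPORT[frozenset({antecedent, consequent})]
    base = SUPPORT[frozenset({antecedent})]
    return both, base, both / base


def satisfies(sup: int, conf: float, min_sup: int, min_conf: float) -> bool:
    # AR(min_sup, min_conf): the rule must meet both thresholds
    return sup >= min_sup and conf >= min_conf


if __name__ == "__main__":
    for a, c in [("a1", "a4"), ("a4", "a1"), ("a2", "a3")]:
        both, base, conf = rule(a, c)
        ok = satisfies(both, conf, 3, 0.70)
        print(f"{a} -> {c} ({both}, {both}/{base} = {conf:.0%}) meets AR(3, 70%): {ok}")
```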
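The FP-Tree steps in question 9 can be sketched as follows. It implements steps i, iii, iv, and v, with frequent items sorted by support in descending order. Ties are broken alphabetically, which is an arbitrary choice made here. This is a minimal sketch only: the header table and node-links that FP-growth uses for mining are omitted. The transactions and the min_support value in the usage block are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    item: Optional[str]          # None marks the root
    count: int = 0
    parent: Optional["Node"] = None
    children: dict[str, "Node"] = field(default_factory=dict)


def build_fp_tree(transactions: list[list[str]], min_support: int) -> Node:
    # Step i: scan the database and count each single item once per transaction
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: n for item, n in counts.items() if n >= min_support}

    # Step iii: create the root of the tree
    root = Node(item=None)

    for t in transactions:
        # Keep frequent items only, sorted by support in DESCENDING order
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is not None:
                # Step iv: a child with this name exists, so increment its count
                child.count += 1
            else:
                # Step v: create a new node with count 1, linked from its parent
                child = Node(item=item, count=1, parent=node)
                node.children[item] = child
            node = child
    return root


def show(node: Node, depth: int = 0) -> None:
    # Print one "item:count" per line, indented by depth
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


if __name__ == "__main__":
    db = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
          ["a", "d", "e"], ["a", "b", "c"]]
    show(build_fp_tree(db, min_support=2))
```

Sorting in descending order places the most frequent items nearest the root. Transactions therefore share more prefixes, and the tree stays compact. Ascending order would defeat that, which is why step ii is not part of the answer.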