Cloud Computing for Data Analysis - Final Exam - Practice Questions

1. What is Information Retrieval?
   A. Finding documents of an unstructured nature (text) that satisfy an information need from within large collections
   B. The standard ad hoc retrieval task, in which relevant documents are retrieved from a collection based on arbitrary input
   C. Representing each document unweighted, and retrieving an unordered set of documents containing the query words
   D. A tokenization task of chopping a character sequence into little pieces (tokens), and assigning a class to all tokens containing the same character sequence

2. Which of the following steps are part of building an Inverted Index for documents?
   i. Collect the documents to be indexed
   ii. Tokenize the text, turning each document into a list of tokens
   iii. Do linguistic preprocessing, producing a list of normalized tokens (indexing terms)
   iv. Index the documents that each term occurs in by creating an inverted index consisting of a dictionary and postings
   A. i. and iv.
   B. ii. and iii.
   C. iii. and iv.
   D. i., ii., iii., and iv.

3. Which of the statements below is incorrect in relation to PageRank?
   A. PageRank is a function that assigns a real number to each page in the Web
   B. The higher the PageRank of a page, the more “important” it is
   C. PageRank is used to simulate where Web surfers would tend to congregate if they followed randomly chosen outlinks
   D. PageRank of a given page is easy to manipulate through a technique called link spam

4. What is Data?
   A. Collection of data objects and their attributes
   B. A collection of attributes that describes an object
   C. Molecular structures
   D. Things known or assumed as facts, making the basis of reasoning

5. Which of the following are examples of the Nominal attribute type?
   i. Eye color
   ii. ID numbers
   iii. Calendar dates
   iv. Temperature
   A. i. and ii.
   B. i. and iii.
   C. ii. and iv.
   D. iii. and iv.
   E. all of the above

6. Which of the statements below is incorrect in relation to Association Rules?
   A. Association Rules can be used for Market Basket Analysis
   B. Association Rules express how products/services relate to each other, and tend to group together
   C. The Apriori (Agrawal) method uses Frequent Itemsets in order to discover Association Rules
   D. An itemset is considered frequent if it satisfies a minimum utility threshold

7. In a set of transactions, the frequent itemsets {a1, a4} (support 3) and {a2, a3} (support 3) were extracted. If the support of {a1} is 3 and the support of {a2} is 4, which of the following are Association Rules that can be extracted from these frequent itemsets?
   i. a1 -> a4 (3, 3/3 = 100%)
   ii. a4 -> a1 (3, 3/4 = 75%)
   iii. a4 -> a1 (3, 3/3 = 100%)
   iv. a2 -> a3 (3, 3/4 = 75%)
   v. a3 -> a2 (3, 3/4 = 75%)
   A. i., ii., and iv.
   B. i. and iv.
   C. ii. and iii.
   D. iii. and iv.
   E. i., ii., iii., and iv.

8. If we are looking for Association Rules AR(3, 70%), which of the following rules satisfy the minimum support and confidence thresholds?
   A. a2 -> a3 (3, 1/3 = 33%) ; a3 -> a2 (3, 3/3 = 100%)
   B. a1 -> a4 (3, 3/3 = 100%) ; a4 -> a1 (3, 3/4 = 75%)
   C. a2 -> a4 (1, 1/1 = 100%) ; a4 -> a2 (1, 1/1 = 100%)
   D. a4 -> a5 (3, 3/4 = 75%) ; a5 -> a4 (2, 2/3 = 66%)

9. What are the steps to build a Frequent Pattern Tree (FP-Tree)?
   i. Scan the transaction database and find frequent single item sets
   ii. Sort by support in ascending order
   iii. Create the root of the FP-Tree
   iv. For each transaction, if the tree has a child whose name corresponds to the name of the current item, increment its count
   v. Else, create a new node and let its count be 1, and create a link to it from its parent
   A. i. and ii.
   B. i., ii., iii.
   C. ii. and iv.
   D. i., iii., iv.
   E. i., iii., iv., v.
   F. all of the above
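
Study note: the node-insertion logic described in items iii to v of question 9 can be sketched in a few lines of Python. This is only an illustrative sketch, not a full FP-growth implementation. The names `FPNode` and `insert_transaction` and the toy transactions are hypothetical, and each transaction is assumed to arrive already filtered to frequent items and consistently ordered.

```python
class FPNode:
    """One node of an FP-tree: an item label, a count, and child links."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item label -> FPNode


def insert_transaction(root, items):
    """Insert one transaction (assumed already filtered and ordered)."""
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            # No matching child: create a node with count 1, linked from its parent.
            child = FPNode(item, parent=node)
            child.count = 1
            node.children[item] = child
        else:
            # Matching child exists: increment its count.
            child.count += 1
        node = child


root = FPNode()  # the root carries no item
# Hypothetical toy transactions, used only for illustration.
for transaction in [["a", "b"], ["a", "c"], ["a", "b", "c"]]:
    insert_transaction(root, transaction)

print(root.children["a"].count)                # 3
print(root.children["a"].children["b"].count)  # 2
```

All three toy transactions share the prefix "a", so they share one node whose count reaches 3, while "b" under "a" is visited twice.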