Cloud Computing for Data Analysis - Midterm Exam - Practice Questions - Answer Key

1. In relation to the MapReduce environment, files are stored in a Distributed File System. Which of the following statements is incorrect in relation to the Distributed File System?
Answer: C. The files are stored on a cluster of high performance data center machines
Related to slides 11-18 in PowerPoint MMS_01_Intro_MapReduce_Chapter01.pptx

2. Which of the following statements are example tasks for which MapReduce is suitable?
i. ranking of Web pages by importance
ii. searches in friends networks at social networking sites
iii. counting sales and grouping them by City or Region
iv. counting words in a text file
Answer: D. i., ii., iii., and iv.
Related to page 21 in Chapter 2 of Book 1, MiningOfMassiveDatasets, and VideoCase01

3. What is the purpose of the Reduce Tasks in the MapReduce environment?
Answer: B. Each Reducer produces some output, and the output of the entire job is the union of what is produced by each Reducer
Related to slide 21 in PowerPoint MMS_01_Intro_MapReduce_Chapter01.pptx

4. What is the main need for Hadoop?
Answer: A. Need to process huge datasets on large clusters of computers
Related to slide 4 in PowerPoint CloudToolsOverview.ppt

5. Which of the following statements are correct in relation to Hadoop MapReduce, compared to a traditional Relational Database Management System (RDBMS)?
i. Hadoop MapReduce can handle much larger data sizes than a traditional RDBMS
ii. Hadoop MapReduce provides both ‘Interactive and Batch’ processing access
iii. a traditional RDBMS provides updates of type ‘read and write many times’, while Hadoop MapReduce provides ‘write once, read many times’
iv. a traditional RDBMS has ‘low’ integrity, while Hadoop MapReduce has ‘high’ integrity
Answer: B. i. and iii.
Related to page 9 in Chapter 1 of Book 2, HadoopTheDefinitiveGuide

6. Which of the following statements is incorrect in relation to MapReduce?
Answer: D. The number of Mappers, which are launched by Hadoop for a given job, is specified by the user
Related to slides 24, 26, 27, and 35 in PowerPoint CloudToolsOverview.ppt

7. What are the main goals of the Hadoop Distributed File System (HDFS)?
i. can be built out of commodity hardware
ii. very large distributed file system suitable for applications with large datasets
iii. optimized for streaming data and iterative algorithm applications
iv. highly fault tolerant
Answer: A. i., ii., and iv.
Related to slide 8 in PowerPoint CloudToolsOverview.ppt, and slide 3 in PowerPoint HDFS.ppt

8. The Hadoop Distributed File System (HDFS) stores large files across machines on a cluster. To do that, it uses the concept of a block. Which of the statements below is incorrect in relation to the HDFS block?
Answer: C. The file metadata, such as file name, owner of file, and last modified date, are stored with the blocks
Related to pages 45, 46 in Chapter 3 of Book 2, HadoopTheDefinitiveGuide

9. What is the primary purpose of the NameNode in the Hadoop Distributed File System (HDFS)?
Answer: D. To manage the HDFS Namespace and Metadata, and keep an image of the entire file system namespace and map of blocks in memory
Related to pages 46, 47 in Chapter 3 of Book 2, HadoopTheDefinitiveGuide, slides 11, 12 in PowerPoint CloudToolsOverview.ppt, and slides 15, 16 in PowerPoint HDFS.ppt
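
Study note for Questions 8 and 9: both answers depend on where HDFS keeps what. File metadata and the block map live on the NameNode, while DataNodes hold only raw block contents. The toy Python model below may help make that split concrete. It is an illustrative sketch, not Hadoop code: the class names, the put and read helpers, the 4-byte block size, and the round-robin replica placement are all assumptions chosen for readability (real HDFS uses far larger blocks and a rack-aware placement policy).

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4     # bytes; toy value so a short string spans several blocks
REPLICATION = 3    # number of replicas kept per block in this sketch


@dataclass
class DataNode:
    """Hypothetical DataNode: stores block contents only, no file metadata."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Hypothetical NameNode: keeps the namespace and block map in memory."""
    namespace: dict = field(default_factory=dict)  # path -> file metadata
    block_map: dict = field(default_factory=dict)  # block_id -> DataNode names


def put(namenode, datanodes, path, owner, data):
    """Split data into fixed-size blocks and record metadata on the NameNode."""
    block_ids = []
    copies = min(REPLICATION, len(datanodes))
    for offset in range(0, len(data), BLOCK_SIZE):
        index = offset // BLOCK_SIZE
        block_id = f"{path}#blk{index}"
        # Round-robin placement is an assumption; it is not HDFS's real policy.
        targets = [datanodes[(index + r) % len(datanodes)] for r in range(copies)]
        for node in targets:
            node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
        namenode.block_map[block_id] = [node.name for node in targets]
        block_ids.append(block_id)
    # File name, owner, and length are stored here, never alongside the blocks.
    namenode.namespace[path] = {"owner": owner, "length": len(data), "blocks": block_ids}


def read(namenode, datanodes, path):
    """Ask the NameNode for block locations, then fetch bytes from DataNodes."""
    by_name = {node.name: node for node in datanodes}
    parts = []
    for block_id in namenode.namespace[path]["blocks"]:
        first_replica = namenode.block_map[block_id][0]
        parts.append(by_name[first_replica].blocks[block_id])
    return b"".join(parts)
```

In this model, a client that wants a file first consults the NameNode for the ordered block list and replica locations, then reads the bytes from DataNodes. Losing one DataNode leaves the other replicas of each block available, which is the fault-tolerance goal named in Question 7.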