Cloud Computing for Data Analysis - Midterm Exam - Practice Questions

1. In relation to the MapReduce environment, files are stored in a Distributed File System. Which of the following statements is incorrect in relation to the Distributed File System?
   A. File is split into chunks, usually 64MB in size
   B. Each file chunk is replicated 3 times
   C. The files are stored on a cluster of high performance data center machines
   D. The machines on the cluster are connected via Ethernet

2. Which of the following statements are example tasks for which MapReduce is suitable?
   i. ranking of Web pages by importance
   ii. searches in friends networks at social networking sites
   iii. counting sales and grouping them by City or Region
   iv. counting words in a text file
   A. i. and iv.
   B. ii. and iii.
   C. iii. and iv.
   D. i., ii., iii., and iv.

3. What is the purpose of the Reduce Tasks in MapReduce environment?
   A. To reduce the logic of the Map function
   B. Each Reducer produces some output, and the entire job is the union of what is produced by each Reducer
   C. To always Group or Aggregate the results from the Map function
   D. The Reduce Tasks are not really needed, and can be omitted

4. What is the main need to have Hadoop?
   A. Need to process huge datasets on large clusters of computers
   B. Need to build reliability in each application
   C. Need for common open source platform
   D. Need for replication of nodes

5. Which of the following statements are correct in relation to Hadoop MapReduce, compared to traditional Relational Database Management System (RDBMS)?
   i. Hadoop MapReduce can handle much larger data size than traditional RDBMS
   ii. Hadoop MapReduce provides both 'Interactive and Batch' processing access
   iii. traditional RDBMS provides updates of type 'read and write many times', while Hadoop MapReduce provides 'write once read many times'
   iv. traditional RDBMS has 'low' integrity, while Hadoop MapReduce has 'high' integrity
   A. i. and ii.
   B. i. and iii.
   C. ii. and iv.
   D. iii. and iv.
   E. all of the above

6. Which of the following statements is incorrect in relation to MapReduce?
   A. MapReduce is a programming model for efficient distributed computing
   B. MapReduce allows for improved load balancing and faster recovery from failed tasks
   C. The input to a Mapper is a (key, value) pair
   D. The number of Mappers, which are launched by Hadoop for a given job, is specified by the user

7. What are the main goals of Hadoop Distributed File System (HDFS)?
   i. can be built out of commodity hardware
   ii. very large distributed file system suitable for applications with large datasets
   iii. optimized for streaming data and iterative algorithm applications
   iv. highly fault tolerant
   A. i., ii., and iv.
   B. i. and iv.
   C. ii. and iii.
   D. iii. and iv.
   E. i., ii., iii., and iv.

8. Hadoop Distributed File System (HDFS) stores large files across machines on a cluster. To do that, it uses the concept of a block. Which of the statements below is incorrect in relation to the HDFS block?
   A. The size of the HDFS block is much larger than the size of regular operating system block
   B. A file larger than a single block is broken into multiple blocks, and each block can be stored on any of the disks on a cluster
   C. The file metadata such as file name, owner of file, last modified date, are stored with the blocks
   D. To provide fault tolerance and availability, and insure against corrupted disks and machine failure, each block is replicated to a number of physically separate machines (usually 3)

9. What is the primary purpose of the NameNode in Hadoop Distributed File System (HDFS)?
   A. To serve as a replication engine for HDFS blocks
   B. To store and retrieve data blocks when requested by clients
   C. To keep a copy of the merged namespace image and EditLog in case of failure
   D. To manage the HDFS Namespace and Metadata, and keep an image of the entire file system namespace and map of blocks in memory
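
Study note: the block arithmetic behind questions 1 and 8 can be sketched in a few lines of Python. This is only an illustration, not part of the exam. It assumes the 64MB block size and replication factor of 3 quoted in the questions, and the function name `block_layout` is hypothetical. It also assumes the last block only occupies as much space as the data it holds.

```python
import math

BLOCK_SIZE_MB = 64   # block size quoted in question 1 (assumed here)
REPLICATION = 3      # replication factor quoted in questions 1 and 8

def block_layout(file_size_mb):
    """Return (number of blocks, total raw storage in MB across all replicas)."""
    # A file larger than one block is broken into multiple blocks;
    # the final block may be only partially filled.
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    # Every block is copied REPLICATION times, so raw storage is the
    # file size multiplied by the replication factor.
    raw_storage_mb = file_size_mb * REPLICATION
    return blocks, raw_storage_mb

if __name__ == "__main__":
    blocks, raw = block_layout(200)
    print(blocks, raw)
```

For a 200MB file this gives 4 blocks (three full blocks plus one partial block) and 600MB of raw storage across the cluster.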