Hadoop and MapReduce — Assignment : 5 Solution

Introduction to Hadoop and MapReduce

For this assignment, you will run a Hadoop virtual machine on your system and write code for the following problems. The coding should take roughly 2 hours.

Coding language: Python

Virtual Machine Setup:

Downloading the VM

  1. Download it from

http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip

Warning: the zipped file is 1.7 GB. On Windows you will likely need WinRAR to open this .zip file, because other tools fail to extract it (the unzipped contents exceed the 4 GB maximum of a standard .zip file).

  2. The MD5 checksum file can be found here:

http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip.md5

  3. Unzip it. Warning: the unzipped size is 4.2 GB.

  4. MD5 hashes for the files:

    • 8a610c151d4b1ebdce11542d13dd2a53 Cloudera Training VM 4.1.1.c.log

    • 6b44c965c1c6062554bf4cc12d11e87e Cloudera Training VM 4.1.1.c.plist

    • 46dedeba3e0affd8311431d7e370705e Cloudera Training VM 4.1.1.c.vmdk

    • d41d8cd98f00b204e9800998ecf8427e Cloudera Training VM 4.1.1.c.vmsd

    • 096956c1cbabeaa652ca63a2d5e14612 Cloudera Training VM 4.1.1.c.vmx

    • c9f8a375e82ef1e9d96097850e237df9 Cloudera Training VM 4.1.1.c.vmxf

    • 0d7c8becb5a515068e81bb303c794e4f nvram
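After unzipping, each file can be compared against the hashes above. The following is a minimal sketch using Python's standard hashlib module; the folder path is taken from the first command-line argument and defaults to the current directory (an assumption of this sketch), and the file names are copied from the list. Files are read in 1 MiB chunks so the multi-gigabyte .vmdk never has to fit in memory.

```python
#!/usr/bin/env python
"""Check the unzipped Cloudera VM files against the published MD5 hashes.

ASSUMPTION: the VM folder is given as the first argument (default ".").
"""
import hashlib
import os
import sys

# File names and hashes copied from the assignment's list.
EXPECTED = {
    "Cloudera Training VM 4.1.1.c.log": "8a610c151d4b1ebdce11542d13dd2a53",
    "Cloudera Training VM 4.1.1.c.plist": "6b44c965c1c6062554bf4cc12d11e87e",
    "Cloudera Training VM 4.1.1.c.vmdk": "46dedeba3e0affd8311431d7e370705e",
    "Cloudera Training VM 4.1.1.c.vmsd": "d41d8cd98f00b204e9800998ecf8427e",
    "Cloudera Training VM 4.1.1.c.vmx": "096956c1cbabeaa652ca63a2d5e14612",
    "Cloudera Training VM 4.1.1.c.vmxf": "c9f8a375e82ef1e9d96097850e237df9",
    "nvram": "0d7c8becb5a515068e81bb303c794e4f",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large disk images never sit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check(vm_dir):
    """Return (name, problem) pairs; an empty list means everything matched."""
    problems = []
    for name, expected in sorted(EXPECTED.items()):
        path = os.path.join(vm_dir, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
        elif md5_of(path) != expected:
            problems.append((name, "checksum mismatch"))
    return problems


if __name__ == "__main__":
    vm_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    problems = check(vm_dir)
    for name, problem in problems:
        print("%s: %s" % (name, problem))
    if not problems:
        print("All files match.")
```

The .vmsd hash, d41d8cd98f00b204e9800998ecf8427e, is the MD5 of zero bytes, so an empty .vmsd file is expected rather than a sign of a broken download.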

Using Oracle VirtualBox

  1. Download and install VirtualBox from https://www.virtualbox.org/wiki/Downloads

  2. Create a new virtual machine:

a. Press the 'New' button.

b. Choose a name and set 'Type' to 'Linux'.

c. Press Next.

d. Select the memory size for the VM.

e. Press Next.

f. Select 'Use an existing virtual hard drive file', click the button to browse to the directory where you unzipped the provided VM image, choose the .vmdk file, and press 'Create'.

g. Start the VM!

Using VMware

  1. Download and install VMware Player from

https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0

  2. Create the virtual machine:

a. Click 'Open a Virtual Machine' and, when prompted, navigate to the folder where you unzipped the VM, choose the .vmx file, and click 'Open'.

b. Select the machine and click 'Play virtual machine'.

Dataset Download:

The dataset for these problems is an airports dataset, which can be downloaded from Moodle.
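The file layout is not described here, so the mapper below is a minimal sketch under an explicit assumption: the Moodle file is a CSV shaped like the public OurAirports airports.csv, with a header row whose first field is id and with 0-based columns 2 = type, 8 = iso_country and 9 = iso_region. KEY_COL and the header test are therefore hypothetical and should be checked against the real header. The mapper emits one key<TAB>1 line per airport, the usual Hadoop Streaming pattern for counting.

```python
#!/usr/bin/env python
"""Hadoop Streaming mapper sketch: emit "key<TAB>1" for each airport.

ASSUMPTION: the input is a CSV laid out like OurAirports' airports.csv,
with a header row starting with "id" and 0-based columns
2 = type, 8 = iso_country, 9 = iso_region. Verify against the real file.
"""
import csv
import sys

KEY_COL = 8  # hypothetical: index of the grouping column (iso_country)


def map_records(lines, key_col=KEY_COL):
    """Yield one "key<TAB>1" string per usable data row."""
    for row in csv.reader(lines):
        if len(row) <= key_col or row[0] == "id":
            continue  # skip blank/short lines and the assumed header row
        key = row[key_col].strip()
        if key:  # records with an empty grouping field are dropped
            yield "%s\t1" % key


if __name__ == "__main__":
    # Hadoop Streaming feeds input lines on stdin and reads pairs from stdout.
    for pair in map_records(sys.stdin):
        sys.stdout.write(pair + "\n")
```

Under the same assumed layout, the country parts would use KEY_COL = 8, the type part KEY_COL = 2 and the region part KEY_COL = 9, each saved as its own Mapper.py. Using sys.stdout.write and % formatting keeps the script runnable under both Python 2 and 3, in case the VM ships an older interpreter (not verified here).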

Problem 1:

Write a Mapper and a Reducer to count the number of airports by:

  1. Country

  2. Type
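For counting, the reducer can rely on the fact that Hadoop sorts mapper output by key before handing it over, so all lines for one key arrive consecutively. A minimal sketch of a counting Reducer.py, assuming mapper lines of the form key<TAB>count (for example a country code followed by 1):

```python
#!/usr/bin/env python
"""Hadoop Streaming reducer sketch for Problem 1: total the counts per key.

Expects "key<TAB>count" lines already sorted by key, which is what
Hadoop delivers to a reducer after the shuffle.
"""
import sys


def reduce_counts(lines):
    """Yield (key, total) once per run of identical consecutive keys."""
    current_key, total = None, 0
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 2:
            continue  # ignore malformed lines
        key, count = parts
        if key != current_key:
            if current_key is not None:
                yield current_key, total  # previous key's run is complete
            current_key, total = key, 0
        total += int(count)
    if current_key is not None:
        yield current_key, total  # flush the final key


if __name__ == "__main__":
    for key, total in reduce_counts(sys.stdin):
        sys.stdout.write("%s\t%d\n" % (key, total))
```

Because only the current key and its running total are kept, memory use stays constant however many airports there are. The same logic serves both the Country and the Type part; per the assignment's note, each part still gets its own Reducer.py.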

Problem 2:

Write a Mapper and a Reducer to find:

  1. The country with the highest number of airports

  2. The region with the highest number of airports
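Finding the maximum extends counting slightly: accumulate each key's total as the sorted input streams past, and keep only the best total seen so far. A minimal sketch of such a Reducer.py, assuming the mapper emits country<TAB>1 (or region<TAB>1 for the region part) and that the job runs with a single reducer, for example via Hadoop Streaming's -numReduceTasks 1 option; with several reducers, each would only report the maximum among its own keys.

```python
#!/usr/bin/env python
"""Hadoop Streaming reducer sketch for Problem 2: the key with most airports.

Expects "key<TAB>count" lines sorted by key and ASSUMES a single reducer,
so this process sees every key and its answer is the global maximum.
"""
import sys


def top_key(lines):
    """Return (key, total) with the largest total, or None for empty input.

    Ties keep the key seen first, i.e. the one that sorts first.
    """
    best = None
    current_key, total = None, 0
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 2:
            continue  # ignore malformed lines
        key, count = parts
        if key != current_key:
            # The previous key's run is complete: compare it with the best.
            if current_key is not None and (best is None or total > best[1]):
                best = (current_key, total)
            current_key, total = key, 0
        total += int(count)
    # Compare the final key's total, which no key change has flushed yet.
    if current_key is not None and (best is None or total > best[1]):
        best = (current_key, total)
    return best


if __name__ == "__main__":
    result = top_key(sys.stdin)
    if result is not None:
        sys.stdout.write("%s\t%d\n" % result)
```

The assignment does not say how ties should be handled, so letting the first key in sort order win is an assumption of this sketch.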

NOTE:

For both problems, write a separate Mapper and Reducer for each part, and do not mix the problems.
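Since each part has its own Mapper.py and Reducer.py, it helps to check every pair locally before running it on Hadoop. A minimal sketch (Python 3.5+, assuming such an interpreter is available where you test) that emulates `cat input | python Mapper.py | sort | python Reducer.py`, with the sort standing in for Hadoop's shuffle; the script name run_local.py and its argument order are hypothetical.

```python
#!/usr/bin/env python3
"""Run one Hadoop Streaming mapper/reducer pair locally, without Hadoop.

Emulates: cat INPUT | python MAPPER | sort | python REDUCER
"""
import subprocess
import sys


def run_local(mapper_path, reducer_path, input_path):
    """Return the reducer's stdout as text for one mapper/reducer pair."""
    with open(input_path, "rb") as f:
        mapped = subprocess.run(
            [sys.executable, mapper_path], stdin=f,
            stdout=subprocess.PIPE, check=True,
        ).stdout
    # Sorting whole lines makes equal keys adjacent, as Hadoop's shuffle would.
    shuffled = b"".join(line + b"\n" for line in sorted(mapped.splitlines()))
    reduced = subprocess.run(
        [sys.executable, reducer_path], input=shuffled,
        stdout=subprocess.PIPE, check=True,
    ).stdout
    return reduced.decode("utf-8")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        sys.stdout.write(run_local(*sys.argv[1:4]))
    else:
        print("usage: run_local.py MAPPER REDUCER INPUT", file=sys.stderr)
```

For example, `python3 run_local.py Mapper.py Reducer.py airports.csv`, where airports.csv is a placeholder for the downloaded dataset. Because the tab that separates key and value sorts before every printable character, sorting whole lines groups each key's lines together just as sorting by key alone would.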

Resources:

  1. Units 2 and 3 of this online course (about 1-2 hours); Units 1 and 4 are not needed. https://in.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617

  2. Chapter 6, which is free to download, should suffice. http://go.cloudera.com/udacity-lesson-2

Deliverables/Upload Format:

RollNo / ProblemNumber

Mapper.py

Reducer.py

You can upload the code from the Virtual Machine itself.
