CS Homework 3 MapReduce Solution

$30.00 $24.00

This homework is a MapReduce programming assignment which need to complete inde-pendently on Amazon Web Services (AWS). The program should be developed in Python 3.6+ with the module mrjob 1. Although you may write and debug your program on a local machine, your final solution should run in the cloud using Amazon’s Elastic MapReduce (EMR).…

5/5 – (2 votes)

You’ll get a: zip file solution

 

Description

5/5 – (2 votes)

This homework is a MapReduce programming assignment which need to complete inde-pendently on Amazon Web Services (AWS). The program should be developed in Python 3.6+ with the module mrjob 1. Although you may write and debug your program on a local machine, your final solution should run in the cloud using Amazon’s Elastic MapReduce (EMR).

Please submit the following files in one zip package through Blackboard, Homework 3 by 11:59:59 p.m., Friday, 17 May 2019 (7th Friday):

  • a Jupyter Notebook (.ipynb) which contains your main program and gives your answer to the question asked in the problem description,

  • other Python source code files (.py) needed for the execution of your main program,

  • the configuration file mrjob.conf with your AWS and SSH credentials removed,

  • a JPEG format screen-shot image (.jpg) of your Amazon EMR clusters console that shows your program’s “COMPLETED” state as well as the elapsed time, and also your AWS account name at the top-right corner, and

  • a plain text document (.txt) that reports how much time your program took to run on EMR with how many map nodes & reduce nodes, and also roughly how much time you spent working on this problem [for statistical purpose only, not for assessment].

Write a MapReduce program to calculate the conditional probability that a word w occurs immediately after another word w, i.e.,

P r[w|w] = count(w, w)/count(w)

for each and every two-word-sequence, i.e., bigram, (w, w) in the entire collection of over 200,000 short jokes (from Kaggle).

https://www.kaggle.com/abhinavmoudgil95/short-jokes

You program should ignore non-alphabetical characters and be case-insensitive when ex-tracting bigrams from text.

  1. http://mrjob.readthedocs.org/en/stable/

1

Which 10 words are most likely to be said immediately after the word “my”, i.e., with the highest conditional probability P r[w|w = my]?

Please list them in descending order.

  1. If you implement either the “pairs” pattern or the “stripes” pattern correctly, you can get up to 80% of your grade.

  1. If you implement both the “pairs” pattern and the “stripes” pattern correctly, you can get up to 100% of your grade..

2

CS Homework 3 MapReduce Solution
$30.00 $24.00