CSC 4760/6760 DSCI 4760 Big Data Programming Assignment 4

Problem 1. (100 points)

Using Spark ML, apply the provided Decision Tree machine learning algorithm to the provided dataset and report the test accuracy of its predictions.

About the dataset: Iris.csv classifies flower species based on their sepal and petal measurements. There are three classification labels: setosa, versicolor, and virginica.
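As a plain-Python illustration (no Spark required) of how three string labels can become numeric class indices and how test accuracy is computed, the sketch below loosely mimics Spark ML's `StringIndexer` (which by default orders labels by descending frequency) and an accuracy metric. The function names `index_labels` and `accuracy`, the alphabetical tie-breaking, and the six labels and predictions are hypothetical and for illustration only; they are not the provided code or real model results.

```python
from collections import Counter


def index_labels(labels):
    """Map string labels to float indices, most frequent label first.

    Loosely mirrors StringIndexer's default frequency-descending order;
    ties are broken alphabetically in this sketch.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    mapping = {lab: float(i) for i, lab in enumerate(ordered)}
    return mapping, [mapping[lab] for lab in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


labels = ["setosa", "versicolor", "virginica", "setosa", "virginica", "versicolor"]
mapping, indexed = index_labels(labels)
print(mapping)  # {'setosa': 0.0, 'versicolor': 1.0, 'virginica': 2.0}

# Hypothetical predictions for the six rows above (not real model output).
predicted = [0.0, 1.0, 1.0, 0.0, 2.0, 1.0]
print(accuracy(indexed, predicted))  # 5 of 6 correct -> 0.8333...
```

Because all three labels occur equally often here, the order falls back to the alphabetical tie-break.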

Report:

Implementation:

Implement a PySpark program to solve the problem. We have provided an almost complete Python program. You must implement the code in a Jupyter Notebook.

The "almost complete" phrasing is important here. Some hints:

  1. The provided Python code does not include the findspark statements required in your Jupyter Notebook cell.

  2. The Python code, although correct, may or may not be ready to run as is (Python is strict about indentation, for example). Verifying this is part of your assignment.
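For the first hint, a minimal notebook setup might look like the sketch below, assuming the `findspark` package and a local Spark installation are available. The helper name `start_spark` and the app name `"Assignment4"` are hypothetical; only `findspark.init` and `SparkSession.builder` are real APIs.

```python
def start_spark(app_name="Assignment4", spark_home=None):
    """Locate a local Spark install with findspark, then start a session.

    Hypothetical helper. findspark.init() must run before pyspark is
    imported, so the pyspark import is deferred until after that call.
    """
    import findspark

    # With spark_home=None, findspark falls back to SPARK_HOME or its own search.
    findspark.init(spark_home)

    from pyspark.sql import SparkSession

    return SparkSession.builder.appName(app_name).getOrCreate()
```

In the first notebook cell, `spark = start_spark()` would then give a session to use for the rest of the notebook.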

Report and Submission Materials:

Please write a report illustrating your steps of execution. Uploading screenshots to iCollege is NOT writing a report.

In the report, you should include the answers to the following questions.

  1. Your explanation of the provided source code, beyond the comments already in it, with screenshots:

    1. If required, how you clean your data.

    2. How you encode your data.

    3. Which classification label you choose for the Decision Tree classifier.

    4. The accuracy of the algorithm's predictions.
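One way the questions above (cleaning, encoding, label choice, accuracy) can map onto Spark ML stages is sketched below. It is not the provided code. It assumes the label column is named `Species`, that every other column except an optional `Id` column is numeric, and a 70/30 train/test split; the function name and all defaults are hypothetical.

```python
def run_decision_tree(spark, path="Iris.csv", label_col="Species",
                      train_fraction=0.7, seed=42):
    """Train a decision tree on the Iris CSV and return test accuracy.

    Hypothetical sketch: the label column name and the optional "Id"
    column are assumptions about the CSV layout.
    """
    # Deferred imports: in a findspark-based setup, pyspark may only be
    # importable after findspark.init() has run.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import DecisionTreeClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # Cleaning (if required): drop rows with any missing values.
    df = spark.read.csv(path, header=True, inferSchema=True).dropna()

    # Every column except the label (and "Id", if present) is assumed
    # to be a numeric feature.
    feature_cols = [c for c in df.columns if c not in (label_col, "Id")]

    # Encoding: species string -> numeric "label"; features -> one vector.
    indexer = StringIndexer(inputCol=label_col, outputCol="label")
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    tree = DecisionTreeClassifier(labelCol="label", featuresCol="features")

    # Fit the whole pipeline on the training split only.
    train, test = df.randomSplit([train_fraction, 1.0 - train_fraction], seed=seed)
    model = Pipeline(stages=[indexer, assembler, tree]).fit(train)
    predictions = model.transform(test)

    # Test accuracy: fraction of test rows whose prediction equals the label.
    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy")
    return evaluator.evaluate(predictions)
```

Called as `run_decision_tree(spark)` with an existing SparkSession, it returns a float between 0 and 1; the exact value depends on the random split and is not reproduced here.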

Submission Materials:

  1. Your report in PDF format, including screenshots of the outputs.

  2. Source code (Jupyter Notebook).
