Week 3 – Decision Tree Classifier



Description


Decision Trees are among the easiest and most popular classification algorithms to understand and interpret. The goal of using a Decision Tree is to create a training model that can predict the class or value of the target variable by learning simple decision rules inferred from prior data.

The primary challenge in implementing a decision tree is identifying which attribute to place at the root of each subtree. Handling this is known as attribute selection. The ID3 algorithm builds decision trees using a top-down greedy search through the space of possible branches, with no backtracking: it always makes the choice that seems best at that moment.

Attribute selection in the ID3 algorithm involves computing the entropy of the dataset, computing the information gain of each attribute, and selecting the most appropriate attribute as the root node.
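For reference, these heuristics have standard definitions. For a dataset $S$ whose target variable takes $k$ classes in proportions $p_1, \ldots, p_k$, and an attribute $A$ whose values $v$ partition $S$ into subsets $S_v$:

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{k} p_i \log_2 p_i$$

$$\mathrm{AvgInfo}(A) = \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

$$\mathrm{Gain}(A) = \mathrm{Entropy}(S) - \mathrm{AvgInfo}(A)$$

At each node, ID3 selects the attribute with the highest gain.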

In this assignment you are required to prepare a module that any machine learning fresher can use to calculate these heuristics on any data with categorical attributes.

Your task is to complete the code for the functions whose skeletons are predefined.

You are provided with the following files:

  1. week3.py

  2. SampleTest.py

Note: These sample test cases are just for your reference.

Sample Test Case Data

Note: This is the same dataset that was used in class for ID3 Algorithm.

Important Points:

  1. Please do not make changes to the function definitions provided to you; use the skeleton as it has been given. Also, do not make changes to the sample test file provided to you; run it as it is.

  2. You are free to write any helper functions that can be called within any of these predefined functions.

  3. Your code will be auto-evaluated by our testing script; our dataset and test cases will not be revealed. Please ensure you take care of all edge cases!

  4. Do not assume the schema of the hidden test cases to be the same as that of the sample test case. The values will always be categorical, but the target variable may vary and the number of classes may not be fixed.

  5. Point 4 stresses that you should not hard-code anything. Remember, you are designing a module to support the ID3 algorithm on any kind of categorical data.

  6. The dataset we will be testing against will have N columns, with the first N-1 columns being attributes and the Nth column being the target variable.

  7. Ensure you follow the stated conventions when returning values from these functions.

  8. The experiment is subject to zero tolerance for plagiarism. Your code will be tested for plagiarism against every submission across all sections; if plagiarism is found, both the receiver and the provider will get zero marks without any scope for explanation.

  9. Do not rename variables or use other techniques to disguise plagiarism; the plagiarism checker is able to catch such attempts.

  10. Hidden test cases will not be revealed post evaluation.

week3.py

Contains four functions:

  1. get_entropy_of_dataset

  2. get_avg_info_of_attribute

  3. get_information_gain

  4. get_selected_attribute

Function: get_entropy_of_dataset
Input parameters: df (pandas dataframe of the given dataset)
Return value: entropy, the entropy of the entire dataset (int/float)

Function: get_avg_info_of_attribute
Input parameters: df (pandas dataframe of the given dataset); attribute (name of the attribute, i.e. column name, whose average information is to be found, e.g. "Temperature")
Return value: avg_info, the average information of that attribute (int/float)

Function: get_information_gain
Input parameters: df (pandas dataframe of the given dataset); attribute (name of the attribute, i.e. column name, whose information gain is to be found, e.g. "Temperature")
Return value: information_gain, the information gain of that attribute (int/float)

Function: get_selected_attribute
Input parameters: df (pandas dataframe of the given dataset)
Return value: a tuple (information_gains, selected_column), where information_gains is a Python dictionary with column names as keys and their information gains as values, and selected_column is a string: the column name selected for the split.
Example: ({'A': 0.123, 'B': 0.768, 'C': 1.23}, 'C')

  1. You may write your own helper functions if needed (one possible structure is sketched below).

  2. You can import libraries that come built-in with Python 3.7.

  3. You cannot change the skeleton of the code.
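For illustration only, here is a minimal sketch of how these four functions might be implemented, assuming the signatures listed in the table above and a target variable in the last column (per Important Point 6). The actual week3.py skeleton takes precedence over anything shown here.

# Illustrative sketch only; assumes the signatures in the table above
# and that the target variable is the last column of df (per point 6).
import numpy as np
import pandas as pd


def get_entropy_of_dataset(df):
    """Entropy of the target variable (assumed to be the last column)."""
    probs = df.iloc[:, -1].value_counts(normalize=True)
    # value_counts never yields zero probabilities, so log2 is always defined.
    return float(-(probs * np.log2(probs)).sum())


def get_avg_info_of_attribute(df, attribute):
    """Average information: subset entropies weighted by subset size."""
    total = len(df)
    return float(sum(
        (len(subset) / total) * get_entropy_of_dataset(subset)
        for _, subset in df.groupby(attribute)
    ))


def get_information_gain(df, attribute):
    """Reduction in entropy obtained by splitting on the given attribute."""
    return get_entropy_of_dataset(df) - get_avg_info_of_attribute(df, attribute)


def get_selected_attribute(df):
    """Information gain of every attribute, plus the best attribute to split on."""
    gains = {col: get_information_gain(df, col) for col in df.columns[:-1]}
    return gains, max(gains, key=gains.get)

Because the attribute values come from groupby rather than a fixed list, nothing in this sketch is hard-coded to a particular schema or number of classes.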

SampleTest.py

  1. This will help you check your code.

  2. Note that the test case used here is the same as the dataset referenced above.

  3. Passing these cases does not ensure full marks; you will also need to take care of edge cases.

  4. Name your code file as YOUR_SRN.py.

  5. Run the command python3 SampleTest.py --SRN YOUR_SRN (in case of any import error, use the command below):

python3.7 SampleTest.py --SRN YOUR_SRN
