Name: CHE1147H Programming assignment #5
SKU: 21334
Price: 24.99 USD
Availability: InStock

Description


Supervised learning

Here, you are going to use the features you generated in Assignment #3 to predict the clients' response to a promotion campaign. This is a typical classification problem in the retail industry, but the problem is formulated similarly in other domains such as fraud detection, marketing and manufacturing.

The clients' responses are stored in the Retail Data Response.csv file from Kaggle. The responses are binary: 0 for clients who responded negatively to the promotional campaign and 1 for clients who responded positively.

You will explore solving the classification problem with two different sets of features (i.e. annual and monthly) and three different algorithms, as shown in the image below.

Retail response classification problem

Annual features	Monthly features
Logistic Regression with L1 regularization	Logistic Regression with L1 regularization
Decision Tree	Decision Tree
Random Forests	Random Forests

1.1 Import the monthly and annual data and join

In Assignment #3, you created five different feature families that capture annual and monthly aggregations. Here, you will model the retail problem with two approaches: one using the annual features and one using the monthly features. Therefore, you need to create the joined tables based on the following logic:

Table	annual features outputs	monthly features outputs
#1	annual features.xlsx	mth rolling features.xlsx
#2	annual day week counts pivot.xlsx	mth day counts.xlsx
#3	-	days since last txn.xlsx
#4	Retail Data Response.csv	Retail Data Response.csv

In both the annual and the monthly features approach, you need to join at the end with table #4, the clients' responses. This is simply a table that contains the binary response of each client to our marketing effort, as described above. It is the output (also called the label or target) that makes this a supervised learning problem.
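The join logic above can be sketched with pandas. This is a minimal illustration, not the assignment's reference solution: the key column `customer_id`, the feature column names, the helper `join_features_with_response` and the choice of inner joins are all assumptions, and small in-memory tables stand in for the real files.

```python
import pandas as pd

# Toy stand-ins for the files in the table above (assumed column names).
# Real code would load them with pd.read_excel(...) and pd.read_csv(...).
annual_features = pd.DataFrame(
    {"customer_id": ["CS1", "CS2", "CS3"], "ann_txn_amt_sum_2014": [120, 80, 45]}
)
annual_day_of_week = pd.DataFrame(
    {"customer_id": ["CS1", "CS2", "CS3"], "cnt_2014_Monday": [2, 0, 1]}
)
response = pd.DataFrame(
    {"customer_id": ["CS1", "CS2", "CS4"], "response": [0, 1, 0]}
)


def join_features_with_response(feature_tables, response, key="customer_id"):
    """Inner-join the feature tables on `key`, then join the response last."""
    joined = feature_tables[0]
    for table in feature_tables[1:]:
        joined = joined.merge(table, on=key, how="inner")
    # Only clients that have both features and a response are kept.
    return joined.merge(response, on=key, how="inner")


annual_features_outputs = join_features_with_response(
    [annual_features, annual_day_of_week], response
)
print(annual_features_outputs)
```

The monthly table would be built the same way from its three feature files. An inner join drops clients without a response; a left join followed by an explicit check for missing labels is a reasonable alternative.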

1.2 Steps for each method (10 points)

1. Separate the inputs X and the output y into two data frames.

2. Split the data into train and test sets. Use a test size of 2/3 and set the random state to 1147 (i.e. the course code value) for consistency. Use the following names:

Annual	train annual	train annual	test annual	test annual
Monthly	train monthly	train monthly	test monthly	test monthly

3. Pre-process (if necessary for the method).

4. Fit the training dataset and optimize the hyperparameters of the method.

5. Plot the coefficient values or the feature importances.

6. Plot the probability distribution for the test set.

7. Plot the confusion matrix and ROC curves of the train and test sets. Calculate precision and recall.

8. Plot the decision boundary for the top 2 features.
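The non-plotting parts of these steps can be sketched with scikit-learn for one of the three methods, logistic regression with L1 regularization. This is a hedged illustration under several assumptions: the data is synthetic (from `make_classification`), the variable names, the grid of `C` values, the 5-fold cross-validation and the ROC AUC scoring are choices made here and not prescribed by the assignment.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a joined features-plus-response table (assumption).
features, labels = make_classification(
    n_samples=600, n_features=8, n_informative=4, weights=[0.8, 0.2], random_state=0
)
data = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(8)])
data["response"] = labels

# Step 1: separate the inputs X and the output y.
X = data.drop(columns="response")
y = data["response"]

# Step 2: split with the test size and random state given in the assignment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2 / 3, random_state=1147
)

# Steps 3-4: scale the inputs, then tune the regularization strength C.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(penalty="l1", solver="liblinear")),
    ]
)
search = GridSearchCV(
    pipeline, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, scoring="roc_auc", cv=5
)
search.fit(X_train, y_train)

# Step 5: coefficients of the best model, ready to plot as a bar chart.
coefficients = pd.Series(
    search.best_estimator_.named_steps["clf"].coef_[0], index=X.columns
)

# Steps 6-7: predicted probabilities and metrics on the test set.
test_proba = search.predict_proba(X_test)[:, 1]
test_pred = search.predict(X_test)
test_confusion = confusion_matrix(y_test, test_pred)
metrics = {
    "roc_auc": roc_auc_score(y_test, test_proba),
    "precision": precision_score(y_test, test_pred, zero_division=0),
    "recall": recall_score(y_test, test_pred),
}
print(test_confusion)
print(metrics)
```

A decision tree or random forest could be dropped into the same pipeline in place of the logistic regression, with `feature_importances_` read instead of `coef_`; scaling is generally unnecessary for tree-based models.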

1.3 Comparison of methods (10 points)

Compare the two feature engineering approaches (annual and monthly) and the three modeling approaches (L1 log-reg, tree, forests) in terms of the outcomes of steps 5-8. Which combination of feature engineering and modeling approach do you select as the best to deploy in a production environment, and why? Tabularize your findings in steps 5-8 to summarize the results and support your decision (see how to organize information with tables in Markdown).
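One possible way to produce such a summary table is to collect one record per combination and render the records as Markdown. The sketch below uses only the standard library; the helper `to_markdown_table`, the column names and the numbers are hypothetical placeholders, not results.

```python
def to_markdown_table(rows, columns):
    """Render a list of dicts as a Markdown table with the given column order."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = []
    for row in rows:
        cells = []
        for column in columns:
            value = row[column]
            # Floats are rounded to two decimals; everything else is printed as-is.
            cells.append(f"{value:.2f}" if isinstance(value, float) else str(value))
        body.append("| " + " | ".join(cells) + " |")
    return "\n".join([header, divider] + body)


# Placeholder numbers for illustration only; they are not results.
results = [
    {"features": "annual", "model": "L1 log-reg", "test AUC": 0.5, "precision": 0.5, "recall": 0.5},
    {"features": "monthly", "model": "random forest", "test AUC": 0.5, "precision": 0.5, "recall": 0.5},
]
table = to_markdown_table(
    results, ["features", "model", "test AUC", "precision", "recall"]
)
print(table)
```

Pasting the printed text into a Markdown cell of a notebook renders it as a table, with one row per combination of feature set and model.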
