Homework 2 Solution


Submit one homework per group. Put all names on the homework.

  1. This problem uses data from tradeshow.csv. You have the following variables:

Buy. Evaluate and compare specific equipment for purchase, place orders, find new suppliers and solutions.

Social. Spend time with others, network with colleagues, extend professional network.

Education. Keep up-to-date on industry trends, attend continuing education sessions, attend keynotes.

    1. Estimate K-means on the three variables and find the 3-cluster solution. Give the cluster sizes, means and RMSE values. Describe each of the three clusters.

    1. Estimate a Gaussian mixture using the three variables in R with the options G=3 for three clusters and modelNames="VII" for unequal variance, round clusters (spherical in Python). Submit a classification plot. Compare the solution to K-means.

      1. Do the cluster means tell the same story, or are there differences?

      1. Comment on the K-means vs. GMM cluster sizes.

      1. Comment on the within cluster standard deviations (vs. RMSE for K-means).

      1. How many variance parameters are estimated in total?

    1. Estimate Gaussian mixtures using the three variables, specifying only the G=3 option (use tied in Python).

      1. Do the cluster means tell the same story, or are there differences?

      1. Generate a classification plot.

      1. Which variance model did Mclust pick (it should be EEE)? Describe in words the shape of the class-conditional distributions.

      1. How many variance parameters are estimated in total?

    1. Which of the three solutions do you prefer?
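The "how many variance parameters" questions above come down to a counting rule for each covariance structure. The sketch below is a hedged illustration, not part of the assignment: the function name `n_variance_parameters` is hypothetical, and it counts only variance and covariance parameters (means and mixing weights are excluded). It assumes the usual definitions: a spherical model with unequal variances (mclust "VII") has one variance per cluster, and a tied ellipsoidal model (mclust "EEE") has one full covariance matrix shared by all clusters.

```python
def n_variance_parameters(covariance_structure, n_clusters, n_dims):
    """Count free variance/covariance parameters (hypothetical helper).

    Means and mixing proportions are deliberately not counted.
    """
    if covariance_structure == "spherical":
        # mclust "VII": each cluster has its own single variance.
        return n_clusters
    if covariance_structure == "tied":
        # mclust "EEE": one symmetric n_dims x n_dims matrix shared by all clusters.
        return n_dims * (n_dims + 1) // 2
    raise ValueError("expected 'spherical' or 'tied'")


if __name__ == "__main__":
    # Three clusters on the three tradeshow variables (Buy, Social, Education).
    for structure in ("spherical", "tied"):
        print(structure, n_variance_parameters(structure, n_clusters=3, n_dims=3))
```

The count for the tied model does not depend on the number of clusters, which is one way to see the trade-off between the two specifications.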

  1. This problem uses a data set from the Nuoqi retailer in China. You have five factors measuring attitudes toward fashion: Cross, fashion enthusiast, functional, impressive, self-expression. See the PowerPoint for the actual questions that were asked of consumers and the alpha values.

    1. Use K-means to find the five-cluster solution using the first five variables in the data frame. Give the usual sizes, means, and RMSE values. Comment on the solution.

    1. Suppose that there are individual differences in the way that different respondents use the scales, where some are systematically more positive and others are more negative. The variable xbar is the average response for the given respondent to all 5-point scales on the survey. Compute five new variables equal to the original variable minus xbar, e.g., nuoqi$impressI = nuoqi$impress-nuoqi$xbar. This is called ipsatization, and it will be important to us with recommender systems.


      1. Use K-means to find the five-cluster solution using the ipsatized versions of the first five variables in the data frame. Give the usual sizes, means, and RMSE values. Comment on the solution. Is there improvement?

      1. Run K-means solutions for K = 2 to 6 and examine the fit statistics (SSE, R-squared, pseudo-F). Try both the raw and ipsatized data. Which do you suggest?

      1. Try Gaussian mixture models and look at the plots. What is the underlying problem when trying to cluster this data set?
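The ipsatization step described above (original variable minus the respondent's xbar) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper name `ipsatize` is hypothetical, only `impress` and `xbar` are column names taken from the assignment, `functional` is a guessed name, and the two respondents are invented to show the effect rather than drawn from the Nuoqi data.

```python
def ipsatize(rows, factor_names, mean_key="xbar"):
    """Return copies of the rows with '<name>I' = row[name] - row[mean_key] added.

    Hypothetical helper mirroring nuoqi$impressI = nuoqi$impress - nuoqi$xbar.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the raw scores are left unchanged
        for name in factor_names:
            new_row[name + "I"] = row[name] - row[mean_key]
        out.append(new_row)
    return out


if __name__ == "__main__":
    # Invented respondents: same relative preferences, different overall level.
    # In the real data xbar averages all 5-point scales, not just these factors.
    respondents = [
        {"impress": 5.0, "functional": 4.0, "xbar": 4.5},  # systematically positive
        {"impress": 3.0, "functional": 2.0, "xbar": 2.5},  # systematically negative
    ]
    for row in ipsatize(respondents, ["impress", "functional"]):
        print(row["impressI"], row["functionalI"])
```

Both invented respondents end up with the same ipsatized profile, which is the point of the transformation: clustering then reacts to relative preferences rather than to how positively a respondent uses the scale overall.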

  1. Write a function to generate data for this problem with parameter μ. There are K = 2 equal-sized clusters with one cluster sampled from N(−μ, σ²) and the other from N(μ, σ²), where σ² = 1. Assume n1 = n2 = 3000 observations from each (Mclust will start to have problems for larger n, but Python should be able to handle somewhat larger sample sizes, and K-means can easily handle much larger n). Estimate GMM and K-means models for μ = 0.5, 1 and 2. Report the estimated means and variances. Discuss the results, especially the biases discussed in class and how they are affected by the separation of the means.
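A possible starting point for the data-generation step, using only the Python standard library. This is a sketch under assumptions, not the intended solution: it assumes the two cluster means are symmetric about zero (one at minus the parameter, one at plus the parameter) with unit variance, the function names are hypothetical, and it fits only a simple one-dimensional 2-means by Lloyd's algorithm. The GMM fit (Mclust in R, or a Gaussian mixture routine in Python) is left out.

```python
import random
import statistics


def generate_two_clusters(mu, n_per_cluster=3000, sigma=1.0, seed=0):
    """Draw n_per_cluster points from N(-mu, sigma^2) and from N(mu, sigma^2).

    Assumption: cluster means are symmetric about zero.
    """
    rng = random.Random(seed)
    low = [rng.gauss(-mu, sigma) for _ in range(n_per_cluster)]
    high = [rng.gauss(mu, sigma) for _ in range(n_per_cluster)]
    return low + high


def kmeans_1d_two_clusters(x, n_iter=100):
    """Lloyd's algorithm for K = 2 in one dimension.

    Returns ((mean, variance), (mean, variance)) for the lower and upper cluster.
    """
    c_low, c_high = min(x), max(x)  # simple deterministic start
    for _ in range(n_iter):
        cut = (c_low + c_high) / 2  # points are assigned to the nearer center
        low = [v for v in x if v <= cut]
        high = [v for v in x if v > cut]
        new_low, new_high = statistics.fmean(low), statistics.fmean(high)
        if new_low == c_low and new_high == c_high:
            break  # centers stopped moving
        c_low, c_high = new_low, new_high
    return ((c_low, statistics.pvariance(low)),
            (c_high, statistics.pvariance(high)))


if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0):
        data = generate_two_clusters(mu)
        (m1, v1), (m2, v2) = kmeans_1d_two_clusters(data)
        print(f"mu={mu}: means=({m1:.3f}, {m2:.3f}) variances=({v1:.3f}, {v2:.3f})")
```

Because K-means assigns every point to the nearer center, each cluster is a truncated slice of the mixture. Comparing the printed means and variances with the true values across the three separations gives the K-means half of the comparison the problem asks for.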
