Work with your team. Submit one assignment per team, and make sure that all team members have their name on the submitted assignment and give the name of your team, e.g., "Team A." The homework is due at the beginning of class (1pm).

  • MSO problem

The purpose of this homework is to find good segments using K-means. The file cable2.csv has data from a cable TV multi-system operator (MSO) that offers subscription packages of three types of products: video (e.g., cable TV), internet data and landline phone service. The purpose of the cluster analysis is to identify distinct segments of subscribers so that the MSO can send different offers to the segments and increase their value. For example, the MSO could offer a customer who has low-speed internet an offer for high-speed internet, which would increase the cash flows each month. The clusters must therefore lead to creating an offer, which will then be sent to members of the cluster.

The unit of analysis in the data set is a subscriber household. You have 10,000 households. For this assignment you will cluster on three variables: video, internet and phone. These three variables give the amount paid by the customer in the most recent month for each of the three services. You can assume that those paying more for video have more channels and services, and that those paying more for internet have faster speeds. You also have some demographic variables from a third-party data provider: age of head of household, household income, household size, marital status, and number of children.
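As a starting point for describing the three payment variables, here is a minimal sketch in Python using only the standard library. The `summarize` helper and the sample values are hypothetical and are not taken from cable2.csv. The `share_zero` field assumes that a household that does not buy a service shows a payment of 0, which should be checked against the actual file.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for one payment variable."""
    n = len(values)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        # Assumption: a payment of 0 means the household lacks the service.
        "share_zero": sum(1 for v in values if v == 0) / n,
    }

# Made-up monthly video payments, for illustration only.
video = [0.0, 45.0, 60.0, 60.0, 120.0]
stats = summarize(video)
print(stats)
```

Comparing the mean with the median, and looking at the share of zeros, gives a quick sense of skew and of how many households do not take a given service, both of which can shape the clusters that K-means finds.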

  1. Run relevant descriptive statistics on each of the three clustering variables. Discuss aspects that will be relevant in finding the clusters.

  2. Discuss the implications of standardizing the payment variables before the analysis. Do you suggest standardizing? If so, how? If not, why not?

  3. Find the 6–10 cluster solutions using K-means. Which do you suggest? Why? You may want to consider pseudo F statistics, etc., but the actionability should also be considered given the purpose of the analysis.

  4. Profile the clusters from your best cluster solution on the demographic variables to get a better understanding of the clusters. Where are there large differences? Can you get a sense of the typical person in each cluster?

  5. What variables would you like to have? These could be added to the current set of variables in the cluster analysis, or clustered separately for further personalization. Think about what the MSO can observe. For example, it can know the amount of data that each household uploads and downloads each month, the number of phone calls made, the programs watched, and additional services (video on demand, pay-per-views, etc.). What exactly would you do with the variables?
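The pseudo F statistic mentioned in the K-means question is commonly taken to mean the Calinski-Harabasz index, the ratio of between-cluster to within-cluster variation, each divided by its degrees of freedom. Reading it that way is an assumption about what the assignment intends. The sketch below computes it in plain Python for a given set of cluster labels; the function name `pseudo_f` and the toy points are hypothetical.

```python
def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo F: (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster sum of squares, W the within-cluster sum of
    squares, k the number of clusters and n the number of points.
    """
    n = len(points)
    dims = len(points[0])

    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand_mean = centroid(points)
    between = 0.0
    within = 0.0
    for rows in clusters.values():
        center = centroid(rows)
        between += len(rows) * sq_dist(center, grand_mean)
        within += sum(sq_dist(r, center) for r in rows)

    if within == 0:
        return float("inf")  # every point sits exactly on its centroid
    return (between / (k - 1)) / (within / (n - k))

# Toy data: two tight groups far apart on the first coordinate.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = pseudo_f(points, [0, 0, 1, 1])  # labels match the two groups
bad = pseudo_f(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

In practice the labels would come from a K-means run for each number of clusters from 6 to 10, with a higher value suggesting better-separated clusters. The statistic is one input only; it would still need to be weighed against how actionable each solution is for creating offers.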

  • News Website Problem

(data to come) For this problem you will cluster data from a news website with subscribers. The purpose of the clusters is to develop a "lightly personalized" email newsletter to be sent each day. We know from other analyses that the more often someone reads at least some content, the less likely they are to churn. The purpose of the email is to get relevant stories in front of subscribers so that they will be more likely to find something that interests them. We have one year of clickstream data and about 3000 subscribers who joined, and possibly also canceled, their subscriptions during the year. For each day that the subscriber reads, I will give you a count of page views by "section," e.g., home page, life and culture, crime, news, obituaries, opinion, sports.

We are looking for 5–10 clusters. Each cluster will receive its own newsletter each day featuring stories that might be of interest to the reader.

  1. Discuss issues in finding the clusters. Which variables do you use? Should you combine certain variables? Do you need to "normalize" the variables in any way? Other transformations?

  2. Find the best cluster solution.
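One possible way to "normalize" the page-view counts, offered here only as an illustration and not as the intended answer, is to convert each subscriber's counts into shares of their total page views, so that clusters reflect what someone reads more than how much they read. The `to_shares` helper and the sample counts below are hypothetical.

```python
def to_shares(counts):
    """Convert one subscriber's page-view counts by section into shares."""
    total = sum(counts.values())
    if total == 0:
        # A subscriber with no page views has no reading mix to describe.
        return {section: 0.0 for section in counts}
    return {section: count / total for section, count in counts.items()}

# Made-up yearly totals for a heavy and a light reader with the same mix.
heavy = to_shares({"sports": 80, "news": 20})
light = to_shares({"sports": 8, "news": 2})
print(heavy, light)
```

Both readers end up with the same profile, which may or may not be desirable: total volume is itself informative about engagement, so it could be kept as a separate variable or handled with a different transformation.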


