Name: Data Mining Assignment 4: CLV and Churn Modeling Solution


The data set np.csv is space delimited with a header line, and the value "." indicates missing. In R you will want to set na.strings=".". It has been set up to run discrete time survival models with one record for each customer decision. You have a sample of digital-only subscribers without any left censoring. SubscriptionId uniquely identifies a subscriber and t is the month number in the customer's life. You have the following variables:

- churn: indicator if the customer churned this month
- Overall reader engagement variables
  - regularity: number of reading days this month
  - intensity: number of page views (PVs) per reading day this month
- Payment variables trial, currprice: indicate whether the reader is paying a trial rate and the price paid this period
- Content variables sports1–opinion1: number of PVs in each section this month
- Location variables Loc1–Loc4: number of sessions in four different locations this month. Remaining PVs are from other locations.
- Source variables SrcGoogle–SrcLegacy: number of sessions from different referring sources this month
- Device variables mobile, tablet, desktop: number of sessions on different devices this month

The purpose of this exercise is to do an exploratory analysis to understand what factors are associated with churn/retention. Insights from this analysis will be used to allocate resources to improving aspects of the product. I think of regularity and intensity as measures of reader engagement; the content variables are about the product, device variables tell us about distribution and the user experience, source variables tell us about promotion and acquisition, and location might help us in targeting acquisition efforts and deciding where to assign reporters.

For all parts use logistic regression. To avoid issues of simultaneity, predict churn next month from reading behaviors this month. Create a variable nextchurn indicating churn next month by customer. Hint: the dplyr commands lead and group_by are useful for this. Also create a lead version of currprice and call it nextprice. Submit your R code.

Make t a factor variable so that you don’t have to use factor in every model below. Submit a table.
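The assignment asks for R with dplyr. As a cross-check, the same "lead within customer" step can be sketched in Python with pandas. The toy rows below are invented for illustration; only the column names SubscriptionId, t, churn and currprice come from the description above. Dropping each subscriber's last observed month (which has no next month) is an assumption about how the missing leads should be handled.

```python
import pandas as pd

# Toy data (invented for illustration): two subscribers, one row per month.
# The real file would presumably be read with something like
#   pd.read_csv("np.csv", sep=r"\s+", na_values=".")
df = pd.DataFrame({
    "SubscriptionId": [1, 1, 1, 2, 2],
    "t":              [1, 2, 3, 1, 2],
    "churn":          [0, 0, 1, 0, 0],
    "currprice":      [1.0, 10.0, 10.0, 10.0, 10.0],
})

# Sort so that shift(-1) really means "next month" within each subscriber.
df = df.sort_values(["SubscriptionId", "t"]).reset_index(drop=True)

# groupby(...).shift(-1) plays the role of dplyr's group_by() + lead().
by_customer = df.groupby("SubscriptionId")
df["nextchurn"] = by_customer["churn"].shift(-1)
df["nextprice"] = by_customer["currprice"].shift(-1)

# Assumption: rows without a next month are dropped before modeling.
model_df = df.dropna(subset=["nextchurn"]).copy()
model_df["nextchurn"] = model_df["nextchurn"].astype(int)

# Treat month as categorical, mirroring "make t a factor variable".
model_df["t"] = model_df["t"].astype("category")
print(model_df)
```

Sorting before shifting matters: without it, the lead can pick up a row that is not the following month.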

1. Run the following models:

   nextchurn ~ t+trial+nextprice+regularity+intensity
   nextchurn ~ t+trial+nextprice+regularity
   nextchurn ~ t+trial+nextprice+intensity

   What do you conclude about the effects of trial, price, regularity and intensity? Note that it's always a good idea to examine diagnostics like correlations and VIFs. What is the trial effect telling you, given that (1) most trial offers are 1 month, (2) many customers did not have trial offers, and (3) you already have a dummy for month 1 in the model with the t variable?

2. Fit the following model to study content:

   nextchurn~t+trial+nextprice+sports1+news1+crime1+life1+obits1+business1+opinion1

   We need to be careful about multicollinearity. Do your conclusions change if you include regularity in the model?

3. What can you conclude about the effect of location on churn? Fit these models:

   nextchurn~t+trial+nextprice+loc1+loc2+loc3+loc4
   nextchurn~t+trial+nextprice+regularity+loc1+loc2+loc3+loc4

4. What can you conclude about the effect of source on churn?

5. What can you conclude about the effect of device on churn?

6. Do your conclusions change if you fit a model with payment, content, location, source and device variables all in at the same time? What if you use lasso with cross validation rather than statistical significance?

7. Considering all of your analyses, put the variables into the following categories:

   - No association with churn
   - Strong drivers of churn (do less of these things)
   - Strong drivers of retention (do more of these things)
   - Questionable drivers of churn
   - Questionable drivers of retention

Consider a migration model with $k$ states and $k \times k$ transition matrix $P$. Suppose that at some initial point in time there are $n_{0i}$ customers in state $i = 1, \ldots, k$. Let $n_t = (n_{t1}, \ldots, n_{tk})^T$ be the number of customers in each of the $k$ states at time $t$. Suppose that during each period, $a$ ($k \times 1$) new customers are acquired, e.g., if state 1 is for new customers then $a = (a_1, 0, \ldots, 0)^T$. Thus, the number of customers in each state at time $t + 1$ is

$$n_{t+1}^T = n_t^T P + a^T, \qquad t = 0, 1, 2, \ldots \tag{1}$$

Show that the expected number of customers at time $t$ equals

$$n_t^T = n_0^T P^t + a^T (I - P^t)(I - P)^{-1}. \tag{2}$$
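Before attempting the proof, it can help to check the claim numerically. Unrolling Equation (1) gives n_0^T P^t + a^T (I + P + ... + P^(t-1)), and the closed form replaces that geometric sum with (I - P^t)(I - P)^(-1). The sketch below compares the two on a toy two-state example with invented numbers. Note that the inverse only exists when I - P is nonsingular, which fails for a matrix whose rows all sum to 1, so the toy matrix is deliberately sub-stochastic (some customers leave the system each period).

```python
import numpy as np

# Toy 2-state example (invented numbers). Rows sum to less than 1,
# which keeps I - P invertible.
P = np.array([[0.6, 0.3],
              [0.1, 0.7]])
n0 = np.array([100.0, 50.0])   # starting counts n_0
a = np.array([20.0, 0.0])      # new customers per period, all entering state 1


def iterate(n0, P, a, t):
    """Apply Equation (1) t times: n_{s+1}^T = n_s^T P + a^T."""
    n = n0.copy()
    for _ in range(t):
        n = n @ P + a
    return n


def closed_form(n0, P, a, t):
    """Equation (2): n_t^T = n_0^T P^t + a^T (I - P^t)(I - P)^{-1}."""
    I = np.eye(P.shape[0])
    Pt = np.linalg.matrix_power(P, t)
    return n0 @ Pt + a @ (I - Pt) @ np.linalg.inv(I - P)


for t in (0, 1, 5, 36):
    gap = np.max(np.abs(iterate(n0, P, a, t) - closed_form(n0, P, a, t)))
    print(t, gap)   # the gap should be at floating-point noise level
```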

A news site has eight segments (states). There are four life stages: registered user (prospect), trial subscriber, full-price subscriber, and churned. The life stages are crossed with two levels of the regularity of reading (number of days per month), low and high. For all parts, assume a monthly discount rate of d = 1%. The transition matrix, average number of page views per month, and the starting customer counts are as follows (available in a CSV file on Canvas):

Rows: state in period t. Columns: state in period t + 1 (L = low regularity, H = high regularity).

State      Pros L  Pros H  Trial L  Trial H  Full L  Full H  Churn L  Churn H  Page views
Pros L        603      76       83       88      50      39        8        1         7.7
Pros H        146     534       15      132       7      41        1        3       380.3
Trial L         0       0       45       17     309     147       26        7        13.4
Trial H         0       0        9       14     134     691       12       32       278.5
Full L          0       0        2        0    4310     614      223       19         4.2
Full H          0       0        0        7     955    4150       52      118       250.7
Churn L         0       0        9        3      18       4     1296       81         2.1
Churn H         0       0        4        6       1       6      154      424       163.6

Starting    5,000   5,000    1,000    1,000   3,000   3,000    4,000    4,000

Compute the transition probability matrix P.
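One plausible reading of the table is that its entries are observed transition counts, in which case P comes from dividing each row by its row total. A minimal sketch under that assumption, with the counts typed in from the table (the names `states` and `counts` are illustrative):

```python
import numpy as np

states = ["Pros L", "Pros H", "Trial L", "Trial H",
          "Full L", "Full H", "Churn L", "Churn H"]

# Transition counts from the table: rows = state at t, columns = state at t + 1.
counts = np.array([
    [603,  76, 83,  88,   50,   39,    8,   1],
    [146, 534, 15, 132,    7,   41,    1,   3],
    [  0,   0, 45,  17,  309,  147,   26,   7],
    [  0,   0,  9,  14,  134,  691,   12,  32],
    [  0,   0,  2,   0, 4310,  614,  223,  19],
    [  0,   0,  0,   7,  955, 4150,   52, 118],
    [  0,   0,  9,   3,   18,    4, 1296,  81],
    [  0,   0,  4,   6,    1,    6,  154, 424],
], dtype=float)

# Divide each row by its total so that every row of P sums to 1.
P = counts / counts.sum(axis=1, keepdims=True)

np.set_printoptions(precision=4, suppress=True)
print(P)
```

For example, the first row total is 948, so the probability that a low-regularity prospect stays a low-regularity prospect is 603/948.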

The value vector has two components, subscription fees and advertising. Assume that the trial rate is $1/month, the full rate is $10/month, and that there is no subscription revenue from prospects or churns. Suppose that the paper makes $0.002 for each page view (PV). Find the value vector.
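Following the description above, each state's monthly value can be sketched as its subscription fee plus the ad rate times its average page views. The state order is assumed to match the table (Pros L, Pros H, Trial L, Trial H, Full L, Full H, Churn L, Churn H), and the function name `value_vector` is illustrative.

```python
import numpy as np

# Average monthly page views per state, in table order.
page_views = np.array([7.7, 380.3, 13.4, 278.5, 4.2, 250.7, 2.1, 163.6])

# Monthly subscription fee per state: $0 for prospects and churned,
# $1 for trial, $10 for full price.
fees = np.array([0, 0, 1, 1, 10, 10, 0, 0], dtype=float)


def value_vector(fees, page_views, ad_rate):
    """Monthly value per customer in each state: fee + ad revenue."""
    return fees + ad_rate * page_views


v = value_vector(fees, page_views, ad_rate=0.002)
print(v)
```

Passing `ad_rate=0.001` to the same function gives the value vector for the reduced-advertising scenario.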

Find CE, letting t go to infinity.
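One common way to write the infinite-horizon value, stated here as an assumption rather than the course's definition, is to sum the discounted cash flows n_t^T v / (1 + d)^t over t = 0, 1, 2, ... with no acquisition, which collapses to n_0^T (I - P/(1 + d))^(-1) v. The timing convention (first cash flow at t = 0) may differ from the one used in class. The sketch below uses a toy two-state chain with invented numbers so the result can be checked against a plain geometric series.

```python
import numpy as np


def ce_infinite(n0, P, v, d):
    """Sum of n_t^T v / (1 + d)^t over t = 0, 1, 2, ... with no acquisition.

    Assumes the first cash flow arrives at t = 0 (a convention, not a given).
    """
    I = np.eye(P.shape[0])
    # Solve (I - P/(1+d)) x = v instead of forming the inverse explicitly.
    x = np.linalg.solve(I - P / (1.0 + d), v)
    return float(n0 @ x)


# Toy chain (invented): state 0 pays $10 and is retained with probability 0.9;
# state 1 is an absorbing churn state that pays nothing.
P = np.array([[0.9, 0.1],
              [0.0, 1.0]])
v = np.array([10.0, 0.0])
n0 = np.array([100.0, 0.0])

ce = ce_infinite(n0, P, v, d=0.01)
print(ce)
```

In this toy case the answer reduces to 1000 / (1 - 0.9/1.01), which gives a quick sanity check on the matrix formula.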

If you cut the number of advertisements in half, so that the ad revenue is $0.001/PV, by how much does CE change?

For the remaining parts assume that the news organization is only interested in projecting cash flows for the next 36 months (rather than to infinity). Find CE assuming ad revenue of $0.001/PV. Hint: use Equation (1) to find n_{t+1} from n_t and a, multiply n_t by the value vector, and add them up. You could do this with a loop in R or Python, but it might be helpful to try it in Excel the first time using the mmult function.
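The loop described in the hint can be sketched in Python as follows. The function name `project_ce` and the toy inputs are illustrative, and the discounting convention (months t = 0, ..., 35, each discounted by (1 + d)^t) is an assumption to confirm against the course notes.

```python
import numpy as np


def project_ce(n0, P, v, d, months, a=None):
    """Finite-horizon CE via Equation (1).

    Returns (ce, n), where n holds the state counts after `months` transitions.
    Assumes cash flows at t = 0, ..., months - 1, discounted by (1 + d)^t.
    """
    n = np.asarray(n0, dtype=float).copy()
    a = np.zeros_like(n) if a is None else np.asarray(a, dtype=float)
    ce = 0.0
    for t in range(months):
        ce += (n @ v) / (1.0 + d) ** t   # discounted cash flow in month t
        n = n @ P + a                     # Equation (1): move to month t + 1
    return ce, n


# Toy check (invented numbers): one state, everyone stays, 1 new customer a month.
ce_toy, n_toy = project_ce(n0=[10.0], P=np.array([[1.0]]), v=np.array([2.0]),
                           d=0.0, months=3, a=[1.0])
print(ce_toy, n_toy)   # cash flows are 20 + 22 + 24
```

With the eight-state matrix, value vector and starting counts from the table, the same function applies with `months=36`. Total subscribers would then be the sum of the trial and full-price entries of the returned counts; whether that should be read at month 35 or month 36 is again a convention to check.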

Now suppose that you acquire 200 new prospects each month, 100 with low regularity and 100 with high regularity. What is 36-month CE and how many total subscribers (trial plus full) do you expect to have? By how much did CE increase from the previous part?

Reducing the ads should increase retention rates. Perhaps you have also started a newsletter to stimulate regularity, which will also increase retention rates. You would have to do a test to know exactly how much the probabilities change, but for this exercise reduce the following transition probabilities by 1%: p_{1:2,1:2} and p_{3:6,7:8}. Also increase the following by 1%: p_{1:2,3:4} and p_{3:6,5:6}. What is 36-month CE and how many total subscribers (trial plus full) do you expect to have? By how much did CE increase from the previous part?

For you to think about but not turn in: what would the news site have to do to get to 30,000 subscribers and RE=$15 million? Where is there the most sensitivity? Should they invest in reactivating churned customers?

