Defining Targeting Models
- Customer targeting typically involves three models:
  - Propensity model
  - Time-to-event model
  - Lifetime value model
- These models can be used individually or together
- A propensity model estimates the probability that a customer performs some event
- A time-to-event model estimates the number of days until a customer performs some event
- A lifetime value model estimates the value of a customer
- The events in a propensity or time-to-event model include:
  - A candidate responding to an email campaign
  - A customer purchasing a specific product
  - A customer expanding to a new product
  - A customer purchasing additional units of a product
  - A customer changing shopping habits
  - A customer churning
- All three of these models can help determine the impact of such an event
Motivating Time-to-Event Analysis
- Propensity models estimate the probability of outcomes of marketing actions
- The shortcomings of propensity models include:
  - They can't translate a probability into a time until the event
    - This time is known as the survival time
    - A time estimate is usually more convenient and efficient to interpret
  - It is sometimes difficult to create response labels in the training set
    - This problem is caused by censored data
    - E.g. it is easy to mislabel customers for churn
- Specifically, time-to-event models can produce more actionable insights
- For example, a propensity model can estimate the conditional probability of a purchase by a customer given a discount of a certain size
- Whereas a time-to-event model can estimate that:
  - A customer is likely to make a purchase within a certain number of days
  - This time can be reduced by some number of days by offering a discount of a given size
- In this example, the time-to-event model may be more useful to us
Describing the Problems of Censored Data
- Censored data refers to data that is defined by a lack of an event, causing it to have an unknown event time
- For example, suppose we're interested in modeling churn propensity
- We're able to identify customers who have purchased
- However, censoring makes it difficult to distinguish between:
  - Customers who will not purchase again (i.e. have churned)
  - Customers who haven't purchased yet
- Consequently, one can argue that labeling customers as churned or not-churned is not really valid
- This suggests it isn't accurate to train classification models on a binary outcome that is determined from currently observed outcomes
- As a result, we can use a different framework like survival models
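The labeling problem above can be sketched in code. This is a minimal illustration, not a prescribed method: the `Observation` type, the `to_observation` helper, and the dates are all hypothetical. Instead of forcing a churned / not-churned label, each record keeps a duration plus a flag saying whether the event was actually observed before the analysis cutoff.

```python
from datetime import date
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    duration_days: int    # time from treatment to the event, or to the cutoff
    event_observed: bool  # False means the record is censored


def to_observation(
    treated_on: date, event_on: Optional[date], cutoff: date
) -> Observation:
    """Turn raw dates into a (duration, event_observed) pair."""
    if event_on is not None and event_on <= cutoff:
        # The event was seen, so the duration is the true survival time.
        return Observation((event_on - treated_on).days, True)
    # No event by the cutoff: we only know the survival time is at least
    # as long as the observation window, so the record is censored.
    return Observation((cutoff - treated_on).days, False)


cutoff = date(2024, 3, 31)  # hypothetical analysis date
purchased = to_observation(date(2024, 3, 1), date(2024, 3, 3), cutoff)
not_yet = to_observation(date(2024, 3, 28), None, cutoff)
print(purchased)  # duration 2 days, event observed
print(not_yet)    # duration 3 days, censored
```

A survival model can consume both kinds of records, whereas a binary classifier would have to guess a label for the censored one.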
Defining Terminology for Time-to-Event Models
- The above limitations of propensity modeling can be addressed by using survival analysis
- Survival models can properly handle:
  - Censored data (i.e. records with potentially unknown outcomes)
  - Predicting the expected time-to-event (or survival time)
  - Specifying how marketing actions and customer properties can accelerate or decelerate an event
- The main goal of survival analysis is to do the following:
  - Predict the time to an event of interest
  - Quantitatively explain how this time depends on the properties of the treatment, individuals, and other independent variables
- A treatment (i.e. business decision) is typically an incentive or trigger
  - E.g. a promotion
- An event (i.e. customer action) is typically one of the following:
  - A purchase
  - A promotion redemption
  - A subscription cancellation
  - Or any other customer action
- Survival time is the time between a treatment and an event
- Censored data refers to data that is defined by a lack of an event, causing it to have an unknown event time
- Most survival models consist of two key components:
  - A survival function
  - A hazard function
Defining the Survival Function
- A survival function outputs the probability that an event hasn't occurred by a given time
  - This is also referred to as the probability to survive
- The survival function is defined as the following:

$$S(t) = P(T > t) = 1 - F(t)$$

- Where $t$ is a given point in time
- Where $T$ is the survival time of a customer
- Where $S(t)$ outputs the fraction of customers who have not yet experienced the event by time point $t$
- Where $F(t)$ is the probability of a customer experiencing the event by time point $t$
Describing the Survival Function
- Typically, the survival function is estimated from observed data
- To estimate the survival function, we must assume that individuals are independent of each other
- Then, the estimated survival function can be obtained by multiplying the probabilities of survival from one interval to the next
- Formally, the probability of surviving the interval at time $t$ can be estimated as:

$$\hat{p}_t = \frac{n_t - d_t}{n_t}$$

- Where $n_t$ is the number of individuals who haven't yet experienced the event (and are still under observation) just before time $t$
- Where $d_t$ is the number of individuals who experience the event at time $t$
- This is only an estimate of the probability of surviving a single interval
- The estimate of the (cumulative) survival function is obtained by multiplying these probabilities from the origin time until time $t$
- The estimated survival function is defined as the following formula:

$$\hat{S}(t) = \prod_{i \le t} \frac{n_i - d_i}{n_i}$$

- This formula is known as the Kaplan-Meier estimator
  - This has been proven to be the maximum likelihood estimator (MLE) of the survival function
  - This is a non-parametric formula
- The survival function can also be estimated using an exponential curve
  - This is a parametric formula
- The following table compares the pros and cons of these two estimators and of the Cox model (described later):
| | K-M Model | Exponential Model | Cox Model |
|---|---|---|---|
| Pros | Simple to interpret; can estimate the survival function | Can estimate the survival function and hazard ratio | Hazards can fluctuate with time; can estimate the hazard ratio |
| Cons | No functional form; can't estimate the hazard ratio; can only include a few categorical variables | Not always realistic; assumes constant hazards | Can't estimate the survival function |
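As a rough sketch of the two estimators compared above, the Kaplan-Meier product and the exponential fit can each be written in a few lines of plain Python. The function names and the data are hypothetical. The exponential fit uses the standard maximum likelihood estimate of a constant hazard (number of observed events divided by total observed time), which is an addition here rather than something stated in these notes.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], observed: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (time, estimated survival) at each distinct event time."""
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # events and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d > 0:
            # Multiply by the probability of surviving this interval.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone with this duration leaves the risk set afterwards.
        at_risk -= exits[t]
    return curve


def exponential_rate(
    durations: Sequence[float], observed: Sequence[bool]
) -> float:
    """Constant-hazard MLE: observed events divided by total observed time."""
    return sum(observed) / sum(durations)


# Hypothetical data: days since treatment, and whether a purchase was seen.
durations = [2, 3, 3, 5, 6, 8]
observed = [True, True, False, True, False, True]

curve = kaplan_meier(durations, observed)
rate = exponential_rate(durations, observed)
```

Censored records never trigger a drop in the curve, but they still count in the risk set until their censoring time, which is how the estimator uses the partial information they carry.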
Illustrating the Survival Function
- Suppose we're analyzing a group of customers who have all received a promotional email
  - The emails were sent at different times
- The observed data looks like the following:
- The dataset represents the time of the event for each customer
  - Each time is measured in days since the email was sent
- The dataset contains indicators for whether each observation is:
  - Censored
  - Or not censored
- For example, the first customer made a purchase on the second day after the email was sent to them
  - Therefore, he's labeled as non-censored
- Whereas the third customer had not made a purchase by the time of the analysis
  - She got the email three days before the analysis cutoff date
  - Therefore, she's labeled as censored
- In this context, the probability to survive refers to the probability of not having made a purchase by a given time
- The following illustrates a series of cumulative probabilities:
- Notice that $S(0)$ will always equal $1$
  - This is because all customers are considered to be alive at time zero
Visualizing the Survival Function
- The results from our example correspond to the stepwise survival curve plotted below
- The survival curve summarizes the dynamics of a customer group
- Typically, we'll compare curves for different groups
- For example, a survival curve for customers who were treated with a promotion can be plotted together with a curve for those who were not
- Thus, the efficiency of the promotion can be graphically assessed
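Besides plotting, two stepwise curves can be compared numerically. The sketch below assumes each curve is stored as a list of `(time, survival)` steps. The two curves are made-up numbers for illustration, not results from these notes, and the helper names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Curve = List[Tuple[float, float]]  # (event time, survival from that time on)


def survival_at(curve: Curve, t: float) -> float:
    """Evaluate a right-continuous step curve; it equals 1 before the first step."""
    times = [time for time, _ in curve]
    idx = bisect_right(times, t)
    return 1.0 if idx == 0 else curve[idx - 1][1]


def median_survival(curve: Curve) -> Optional[float]:
    """First time the curve drops to 0.5 or below, or None if it never does."""
    for time, s in curve:
        if s <= 0.5:
            return time
    return None


# Hypothetical curves: days until purchase, with and without a promotion.
promo = [(1, 0.7), (2, 0.45), (4, 0.2)]
control = [(2, 0.9), (5, 0.6), (9, 0.4)]

# Share of each group that still hasn't purchased three days after treatment.
print(survival_at(promo, 3), survival_at(control, 3))
# Day by which half of each group has purchased.
print(median_survival(promo), median_survival(control))
```

In this setting a lower curve means customers purchase sooner, so a promotion curve that sits below the control curve suggests the promotion speeds up purchases.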
Defining a Hazard Function
- Whereas a survival function outputs the probability that an event hasn't occurred by a given time
- A hazard function outputs the rate at which the event occurs at a given time, given that it hasn't occurred before then
- Typically, hazard functions are used for analyzing how different factors (i.e. treatment parameters) influence the survival time
- Specifically, the hazard function is defined as the instantaneous hazard rate
- Meaning, it's the probability (per unit of time) of an event in an infinitesimally small time period between $t$ and $t + dt$, given that the individual has survived up until time $t$
- The hazard function can be reformulated in terms of the survival function
- As a result, we can switch between the hazard and survival functions in the analysis
- A hazard function can be used to calculate hazard ratios
  - A hazard ratio is the ratio of the hazard for someone who has received the treatment relative to someone who hasn't received the treatment
  - As a result, we can interpret this output as the multiplicative risk of someone who has received the treatment observing an event, compared to someone who hasn't received the treatment
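The link between the hazard and survival functions can be written out explicitly. The notation is an assumption made for this sketch: $T$ is the survival time, $S(t) = P(T > t)$ is the survival function, $f(t) = -\frac{d}{dt} S(t)$ is the density of $T$, and $h(t)$ is the hazard function.

$$h(t) = \lim_{dt \to 0} \frac{P(t \le T < t + dt \mid T \ge t)}{dt} = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t)$$

Integrating both sides gives the reverse direction:

$$S(t) = \exp\left(-\int_0^t h(u)\,du\right)$$

With the same notation, the hazard ratio of a treated individual relative to an untreated one is $h_{\text{treated}}(t) / h_{\text{untreated}}(t)$. A ratio of 2, for instance, would mean the treated individual experiences the event at twice the rate at that time.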
Defining Survival Analysis Regression
- The basic survival and hazard functions can be used for:
  - Describing the performance of a customer group
  - Comparing different groups to each other
- However, this is not enough for predicting how survival and hazard are influenced by factors like marketing actions and customer properties
- Let's assume that each individual is represented by three values $(x, t, \delta)$:
  - Where $x$ is a vector of independent variables, which can contain:
    - Customer demographic independent variables
    - Customer behavioral independent variables
    - Indicators of marketing communication to that customer
    - Etc.
  - Where $t$ is a survival time or censoring time
  - Where $\delta$ is a censoring indicator
    - Observed events are labeled $\delta = 1$
    - Censored cases are labeled $\delta = 0$
- As both the survival function $S(t)$ and the hazard function $h(t)$ are defined in terms of probabilities, we can construct different survival regression models by assuming:
  - Different probability distributions
  - Different dependencies between $x$ and the parameters of the distribution
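To make these two modeling choices concrete, here is one possible combination, chosen purely for illustration: an exponential distribution for the survival time, with a rate that depends on the independent variables as `exp(w . x)`. The parameter vector `w`, the function name, and the data are assumptions of this sketch. Observed events contribute the log hazard plus the log survival to the log-likelihood, while censored cases contribute only the log survival.

```python
import math
from typing import Sequence


def exp_regression_log_likelihood(
    w: Sequence[float],
    xs: Sequence[Sequence[float]],
    ts: Sequence[float],
    deltas: Sequence[int],
) -> float:
    """Censored log-likelihood of an exponential model with rate exp(w . x)."""
    total = 0.0
    for x, t, delta in zip(xs, ts, deltas):
        log_rate = sum(wj * xj for wj, xj in zip(w, x))
        # log h(t) = log_rate counts only for observed events (delta = 1);
        # log S(t) = -rate * t counts for every record, censored or not.
        total += delta * log_rate - math.exp(log_rate) * t
    return total


# Hypothetical records: one feature (got the promotion or not).
xs = [[1.0], [0.0]]
ts = [2.0, 3.0]
deltas = [1, 0]  # first purchase observed, second record censored

ll_zero = exp_regression_log_likelihood([0.0], xs, ts, deltas)
ll_half = exp_regression_log_likelihood([math.log(0.5)], xs, ts, deltas)
```

Fitting the model would mean searching for the `w` that maximizes this function. Swapping in a different distribution or a different link between `x` and its parameters gives a different survival regression model.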
Defining Proportional Hazard Models
- The most common type of survival regression model is the proportional hazard model
- This model family makes the following assumptions:
  - A unit increase in an observed covariate has a multiplicative effect on the hazard function
  - This multiplicative effect is constant over time
- Thus, the proportional hazard model is defined as the following:

$$h(t \mid w, x) = h_0(t) \cdot r(w, x)$$

- Where $t$ is a survival time or censoring time
- Where $w$ is a vector of model parameters
- Where $x$ is a vector of independent variables
- Where $h_0(t)$ is the baseline hazard
- Where $r(w, x)$ is the risk ratio
  - This increases or decreases the baseline hazard depending on the independent variables
- Thus, the baseline hazard does not depend on the individual
- Whereas the risk ratio does depend on the individual
- Since the hazard rate is never negative, the risk ratio is typically modeled as an exponential function, $r(w, x) = \exp(w^T x)$, to ensure it isn't ever negative
- By rearranging this formula, the model can be interpreted as a linear model
- Specifically, it can be interpreted using the log of the ratio of an individual's hazard to the baseline hazard: $\log \frac{h(t \mid w, x)}{h_0(t)} = w^T x$
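A small numeric sketch can illustrate the proportionality assumption. Everything here is hypothetical: the baseline hazard function, the parameter vector `w`, and the two feature vectors are made up, and the exponential form of the risk ratio is the typical choice mentioned above.

```python
import math
from typing import Callable, Sequence


def risk_ratio(w: Sequence[float], x: Sequence[float]) -> float:
    """Exponential risk ratio exp(w . x); always positive."""
    return math.exp(sum(wj * xj for wj, xj in zip(w, x)))


def hazard(
    t: float,
    w: Sequence[float],
    x: Sequence[float],
    baseline: Callable[[float], float],
) -> float:
    """Proportional hazard: baseline hazard scaled by the individual's risk ratio."""
    return baseline(t) * risk_ratio(w, x)


def baseline(t: float) -> float:
    """Hypothetical baseline hazard that grows with time."""
    return 0.01 * (1.0 + t)


w = [0.7, -0.2]         # hypothetical parameters: [got promotion, tenure in years]
treated = [1.0, 3.0]    # received the promotion
untreated = [0.0, 3.0]  # same tenure, no promotion

# The hazard ratio between the two customers is the same at every time point.
ratio_day_1 = hazard(1.0, w, treated, baseline) / hazard(1.0, w, untreated, baseline)
ratio_day_30 = hazard(30.0, w, treated, baseline) / hazard(30.0, w, untreated, baseline)

# Linear interpretation: log(hazard / baseline) equals w . x.
log_ratio = math.log(hazard(5.0, w, treated, baseline) / baseline(5.0))
```

Because the baseline cancels in the ratio, the promotion multiplies the hazard by `exp(0.7)` regardless of when we look.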
Introducing the Cox Proportional Hazard Model
- Regarding the baseline hazard, we have two choices:
  - Nonparametric
  - Parametric
- The parametric approach assumes that the hazard function follows a certain probability distribution
- In this case, we must fit the parametric model by finding the optimal values of the model parameters and the parameters of the distribution
- The disadvantage of this approach is that it assumes a fixed probability distribution over time
  - However, this doesn't always reflect reality
  - The baseline hazard typically varies in an unpredictable manner with time
- On the other hand, since the parametric approach smooths noisy data, it can be useful for providing a simple model for the baseline hazard
- A nonparametric baseline hazard model can be estimated from the data by using the Kaplan–Meier estimator (or other methods)
- This leads to a semiparametric model for the overall hazard
  - The parametric part is defined by the risk ratio
  - The baseline hazard part is nonparametric
- This semiparametric model is known as the Cox proportional hazard model
Describing the Cox Model
- To summarize the above points, the Cox model refers to:

$$h(t \mid w, x) = h_0(t) \cdot \exp(w^T x)$$

- Where $h_0(t)$ is the nonparametric baseline hazard, $w$ is the vector of model parameters, and $x$ is the vector of independent variables
- The Cox model has the following benefits:
  - We can estimate the hazard ratios without having to estimate the baseline hazard function
  - We don't need to make any assumptions about the structure of the baseline hazard
    - It is convenient to only estimate the risk factors
    - It is convenient to not estimate the absolute hazard values
- The Cox model has the following disadvantages:
  - The baseline hazard is not estimated along with the model parameters, so it must be estimated separately whenever absolute hazard or survival values are needed
  - The Cox model makes the same assumptions as any proportional hazard model, which may not be true for our data
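The first benefit can be seen in how the parameters are usually fitted: by maximizing the Cox partial likelihood, which these notes don't spell out. Each observed event contributes the ratio of that individual's risk to the summed risk of everyone still at risk at that time, so the baseline hazard cancels. The sketch below is a plain-Python version with hypothetical names and data. It assumes an exponential risk ratio and no tied event times.

```python
import math
from typing import Sequence


def cox_partial_log_likelihood(
    w: Sequence[float],
    xs: Sequence[Sequence[float]],
    ts: Sequence[float],
    deltas: Sequence[int],
) -> float:
    """Log partial likelihood; the baseline hazard never appears."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in xs]
    total = 0.0
    for i, (t_i, delta_i) in enumerate(zip(ts, deltas)):
        if not delta_i:
            continue  # censored records only show up inside risk sets
        # Risk set: everyone whose event or censoring time is at least t_i.
        risk_sum = sum(math.exp(s) for s, t in zip(scores, ts) if t >= t_i)
        total += scores[i] - math.log(risk_sum)
    return total


# Hypothetical records: one feature (got the promotion or not).
xs = [[1.0], [0.0], [1.0]]
ts = [2.0, 4.0, 5.0]
deltas = [1, 0, 1]  # the second record is censored

pll_zero = cox_partial_log_likelihood([0.0], xs, ts, deltas)
pll_one = cox_partial_log_likelihood([1.0], xs, ts, deltas)
```

An optimizer would search for the `w` that maximizes this value. The baseline hazard only has to be estimated afterwards, and only if absolute hazards or survival curves are required.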
tldr
- A propensity model estimates the probability of outcomes of marketing actions
- The shortcomings of propensity models include:
  - Can't translate from probabilities to survival times
  - Difficult to handle censored data
- A time-to-event model can estimate that:
  - A customer is likely to make a purchase within a certain number of days
  - This time can be reduced by some number of days by offering a discount of a given size
- A survival function outputs the probability that an event hasn't occurred by a given time
  - For example, it can measure the probability that a customer won't purchase within a given number of years
- A hazard function outputs the rate at which an event occurs at a given moment, given that it hasn't occurred yet
  - For example, it can measure the probability that a customer will purchase right after a given year mark
    - Given the customer hasn't purchased yet (before that year mark)
- The basic survival and hazard functions can be used for:
  - Evaluating the performance of a customer group
  - Comparing different groups to each other
- Survival analysis regression is used for predicting how survival and hazard are influenced by factors like marketing actions and customer properties
- The most popular type of survival analysis regression model is the (semiparametric) Cox model
References
- Video about Survival Analysis
- Video about Survival and Hazard Functions
- Video about Pros and Cons of Cox Model
- Video about Modeling Survival Function Curve
- Video about Cox Proportional Hazards Model
- Video about Cox Model Assumptions
- Textbook about Algorithmic Marketing
- Paper about Proportional Hazard Model