John Geer

Customer Retention with Statistical Learning

Machine learning can be a valuable tool for keeping customers. We can use statistical learning methods to predict which customers are about to leave. However, we shouldn’t stop there. We can also use statistical learning methods to find the responses which best encourage customers to stay.

Predicting When Customers Are About to Leave

Statistical learning can estimate which current customers may be about to leave by using past data on whether previous customers stayed or canceled. Essentially, this involves using probability to determine whether a particular customer is more similar to past customers who stayed or to those who left. This process of predicting one piece of data from other data is called supervised statistical learning (or supervised machine learning). The surprisingly adaptable linear regression is an example of this type of model, as are random forests and support vector machines.

The difference between a customer leaving or staying could be a simple email at just the right time.

There is an abundance of different supervised statistical learning methods. Which method performs best depends heavily on the particular situation. To build a good model for a particular dataset, it is valuable to understand how these methods work. However, the most skilled model builders also rely on a good deal of experimentation to see which methods actually perform best.

Luckily, supervised statistical learning lends itself to easy experimentation. One can test a model’s predictions by asking it to predict the outcomes of past data that wasn’t used to fit the model, and then simply compare the model’s predictions with the actual results. This approach is broadly termed “cross-validation”. As you would expect, there are many different techniques for performing cross-validation. Testing predictions on a substantial amount of held-out past data is often a much more accurate measure of prediction quality than statistical significance and hypothesis testing.
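As a rough illustration of this idea, here is a minimal k-fold cross-validation sketch in plain Python. Everything in it is an assumption made for the example: the synthetic customer data, the two features (weekly logins and support tickets), and the simple nearest-centroid classifier standing in for whichever model one is actually testing.

```python
import random

def make_customers(n, seed=0):
    """Synthetic, hypothetical data: ((logins, tickets), left) with left = 1 or 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        left = rng.random() < 0.3
        logins = rng.gauss(2.0 if left else 6.0, 1.5)
        tickets = rng.gauss(3.0 if left else 1.0, 1.0)
        rows.append(((logins, tickets), int(left)))
    return rows

def fit_centroids(train):
    """Fit a nearest-centroid classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))

def cross_val_accuracy(rows, k=5):
    """Hold out each of k folds in turn, fit on the rest, score on the held-out fold."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit_centroids(train)
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores

scores = cross_val_accuracy(make_customers(500))
mean_accuracy = sum(scores) / len(scores)
print("accuracy per fold:", scores)
print("mean accuracy:", mean_accuracy)
```

Swapping in a different model only requires replacing `fit_centroids` and `predict`, which is what makes comparing methods this way so convenient.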

Finding The Response That Makes Customers Happy

The Limits of Prediction Alone

Predicting which customers are about to leave doesn’t help much on its own. One also needs to estimate how to keep those customers. Often, people’s first thought is to use the prediction model to figure out how to keep the customers who are about to leave. They try to influence the features of the leaving customers to make them look more like those of past customers who stayed.

Unfortunately, using the prediction model to determine the best interventions quickly runs into two problems. The first is the simple fact that we can’t change some variables. If customers tend to leave in the winter, there isn’t much we can do to avoid the winter. The other problem is that the variables used to predict whether a customer leaves usually aren’t the cause of that customer leaving.

This problem of most variables not being the cause of customers leaving is the deal breaker. From the prediction model alone, we can’t tell which variables actually affect whether a customer leaves or stays. Often, extremely good predictors don’t have any influence on the event they predict. For example, I often get hungry a little after the Sydney Opera House opens in the morning. This is simply because the start of business hours in Australia and dinner time in the eastern US roughly coincide. However, if one were able to keep the Sydney Opera House closed for an hour longer, it would have very little effect on my appetite. This variable can predict my hunger but cannot affect it.
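The opera house example can be played out in a small simulation. This is only a sketch with invented probabilities: a shared cause (whether it is evening in the eastern US) drives both the hypothetical `opera_open` variable and hunger, while hunger never depends on `opera_open` itself.

```python
import random

def simulate(n, force_closed=False, seed=0):
    """Return (opera_open, hungry) pairs that share a common cause."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Common cause: is it evening in the eastern US right now?
        evening = rng.random() < 0.5
        open_draw, hungry_draw = rng.random(), rng.random()
        # The opera house is usually open when it is evening in the eastern US.
        opera_open = open_draw < (0.9 if evening else 0.1)
        if force_closed:
            opera_open = False  # the intervention: keep it closed
        # Hunger depends on the time of day only, never on opera_open.
        hungry = hungry_draw < (0.8 if evening else 0.2)
        rows.append((opera_open, hungry))
    return rows

def hunger_rate(rows):
    return sum(hungry for _, hungry in rows) / len(rows)

observed = simulate(10_000)
open_rows = [r for r in observed if r[0]]
closed_rows = [r for r in observed if not r[0]]

# How well "opera house open" predicts hunger in the observed data.
predictive_gap = hunger_rate(open_rows) - hunger_rate(closed_rows)

# How much hunger changes when we force the opera house closed.
effect_of_closing = hunger_rate(simulate(10_000, force_closed=True)) - hunger_rate(observed)

print("predictive gap:", predictive_gap)
print("effect of closing:", effect_of_closing)
```

In the observed data, hunger is far more common when the opera house is open, yet forcing it closed leaves the overall hunger rate unchanged. A retention feature can behave the same way: useful for prediction, useless as a lever.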

Finding the right intervention to keep customers is going to require something beyond the prediction model. We can get hunches from interpreting this model, but we need a way to test what works. As it so happens, statistical learning can help us here too!

Using A Different Statistical Learning Method

A good place to start is with a three-step approach. First, we can take past customers who have left and group them by their similarity to each other. Second, as we predict which current customers may leave, we can estimate which group each of them falls into. Third, we can test different responses on the current customers in each group. This will allow us to see which response works best for each group.

This approach assumes that similar customers will respond to the same intervention somewhat similarly. For example, two customers with similar behavior may both be confused about the same thing. The same explanation may help both of them. Likewise, customers that are quite different will require different actions. For example, some customers may be confused and others angry. These two varieties of problems may need different responses.

The step of finding groups of similar customers is called clustering. It is a form of unsupervised statistical learning. One method of clustering that may work well in this situation is k-means clustering.
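To make the clustering step concrete, here is a bare-bones k-means sketch in plain Python. The data is synthetic and the two features (help-page visits and complaints) are hypothetical stand-ins for whatever behavior one actually records. A real project would more likely use a library implementation.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(centroids, p):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids

# Hypothetical departed customers as (help_page_visits, complaints).
rng = random.Random(1)
confused = [(rng.gauss(8, 1), rng.gauss(1, 1)) for _ in range(100)]
angry = [(rng.gauss(1, 1), rng.gauss(6, 1)) for _ in range(100)]

centroids = kmeans(confused + angry, k=2)

# Step two of the approach: place a current at-risk customer into a group.
confused_cluster = nearest(centroids, (8.0, 1.0))
angry_cluster = nearest(centroids, (1.0, 6.0))
print(centroids, confused_cluster, angry_cluster)
```

The `nearest` function doubles as the assignment step for new customers: once the centroids are fixed, any at-risk customer can be placed in the group whose centroid is closest.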

Once we have determined the different clusters of customers, we can start experimenting with interventions for each cluster. In this situation, we have two goals: determining which intervention works best, and using the best-performing intervention the most. If we simply fixed a necessary sample size in advance and randomized evenly over our options, we would end up using a lot of underperforming responses. Instead, we want a method that constantly re-evaluates which interventions appear to be working the best and uses them more frequently.

The methods designed to handle just this situation are called “multi-armed bandit” algorithms. The name comes from considering a row of slot machines (“one-armed bandits”) with their big pull-arms. We want to use the slot machine that gives the best return as much as possible. However, we have to find it first. One of my favorite such algorithms is called the “Bayesian Bandit”.
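The name “Bayesian Bandit” is commonly used for Thompson sampling with Beta posteriors, so the sketch below assumes that is the algorithm meant here. The three interventions and their true retention rates are invented for the simulation; in practice the true rates are exactly what we don't know.

```python
import random

def thompson_sampling(true_rates, rounds=3000, seed=0):
    """Simulate Thompson sampling; return how often each intervention was used."""
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [0] * n  # customers retained per intervention
    failures = [0] * n   # customers lost per intervention
    for _ in range(rounds):
        # Draw a plausible retention rate for each intervention from its
        # Beta(1 + successes, 1 + failures) posterior (uniform prior).
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(n)]
        # Use the intervention whose sampled rate is highest this round.
        arm = max(range(n), key=lambda i: samples[i])
        # Simulated customer response (unknown to the algorithm).
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

# Hypothetical interventions: email, discount, phone call.
pulls = thompson_sampling([0.10, 0.15, 0.30])
print("times each intervention was used:", pulls)
```

Because uncertain interventions occasionally produce high samples, they still get tried now and then, but the one that keeps retaining customers quickly receives most of the traffic. In the approach described here, one would run a separate bandit like this for each customer cluster.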

Solutions Aren’t Always Simple

The approach I suggest here uses three different statistical learning algorithms to encourage customer retention. The prediction part tends to get a lot of attention. However, the clustering and multi-armed bandit algorithms are essential for effectively using the predictions. Together, these three algorithms can be an impressive customer retention machine.

If you like my work, consider connecting with me on LinkedIn.