Modeling for Poets
Several years ago a prominent database marketer undertook a study to determine which features contribute to the success of a direct mail program. He hypothesized that four elements played a role: the offer, the sequence and frequency, the targeting, and the message itself. While there may well be others, the study concluded that 50 percent of the battle is targeting the right population.
Most managers would agree that modeling data is an effective key to targeting. But modeling is not a cure-all; there are limits to what it can do. There are many ways to classify modeling tools, and one of the more helpful is to distinguish between predictive and descriptive analytics. My comments here focus on predictive procedures.
Predictive approaches focus on problems that need a prediction as a solution. A marketer needs to predict who is likely to respond to a rewards program. A book club manager wishes to determine who is likely not to renew membership. A retailer needs to predict the right amount to spend on customer retention next year. These are issues that marketers continually face. By employing modeling tools, managers have an opportunity to increase revenue, decrease associated costs, or both.
Most of us old-timers remember the trusted cross tab, nothing more than a simple spreadsheet. Take a cross tab of response rate by marital status from a past mailing: married customers responded at 4.71 percent, single customers at 2.37 percent.
The results suggest that married folks respond better than single ones. This "modeling" tool provided a prediction that could be used to design future programs.
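A cross tab is simply responders divided by customers mailed, within each segment. The Python sketch below shows that arithmetic on a handful of hypothetical records (toy data, not the mailing described above); the field names `marital_status` and `responded` are assumptions for illustration.

```python
from collections import defaultdict


def cross_tab(records, key):
    """Return {segment: response rate in percent}, grouping records by `key`.

    Each record is a dict holding the segment field and a 0/1 'responded' flag.
    """
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for rec in records:
        mailed[rec[key]] += 1
        responded[rec[key]] += rec["responded"]
    return {seg: 100.0 * responded[seg] / mailed[seg] for seg in mailed}


# Hypothetical toy mailing: 8 married and 8 single customers.
records = (
    [{"marital_status": "married", "responded": r} for r in (1, 1, 0, 0, 0, 0, 0, 0)]
    + [{"marital_status": "single", "responded": r} for r in (1, 0, 0, 0, 0, 0, 0, 0)]
)

for segment, rate in cross_tab(records, "marital_status").items():
    print(f"{segment:8s} {rate:5.2f}%")
```

With one key the result is a single column of rates; crossing two or more fields means grouping on a tuple of values, which is where the number of tables starts to multiply.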
While this approach is still widely used, many managers discovered that, for many enterprises, there was simply too much data to analyze and too many cross tabs to study.
Later, some astute marketers discovered that exploring three aspects of customer behavior could produce better predictions. Recency, frequency, and monetary value (RFM) analysis extends the cross tab by looking at these three dimensions. Recency refers to how recently a customer last made a purchase. Frequency is the number of purchases the customer has made. And finally, monetary value is the actual dollars spent. By forming the various combinations of R, F, and M categories into cells, and then summarizing the results of previous mailings for each cell, a manager can predict response rates. Let's look at the following example:
Cell | Recency | Frequency | Monetary value | Response rate
43 | Within last 3 months | 3+ times | $125+ | 5.21%
6 | Within last 3 months | 2+ times | $100+ | 4.63%
18 | 18+ months ago | 1 time | Under $12 | 0.05%
Based on an analysis of a previous mailing, we notice that Cell 43 has the highest response rate in the table, 5.21 percent. Future campaigns might target customers who fall into the RFM categories that define Cell 43.
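The RFM procedure (band each customer on recency, frequency, and monetary value, then summarize past mailing results per cell) can be sketched in Python as follows. The breakpoints, band labels, field names, and customer records are all hypothetical; they are not the breaks behind the cells in the table above.

```python
from collections import defaultdict


# Hypothetical band boundaries; each program chooses its own breaks.
def recency_band(months_since_last):
    """Band by whole months since the customer's last purchase."""
    if months_since_last <= 3:
        return "0-3 months"
    if months_since_last < 18:
        return "4-17 months"
    return "18+ months"


def frequency_band(purchases):
    """Band by number of purchases (assumes at least one)."""
    if purchases >= 3:
        return "3+"
    return "2" if purchases == 2 else "1"


def monetary_band(dollars):
    """Band by total dollars spent."""
    if dollars >= 125:
        return "$125+"
    if dollars >= 12:
        return "$12 to <$125"
    return "under $12"


def rfm_cell(customer):
    """An RFM cell is the combination of the three bands."""
    return (
        recency_band(customer["months_since_last"]),
        frequency_band(customer["purchases"]),
        monetary_band(customer["dollars"]),
    )


def response_by_cell(customers):
    """Response rate (percent) from a previous mailing, summarized per RFM cell."""
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for c in customers:
        cell = rfm_cell(c)
        mailed[cell] += 1
        responded[cell] += c["responded"]
    return {cell: 100.0 * responded[cell] / mailed[cell] for cell in mailed}


# Hypothetical results from a previous mailing.
customers = [
    {"months_since_last": 2, "purchases": 4, "dollars": 150, "responded": 1},
    {"months_since_last": 1, "purchases": 3, "dollars": 200, "responded": 0},
    {"months_since_last": 20, "purchases": 1, "dollars": 8, "responded": 0},
    {"months_since_last": 24, "purchases": 1, "dollars": 10, "responded": 0},
]

rates = response_by_cell(customers)
best = max(rates, key=rates.get)  # the cell to favor in the next campaign
print(best, rates[best])
```

With real files, most of the work lies in choosing the breaks and making sure each cell holds enough mailed customers for its response rate to mean something.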
You might be wondering how many cells one needs to analyze. Depending on how many breaks are used for each dimension, the number of cells can get very large very quickly. Additionally, our first example included marital status as a helpful piece of data. How do we fit this in?
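The growth is multiplicative: the cell count is the product of the number of breaks in each dimension. A quick Python sketch, assuming (hypothetically) five breaks for each of recency, frequency, and monetary value:

```python
from math import prod


def cell_count(breaks_per_dimension):
    """Number of cells = product of the number of breaks in each dimension."""
    return prod(breaks_per_dimension.values())


# Hypothetical: five breaks per RFM dimension.
rfm = {"recency": 5, "frequency": 5, "monetary": 5}
print(cell_count(rfm))  # 125

# Adding a two-way marital status split doubles the table.
with_marital = {**rfm, "marital_status": 2}
print(cell_count(with_marital))  # 250
```

Every added field multiplies the table again, and each cell still needs enough mailed customers to produce a stable response rate.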
A major step toward addressing these RFM limitations was the introduction of TREE analysis. Using TREE software, the marketer produced a tree that splits the mailed population into segments by response rate.
Looking at the top of the TREE, we see a 3.15 percent response rate for this particular program. Proceeding down the TREE, we find that "Marital Status" is an important predictor: those who are married respond at a 4.71 percent rate, while singles respond at about half that rate, 2.37 percent. These results are identical to our earlier cross tab. While this cut by marital status provides a generous lift in response, we're not done. Continuing down the "Married" branch, we find that "Gender" also contributes: adding "Male" to the "Married" grouping yields a 5.93 percent response rate! So we started at 3.15 percent and uncovered a segment performing at a 5.93 percent clip. The marketing team can now use these segments to target, and predict response for, its next campaign.
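One common way to express these gains is a lift index: a segment's response rate divided by the overall rate, times 100, so that 100 means "average." The Python sketch below applies it to the rates reported in the tree; the `lift` helper is a hypothetical name.

```python
def lift(segment_rate, base_rate):
    """Lift index: segment response rate relative to the overall rate (100 = average)."""
    return 100.0 * segment_rate / base_rate


overall = 3.15  # response rate (percent) at the top of the tree
segments = {
    "Married": 4.71,
    "Single": 2.37,
    "Married + Male": 5.93,
}

for name, rate in segments.items():
    print(f"{name:15s} {rate:5.2f}%  lift {lift(rate, overall):6.1f}")
```

Married males come out at a lift of roughly 188, nearly 1.9 times the program's base response rate, while singles fall to about 75.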
The beauty of TREE analysis lies in several areas:
- What you see is what you get
- Easy to understand
- Easy to implement
- Easy to perform
- Can handle large quantities of data
While TREEs provide an intuitively pleasing picture, many analysts now employ regression techniques for modeling. Regression is deeply rooted in mathematics, and marketers began using it about 40 years ago.
As with most modeling tools, regression analysis requires plenty of high-quality data, and sometimes this is not available. Squeezing reasonable results out of less data is the sign of an experienced analyst.
When marketers employ regression techniques, they are trying to predict what is referred to as the dependent variable. This could be response, likelihood of defecting, or sales, to name just a few. The data items that help managers make the prediction are termed independent variables; RFM, customer behaviors, and demographics are a few examples. A final regression model usually contains the "best" set of predictors, along with the weights to be applied to each of these independent variables. Typically, anywhere between five and 15 predictors emerge from the analysis. The analyst and the regression tool select the predictors together; the tool generates the weights. Here is an illustration of a regression model:
Response score = (0.0431 * spending) +
(0.3725 * age) +
(-0.411301 * number of mailings) +
(-0.2117 * number of children) +
(0.0031 * gender).
We notice the following. There are five predictors, or independent variables, in this model: spending, age, number of mailings, number of children, and gender. (Gender is assigned a value of 1 for males and 0 otherwise.) Each predictor has a weight associated with it. This model, or mathematical calculation, is applied to each record that could potentially be targeted, and the result of the calculation is a score. Typically, the higher the score, the more likely the customer is to exhibit the behavior being modeled; in our example, that behavior is response. So a marketing team would apply the regression model and target the records with the highest scores.
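Applying the illustrative model to a file amounts to a weighted sum per record followed by a sort. The Python sketch below uses the weights from the example equation; the field names, the prospect records, and the `top_scorers` cutoff helper are hypothetical. The output is a score for ranking records, not a calibrated probability.

```python
# Weights from the illustrative model in the text.
WEIGHTS = {
    "spending": 0.0431,
    "age": 0.3725,
    "number_of_mailings": -0.411301,
    "number_of_children": -0.2117,
    "gender": 0.0031,  # coded 1 = male, 0 = otherwise
}


def score(record):
    """Weighted sum of the predictors for one record."""
    return sum(weight * record[name] for name, weight in WEIGHTS.items())


def top_scorers(records, n):
    """Return the n records with the highest scores, highest first."""
    return sorted(records, key=score, reverse=True)[:n]


# Hypothetical prospects.
prospects = [
    {"id": "A", "spending": 200, "age": 45, "number_of_mailings": 6,
     "number_of_children": 2, "gender": 1},
    {"id": "B", "spending": 50, "age": 30, "number_of_mailings": 12,
     "number_of_children": 0, "gender": 0},
    {"id": "C", "spending": 120, "age": 60, "number_of_mailings": 3,
     "number_of_children": 3, "gender": 0},
]

for p in top_scorers(prospects, 2):
    print(p["id"], round(score(p), 2))
```

In practice the cutoff `n` would come from the mailing budget: score the whole file, sort, and mail down the list until the budget runs out.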
There are many other nuances and modeling approaches that go beyond the scope of this article. Suffice it to say that these techniques have proven themselves over and over again.
I frequently get asked two questions. The first is, "What is most important in the model-building process?" The second is, "Is there any time one should not use models?" The most critical part of data mining and modeling is the data itself, and the firms that can use data creatively will have the most successful outcomes. If an enterprise plans on mailing everyone, it doesn't need to model. Otherwise (and we just experienced another postal rate hike), data mining needs to be part of the marketer's toolbox. Remember, targeting accounts for 50 percent of a campaign's success. Give modeling a try; it can turn mediocre results into superior ones.
About the Author
Sam Koslowsky is vice president of modeling solutions for Harte-Hanks Inc. He can be reached at (212) 520-3259 or via email at email@example.com. Please visit www.harte-hanks.com