• May 13, 2009
  • By John Wilmes, chief technical architect, communications sector, Progress Software

Defeating Dirty Data

Customers don't really appreciate clean data, but dirty data will get you into trouble in a hurry, especially when it has any connection to your CRM. Your customers assume that "their" data is complete, consistent, and correct. Do you?

CRM intensifies data quality challenges because its data is more dynamic, more variable, and more voluminous than data in other domains. And CRM data is ultimately related to almost everything else. Dirty data in any application or database can cause customer-visible problems in the CRM.

Managing customer experience requires cleaning up dirty data, but you also have to prevent good data from going bad. Data quality isn't free, but most studies agree that it costs five to ten times less to keep a current customer than to gain a new one.

One way to approach data quality is to categorize possible problems. In order of increasing difficulty, they are validity, completeness, consistency, and correctness.

Invalid data (an impossible phone number, a nonexistent postal code, a birth date in the future) is easier to find and fix than other types of dirty data. It can and should be detected at the time of entry, either by applications or by the infrastructure that connects them, preferably using a configurable mechanism such as a reference database or an online service such as address validation. But other data quality problems are more likely to develop through a series of data entries or other events, and are less amenable to real-time correction.
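As a rough illustration, an entry-time validity check along these lines might look like the sketch below. Everything in it is an assumption made for the example: the record field names, the ten-digit phone rule, and the small KNOWN_POSTAL_CODES set, which stands in for a real reference database or address validation service.

```python
import re
from datetime import date

# Hypothetical reference data; a real system would query a reference
# database or an online address validation service instead.
KNOWN_POSTAL_CODES = {"01730", "10001", "94105"}

# Assumed rule for this sketch: a phone number is valid if it contains
# exactly ten digits once punctuation is stripped.
TEN_DIGITS = re.compile(r"^\d{10}$")


def validate_customer(record, today=None):
    """Return the names of the fields in one record that fail validation."""
    today = today or date.today()
    problems = []

    digits = re.sub(r"\D", "", record.get("phone", ""))
    if not TEN_DIGITS.match(digits):
        problems.append("phone")

    if record.get("postal_code") not in KNOWN_POSTAL_CODES:
        problems.append("postal_code")

    birth = record.get("birth_date")
    if birth is None or birth > today:
        problems.append("birth_date")  # missing, or in the future

    return problems
```

Keeping the rules in data (the reference set, the pattern) rather than scattered through application logic is what makes the mechanism configurable.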

Incomplete data is more difficult to find. Its detection requires a model of data relationships, and may involve multiple applications. Let's say that you are a communications service provider. Your CRM contains a customer's personal data, while your inventory system contains details of the network equipment on which the customer's services are supported. If data is later deleted from either application, or even if the link between customer and inventory is missing, you might not know it, unless you can follow that link in either direction. "Orphan" data on either end of the link will eventually show up in trouble tickets or wasted resources.
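Following the link in both directions can be sketched roughly as follows. The function and its inputs are hypothetical: customer IDs stand in for CRM records, equipment IDs for inventory records, and the link pairs for whatever cross-reference the two systems actually share.

```python
def find_orphans(customers, equipment, links):
    """Report records that cannot be reached by following customer/equipment links.

    customers: set of customer IDs present in the CRM
    equipment: set of equipment IDs present in the inventory system
    links:     iterable of (customer_id, equipment_id) pairs
    """
    links = list(links)

    # A customer counts as linked only if the equipment it points to exists,
    # and vice versa, so each direction is checked separately.
    linked_customers = {c for c, e in links if e in equipment}
    linked_equipment = {e for c, e in links if c in customers}

    return {
        "customers_without_equipment": sorted(customers - linked_customers),
        "equipment_without_customer": sorted(equipment - linked_equipment),
        "dangling_links": sorted(
            (c, e) for c, e in links if c not in customers or e not in equipment
        ),
    }
```

A link whose customer or equipment record has been deleted shows up as a dangling link, and the surviving record on the other end shows up as an orphan.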

Real-world systems are of course much more complex than this example. If you follow the data links from a single customer in CRM through all of your other systems, you'll see a complex network of interlinked records representing customer, products, services, resources, and perhaps a distributed value chain of external suppliers. In this context, incomplete data becomes even more significant.

Inconsistent data can be even harder to find, because its detection requires even more inside knowledge (substitute "rules" or "metadata" if you prefer). Even if the web of data for a customer is complete, it can still be inconsistent. In the example above, if your customer is paying for standard broadband service but their service has been set up at premium speed, a completeness check won't detect the discrepancy, because the link from customer to inventory is fine. While your customer might not consider this particular situation a problem, you would be missing revenue and wasting bandwidth.
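A consistency rule for the broadband example might be sketched like this. The plan names, the speeds in PLAN_SPEED_MBPS, and the shape of the two inputs are illustrative assumptions, not figures from any real catalog.

```python
# Hypothetical product catalog: the speed (in Mbps) each billed plan entitles.
PLAN_SPEED_MBPS = {"standard": 10, "premium": 50}


def find_speed_mismatches(subscriptions, provisioned):
    """Compare what each customer pays for with what the network delivers.

    subscriptions: customer_id -> billed plan name (from the CRM)
    provisioned:   customer_id -> configured speed in Mbps (from inventory)
    Returns (customer_id, plan, expected_mbps, actual_mbps) for each mismatch.
    """
    mismatches = []
    for customer_id, plan in sorted(subscriptions.items()):
        expected = PLAN_SPEED_MBPS.get(plan)
        actual = provisioned.get(customer_id)
        if expected is None or actual is None:
            # Unknown plan or missing inventory record: that is a validity
            # or completeness problem, so it is left to those checks.
            continue
        if actual != expected:
            mismatches.append((customer_id, plan, expected, actual))
    return mismatches
```

The rule itself (the catalog mapping) is the "inside knowledge" the check depends on; without it, both records look fine on their own.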

Incorrect data can be the most intractable, because much incorrect data will not be detected by validation, completeness, or consistency checking. Even though it is valid, complete, and consistent, it's just wrong. Not only is it harder for you to find, it can be maddening for the customer. Continuing the example, let's say we fix the inconsistent data above. The next day, the equipment fails and a network technician replaces it but does not update the inventory system. All is well, until the customer orders an upgrade, or until the new equipment fails and you're trying to determine which customers are affected. In either case, in addition to operational problems, you now have an angry customer as well.

Dirty data has long been pursued by data cleansing and data reconciliation services and their vendors. But some new approaches are emerging as well. As the number, complexity, and interconnection of applications increase, enterprises are increasingly seeing data integration as a data quality opportunity, rather than just a technical problem to be solved as cheaply as possible. The common model architecture stands out in the data integration space, supporting semantic as well as syntactic definitions, and leveraging a single reference model to minimize development and maintenance costs while supporting change management and impact analysis. Integration platforms based on a common model, which can generate flexibly deployable services in a variety of implementation technologies, can go a long way toward keeping data quality high. And as enterprises migrate to service-oriented architecture, platforms that provide SOA governance and pervasive application instrumentation are standing out as well.

Dirty data may never completely disappear, but a data quality plan with solid infrastructure support can keep it to a minimum and help you maximize the value of your CRM.

About the Author

John Wilmes is the chief technical architect, communications sector, for Progress Software, a global supplier of application infrastructure software used to develop, deploy, integrate, and manage business applications. For more information, visit http://web.progress.com.

Please note that the Viewpoints listed in CRM magazine and appearing on destinationCRM.com represent the perspective of the authors, and not necessarily those of the magazine or its editors. If you would like to submit a Viewpoint for consideration on a topic related to customer relationship management, please email viewpoints@destinationCRM.com.
