Rethinking Data Quality
GET THE BALL ROLLING
In life, keeping things tidy is not a task people often enjoy if it takes them away from other goals that are more exciting. With data, it's not much different. As Hayler points out, "Data quality is not the sexiest subject, and part of [the problem] is that people don't grasp to what degree it can be costing them money."
A common reaction within organizations, Hayler says, has been to shift the responsibility to one department, typically the information technology (IT) department. But while IT might seem like the logical outlet, the department is often unaware of which data items are the most critical or are causing the most pain. "If you disregard data quality as a technological problem or the IT department's problem, then it's probably not going to get fixed," Hayler says.
Hayler stresses how important it is that the various parts of the organization—those who are aware of their business processes and problems—are keen on, and involved with, the concept of cleaning data.
(Many companies have already begun taking data maintenance seriously, Schutz says. In recent years, some companies have even elected to create a position called the chief data officer.)
"It's either you sell people on the benefits or beat them over the head with a big stick," Hayler says, and "it's usually more productive to explain the benefits."
This year in a Webcast produced by CRM magazine, Simon McVeigh, director of cloud product specialists at Informatica, had a similar recommendation for firms that are considering an investment in data cleansing technologies. "It's worth prioritizing, not to bite off more than you can chew," McVeigh told listeners. "The quicker you can show wins and early progress, the more likely you are to gain support and continue cleansing the rest of your data."
APPROACH CLEANING PRACTICALLY
Fortunately, there are some methods companies can implement to reduce the amount of cleaning they have to do over time. Wu recommends keeping close tabs on input sources and determining how data is being collected and where. Wu holds that the best way to ensure data is clean is to make sure it passes through a clean filter.
This includes having consistent ways of collecting data and anticipating formatting problems. For example, a Web site's fields should be defined clearly enough that customers understand what is being requested of them. A customer will be annoyed and frustrated if, for instance, the address line on a billing page rejects a submission without explaining why, and the result is an unhappy experience for both the customer and the company.
Companies can avoid errors by foreseeing the kinds of complications that might come up with data. For instance, someone who's new to the country and asked to enter her birthday in a field that asks for month, day, and year, in that order, might mistakenly put the day where the month should be. The same goes for other values whose format can vary, such as price.
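As a rough illustration of anticipating this kind of mix-up, the Python sketch below checks a month/day/year entry and, when the entry fails, tests whether it would have been valid with the day and month swapped so that the customer gets a specific explanation. The function name check_birthday and the message wording are hypothetical, chosen only for this example.

```python
from __future__ import annotations

from datetime import date


def check_birthday(month: int, day: int, year: int) -> tuple[date | None, str]:
    """Validate a month/day/year entry and return (parsed_date, message).

    Hypothetical helper for illustration; not taken from any vendor's product.
    """
    try:
        return date(year, month, day), "ok"
    except ValueError:
        pass

    # The entry is not a valid month/day/year date.
    # See whether it would be valid if the day had been typed first.
    try:
        date(year, day, month)
    except ValueError:
        return None, "That date does not exist. Please check the month, day, and year."

    return None, (
        "The month must be between 1 and 12. "
        "It looks like the day was entered first; please enter the month, then the day."
    )
```

An entry such as 3/25/1990 is accepted, while 25/3/1990 is rejected with a message that tells the customer what went wrong. An entry such as 4/5/1990 is valid either way, so a check like this cannot catch every swap; labeling each field clearly remains the first line of defense.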
Keeping note of such obstacles will also help reduce the number of duplicate records that are entered into a company's system, since data that is formatted consistently is easier to sync and match.
MATCHING AND DUPLICATES
One way to assess whether duplicate data exists is to create a map that shows users visually where certain pieces of data are being collected, and what is being collected at those touch points. If, for example, contact center agents are asking customers for their home address, and the same information is being requested on the Web site, there's a good chance duplicate information has been entered into the company's CRM system.
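The map does not have to start as a diagram. As a minimal sketch, assuming each touch point can be listed alongside the fields it collects, the Python below reports every field that is requested in more than one place. The touch point and field names are invented for the example.

```python
from __future__ import annotations

from collections import defaultdict


def overlapping_fields(touch_points: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return each field collected at more than one touch point, with its sources."""
    sources = defaultdict(list)
    for point, fields in touch_points.items():
        for field in fields:
            sources[field].append(point)
    # Keep only the fields that are requested in two or more places.
    return {
        field: sorted(points)
        for field, points in sources.items()
        if len(points) > 1
    }


# Hypothetical inventory of what each channel asks customers for.
touch_points = {
    "contact_center": {"name", "home_address"},
    "web_site": {"email", "home_address"},
}

print(overlapping_fields(touch_points))
```

Here home_address shows up under both the contact center and the Web site, which marks it as a field worth checking for duplicate entries.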
There are tools designed to help companies keep track of the information they're storing. Experian's Pandora platform, for instance, provides audit logs that can be used to monitor data and send alerts when incorrect information is found in a system. Users can track the history of data to help build context, and a graphical visualization tool lets users navigate and better understand the information in their systems. In these ways, a company's unstructured data can be organized, standardized, and transformed.
Wu recommends never deleting data and allowing room for error in the cleansing process. Though it might be unlikely, items that are detected as duplicates might turn out to be distinct records that are only coincidentally similar.
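One way to follow that advice is to flag suspected duplicates for review rather than remove them. The Python sketch below is an illustration only: it compares records with the standard library's difflib.SequenceMatcher, and the field names (name, address) and the 0.85 threshold are assumptions made for the example, not recommendations from Wu or any vendor.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase the text and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def flag_possible_duplicates(records, threshold=0.85):
    """Return (index_a, index_b, score) for record pairs that look alike.

    Nothing is deleted or merged; the caller decides what to do with each pair.
    """
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        key_a = normalize(f"{a['name']} {a['address']}")
        key_b = normalize(f"{b['name']} {b['address']}")
        score = SequenceMatcher(None, key_a, key_b).ratio()
        if score >= threshold:
            flagged.append((i, j, round(score, 2)))
    return flagged


# Hypothetical customer records.
records = [
    {"name": "Jane Smith", "address": "12 Oak Street"},
    {"name": "jane  smith", "address": "12 Oak Street"},
    {"name": "Robert Jones", "address": "400 Pine Avenue"},
]

for i, j, score in flag_possible_duplicates(records):
    print(f"Records {i} and {j} may be duplicates (similarity {score})")
```

Because the original list is left untouched, a pair that turns out to be two different customers costs nothing more than a second look.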
EVALUATING THE TECHNOLOGY
Just as there is an abundance of data to work with, there are a great many vendors that specialize in data quality maintenance. Hayler estimates there are more than 50 vendors in the space, and the number is constantly growing; analyst firm TechNavio recently predicted that the market will grow 17.1 percent each year through 2019.
Assessing all of the vendors to determine which is the most cost-effective and best suited to your company can be a daunting prospect, so companies need to know how to evaluate the various solutions. On its Web site, the Information Difference provides a resource that helps summarize the strengths and weaknesses of the key vendors in the market.
"There are some technologies that may be very limited, but very good in the area of name and address, and also very cheap," Hayler says. "But you may not need anything beyond that. On the other hand, you may, so this is where it all comes back to building a business case—doing that and getting a sense of what you're going to do with the tools."