Beyond the Three V's of Big Data
A VARIETY OF FORMATS
According to Duncan, data sets arrive in a variety of formats, and the number of data types continues to grow. He cites the advent of technologies such as radio-frequency identification (the use of electromagnetic fields to gather information from tags attached to objects), smart metering (devices that monitor information on energy consumption for billing purposes), and the ubiquity of mobile devices with geolocation capabilities as examples of diverse sources of consumer information. All of these technologies have their own methods of capturing and publishing data, which adds to the complexity of the information environment.
But overcoming these data complexities could be well worth it. According to York, having a large variety of data is crucial for creating a holistic customer view. She specifically notes that access to data such as a customer’s purchasing history, personal preferences based on social media postings, exercise habits, caloric intake, and time spent in the car can help companies understand that customer on a deeper level, and thus build experiences that are tailored to that customer.
But this diversity of data sources, Noel posits, can be “a blessing and a curse”—a blessing because marketers have an increasingly large range of channels from which to pull customer information, but a curse because it can be difficult to filter through that information to find the most valuable content. Goodarzi has a similar point of view, saying that “variety is a little overstated in what people talk about for Big Data.” He mentions audio and video as examples of channels that can be particularly difficult to analyze: “It is rare that somebody analyzes video or audio directly—usually what they do is they take that data, they try to come up with an intermediate representation of that data, and then use that intermediate representation to apply old or new algorithms to try to extract signals, whatever the definition of signal is for that business problem they’re trying to solve,” he says.
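The pipeline Goodarzi describes, raw media reduced to an intermediate representation before any analysis, can be illustrated with a minimal Python sketch. The windowed RMS-energy feature and the threshold used here are illustrative assumptions, not anything from the article; real systems would use far richer representations.

```python
import math

def rms_energy(samples, window=4):
    """Collapse raw audio samples into per-window RMS energy values,
    a simple stand-in for an 'intermediate representation'."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    ]

def extract_signal(samples, threshold=0.5):
    """Flag windows whose energy exceeds a threshold; here 'signal'
    simply means audible activity in that window."""
    return [energy > threshold for energy in rms_energy(samples)]

# A quiet stretch followed by a loud one: one False window, one True.
audio = [0.01, -0.02, 0.01, 0.0, 0.9, -0.8, 0.85, -0.9]
print(extract_signal(audio))
```

The point of the intermediate step is that the thresholding logic never touches the raw waveform; any algorithm, old or new, can be applied to the compact feature sequence instead.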
Volume, velocity, and variety are undoubtedly important to managing customer information. Nevertheless, experts have also identified other aspects that are crucial for companies to keep in mind if they want to make the most of their data. According to Duncan, data tools such as Apache Hadoop and Apache Spark have enabled new methods of data processing that were previously out of reach for most organizations. Duncan adds that while the growing volume of data, the time needed to process it, and the sheer number of input sources pose challenges for businesses, all three can largely be addressed through purely technological methods at this point.
NEW V’S EMERGE
Investment in Big Data has begun to stabilize and enter a maturity phase over the past year, although Duncan does not expect Big Data to become the new normal for the broadest parts of the IT market until 2020. It will take time for infrastructure and architectures to mature, and for best practices to be developed and refined against those architectures. Nevertheless, because these changes are already under way, he says, businesses must begin considering how to use Big Data to bring about specific outcomes; in other words, organizations should examine the challenges of Big Data from a business perspective rather than a technical one. A framework that incorporates the business-oriented characteristics of veracity and value can help enterprises harness Big Data to achieve concrete goals.
It might go without saying that not all data is the same, but businesses may not be paying enough attention to changes within individual data sets. According to Duncan, contextualizing the structure of the data stream is essential—this includes determining whether it is regular and dependable or subject to change from record to record, or even with each individual transaction. He says that businesses need to determine how the nature and context of data content in all its forms—text, audio, or video—can be interpreted in a way that makes it useful for analytics models.
This is where the veracity of data—or, as Dale Renner, CEO and founder of RedPoint Global, puts it, “the trustworthiness of data”—comes in. Determining trustworthiness is particularly important when it comes to third-party data, which Renner refers to as “the dirtiest data” a business will work with. Renner contrasts third-party data with first-party data, which he says is “the cleanest data in any enterprise” because it passes through a set of edits and validation rules.
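The "edits and validation rules" Renner credits for clean first-party data can be sketched in a few lines of Python. The specific rules below (email format, plausible age) are hypothetical examples of such checks, not rules from the article.

```python
import re

# Hypothetical validation rules of the kind first-party data
# might pass through before entering an enterprise system.
RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "") is not None,
    "age": lambda v: isinstance(v, int) and 0 < v < 130,
}

def validate(record):
    """Return the list of fields that fail their validation rule."""
    return [field for field, rule in RULES.items() if not rule(record.get(field))]

clean = {"email": "pat@example.com", "age": 42}
dirty = {"email": "not-an-email", "age": 212}
print(validate(clean))  # []
print(validate(dirty))  # ['email', 'age']
```

Third-party data typically arrives without having passed through any such gate, which is why Renner calls it the dirtiest data a business will work with.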
Duncan adds that veracity entails verifying that data is suitable for its intended purpose, and usable within a given analytic model. He suggests that businesses use several measurements to determine the trustworthiness and usefulness of a given data set, and that establishing the degree of confidence of data is crucial so that analytic outputs based on that data can be a stimulus for business change.
Gartner recommends a number of metrics for evaluating and cleaning up data records:

- Completeness: the percentage of instances of recorded data versus all available data within a business ecosystem or market, or the percentage of missing fields within a data record.
- Uniqueness: the percentage of alternate or duplicate data records.
- Accessibility: the number of business processes and personnel that could benefit from access to specific data, or that can actually access that data.
- Relevancy: the number of business processes that utilize, or could benefit from, specific data.
- Scarcity: the probability that other organizations, including competitors and partners, have access to the same data; the scarcer the data, the more impactful it is.
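Two of these metrics, completeness and uniqueness, lend themselves to a direct calculation. A minimal Python sketch, with an illustrative customer data set of my own invention:

```python
def completeness(records, fields):
    """Share of expected fields that are populated across all records."""
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / (len(records) * len(fields))

def uniqueness(records, key):
    """Share of records that are distinct on the given key;
    anything below 1.0 indicates duplicates."""
    return len({r[key] for r in records}) / len(records)

customers = [
    {"id": 1, "email": "a@x.com", "phone": "555-0100"},
    {"id": 2, "email": "b@x.com", "phone": ""},        # missing phone
    {"id": 3, "email": "a@x.com", "phone": "555-0102"},  # duplicate email
]
print(completeness(customers, ["email", "phone"]))  # 5 of 6 fields filled
print(uniqueness(customers, "email"))               # 2 distinct of 3 records
```

Accessibility, relevancy, and scarcity are harder to compute mechanically, since they depend on surveying business processes and the wider market rather than inspecting the records themselves.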