True to its iconic logo, Hadoop is still very much the elephant in the room. Nearly every data scientist has heard of it, yet relatively few can say they have a firm grasp of what the technology can do for their business, and even fewer have actually implemented it successfully at their organization. But in 2015, Hadoop will be hard to ignore.
Forrester Research predicts that this year, Hadoop will become a cornerstone of the business technology agenda at most organizations. It is already a major "disrupter of data economics and analytics," according to a report from analyst Mike Gualtieri. So what makes Hadoop uniquely suited to change the big data game? Scalability, affordability, and flexibility.
An open-source software framework, Hadoop allows for the processing of big data sets across clusters on commodity hardware either on-premises or in the cloud. At roughly one-thirtieth the cost of traditional data storage and processing, Hadoop makes it realistic and cost-effective to analyze all data instead of just a data sample. It's a malleable solution, and its open-source architecture enables data scientists and developers to build on top of it to form customized connectors or integrations.
The Case for Hadoop
Most companies make the leap to Hadoop in one of two ways, says Ashley Stirrup, chief marketing officer at Talend, an open-source data integration provider. The first is data warehouse optimization. Typically, data analysis requires some level of data preparation, such as cleansing data and eliminating errors, outside of traditional data warehouses. Once the data is prepared, it is transferred to a high-performance analytics tool, such as a Teradata data warehouse. With data stored in Hadoop, however, users can see "instant ROI" by moving the data workloads off of Teradata and running analytics right where the data resides. "You don't need to move it back and forth," Stirrup explains.
Other Hadoop beginners use it for live archiving. Instead of backing up data and storing it in a data recovery system, such as Iron Mountain, users can store everything in Hadoop and easily pull it up whenever necessary. Both of these functions, however, are "really just the tip of the iceberg," Wayne Applebaum, vice president of analytics and data science at Avalon Consulting, says.
Hadoop early adopters quickly realized that the framework's greatest power lies in its ability to house and process data that couldn't be analyzed in the past due to its volume and unstructured form. In the CRM space alone, there's immense potential, according to Applebaum.
"So much of the individual data that's stored within CRM systems is notes. And these are notes that don't fit into any traditional database, but are so relevant to the customer history. For example, what you can do with Hadoop is parse call center notes for certain keywords, associate them with products that you know people are calling about, and detect consistent product problems before a widespread situation occurs," he says. Additionally, Hadoop can parse emails and other unstructured feedback to reveal similar insight.
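The call center scenario Applebaum describes can be sketched in a few lines. The example below is a minimal illustration in plain Python, not Hadoop code: it imitates the map and reduce steps that a Hadoop job would spread across a cluster. The keyword list, product list, sample notes, and threshold are all hypothetical and chosen only to show the idea.

```python
import re
from collections import Counter
from itertools import chain

# Hypothetical keyword and product lists, for illustration only.
PROBLEM_KEYWORDS = {"broken", "overheating", "refund", "defective"}
PRODUCTS = {"router", "modem", "headset"}


def map_note(note):
    """Map step: emit ((product, keyword), 1) for each pair found in one note."""
    words = set(re.findall(r"[a-z]+", note.lower()))
    for product in PRODUCTS & words:
        for keyword in PROBLEM_KEYWORDS & words:
            yield (product, keyword), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (product, keyword) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def flag_problems(notes, threshold=2):
    """Return the product/keyword pairs mentioned in at least `threshold` notes."""
    counts = reduce_counts(chain.from_iterable(map_note(n) for n in notes))
    return {key: n for key, n in counts.items() if n >= threshold}


# Made-up sample notes standing in for free-text CRM records.
notes = [
    "Customer says the router is overheating after an hour.",
    "Router overheating again, customer wants a refund.",
    "Headset arrived broken.",
]
flagged = flag_problems(notes)
print(flagged)
```

With these sample notes, only the router and overheating pairing reaches the threshold, which is the kind of early signal of a recurring product problem that the quote refers to. On a real cluster, the same two steps would run in parallel over far more notes than a single machine could hold.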
From a marketing standpoint, there's plenty to gain as well. Hadoop enables marketers to get as close to performing social media sentiment analysis as modern technology will allow, providing them with a path to understanding the positively and negatively charged conversations that take place on Twitter, Facebook, and elsewhere. And, as the Internet of Things drives more wearables and other smart devices into the market, sensor data collected from those items will need to be stored and processed somewhere before marketers can incorporate it into personalization efforts or other campaigns—Hadoop is equipped to handle that type of data as well.
The sheer volume of data that businesses can store on Hadoop changes the level of analytics and insight that users can expect. Because it allows users to analyze all data and not just a segment or sample, the results can better anticipate customer engagement. "Ultimately, Hadoop is surpassing model analytics that can describe certain patterns [and is now] delivering full data set analytics that can predict