NEW YORK—Big data isn't just about cold hard numbers anymore, Rick Smolan, former National Geographic photographer and CEO of Against All Odds Productions, said at the DBTA Data Summit as he kicked off the conference on Tuesday. "I used to think big data meant big brother, or that it was just about all these numbers that only really had significance for data scientists, but now I see the humanness of it," he said.
Smolan's new book, The Human Face of Big Data, was inspired by Yahoo CEO Marissa Mayer, who described to him some of the ways big data is changing society. "[Mayer] told me about how the Institute for Environment and Sustainability is using high-resolution mapping to discover villages in Nigeria that had never been seen or charted before, and using the data from that technology to deliver polio vaccines to areas where the disease was rampant because people hadn't been vaccinated," he said. That story, Smolan explained, inspired his own research.
Smolan found that big data is "very different than what it used to be," with applications ranging from chips installed in consumers' homes to monitor precisely how electricity is used to a project called Million Dollar Blocks, which pinpoints the neighborhoods that produce the most criminals. As big data continues to grow, however, the technology used to store, process, and analyze it must evolve as well, summit speakers agreed.
Kamran Khan, CEO of Search Technologies, an IT services company, echoed the call for an evolving perspective on big data technology, explaining why enterprise business intelligence must incorporate search into its architecture. "Tomorrow's enterprise search must have a tool for big data processing, because building in that search capability ensures that you can re-ask your questions or ask new ones as they arise. With search, all of the information is indexed, and is much more accessible," Khan said.
Big data is also driving changes in predictive and descriptive analytics technology. In the past, the traditional predictive model "only looked at the middle," said Afshin Goodarzi, chief analyst at 1010data, a cloud-based platform for big data discovery and data sharing. Every Costco shopper in the United States, for example, receives the same set of coupons for the same set of products. "That's aiming for the middle," he said, "but what we need to do now is look at the individual, and target specifically."
But achieving that level of personalization is no easy feat, according to Goodarzi: it involves looking at the entire business intelligence stack, which takes time and is hard to do. Still, he argued, the right technology can overcome the challenge. "With 1010data's solution, however, data scientists can actually look at that entire stack, and everyone can be looking at the same copy of the data. We don't spend time moving the data back and forth, so it gives us the time and capability to look at the entire set. We don't have to sample, or shoot for the middle. We can look at the whole thing," he explained. In practical terms, 1010data's technology can build a 30-day shopping list for each loyal shopper at any retail chain, Goodarzi said.
The highly personalized approach drives better results, he added, and makes old assumptions, such as the belief that a customer who has bought once is likely to buy again, obsolete. "Sometimes these generalizations are true, but sometimes they're not. The point is, we don't need to rely on them anymore. It just doesn't cut it anymore because there are better ways to build models," Goodarzi said.
Other technologies are also changing the way organizations manage big data because of their affordability and agility, and the Hadoop infrastructure software emerged as a favorite among Data Summit attendees. Traditional data warehouses face a variety of limitations, including long time to market, high data retention costs, and a lack of detail, but Hadoop "doesn't have these limitations," said Paul Curtis, senior systems engineer at MapR, which offers a Hadoop distribution.
Hadoop is also cost-efficient because it takes advantage of the two main trends in the IT industry: commodity hardware, which delivers high performance and high capacity at a low price, and the "open-source phenomenon," which makes advanced software products available to everyone, Alex Gorbachev, chief technology officer of Pythian, explained during a panel on the future of data warehouses.
Cisco, for example, is using the technology to eliminate silos in its security intelligence and better protect its customers from intrusions. Cisco's challenge, Curtis explained, was that its existing infrastructure could not scale to a million events per second from nearly 100 different channels over tens of thousands of distributed channels. With Hadoop, he said, the company was able to ingest more than 20 terabytes of data.
"Hadoop won't replace data marts, but it can help augment existing solutions," Curtis said. "Without it, it'll be very difficult for companies to scale up in the way that big data demands," he concluded.