• December 5, 2014
  • By Leonard Klie, Editor, CRM magazine and SmartCustomerService.com

DataSift Launches VEDO Focus to Filter Social Data

DataSift Thursday released VEDO Focus, a new text classification engine that categorizes the entire real-time firehose of social media data into nearly half a million unique topics.

Focus uses a corpus of more than 1 billion facts to categorize every single tweet, post, and blog entry into an ever-growing hierarchical taxonomy of more than 450,000 topics.

While that sounds like a lot, Jason Rose, senior vice president of marketing at DataSift, says it really isn't that much when you consider the number of industries, companies within those industries, brand names from those companies, and components within those product lines.

As an example, he cites the auto industry, in which dozens of manufacturers each have dozens of car models. Conversations about any one of those models could in turn cover components, such as brakes or airbags.

"As a tweet or post is generated, we apply our filters to it, and we are able to classify it into topic and subtopic levels," Rose says. "And we can turn it around in milliseconds."
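To make the topic and subtopic idea concrete, here is a minimal sketch in Python. It is not DataSift's engine or API: the taxonomy, the manufacturer names (acme, globex), and the classify function are all hypothetical, and simple keyword matching stands in for whatever language analysis VEDO Focus actually performs.

```python
from typing import Dict, List, Tuple

# Hypothetical three-level taxonomy: industry -> manufacturer -> components.
# The names are illustrative only; the real taxonomy has 450,000+ topics.
TAXONOMY: Dict[str, Dict[str, List[str]]] = {
    "automotive": {
        "acme": ["brakes", "airbags"],
        "globex": ["brakes", "airbags"],
    },
}


def classify(text: str) -> List[Tuple[str, ...]]:
    """Return every taxonomy path whose terms appear in the text."""
    # Crude tokenization: lowercase, drop commas and periods, split on spaces.
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())

    paths: List[Tuple[str, ...]] = []
    for industry, makers in TAXONOMY.items():
        for maker, components in makers.items():
            if maker not in words:
                continue
            matched = [c for c in components if c in words]
            if matched:
                # Deepest level: industry > manufacturer > component.
                paths.extend((industry, maker, c) for c in matched)
            else:
                # Only the manufacturer was mentioned.
                paths.append((industry, maker))
    return paths


print(classify("My Acme brakes feel soft"))
```

A post that names a manufacturer and a component lands at the deepest level of the hierarchy, while a post that names only the manufacturer stops one level up.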

VEDO Focus reads and understands natural language, unlocking the meaning within the text and categorizing it to enable developers to build applications that can interpret, understand, and analyze social data at scale.

DataSift collects and stores the social data, and companies query that large database to get the specific information they want. DataSift can deliver the information in real time as an active data stream or aggregate it and send it in batches, according to Rose. The customer just needs a data repository to receive and store the data and the resources to mine and analyze it.

"We normalize the data into our own data module and then people query into that data," he explains. "We process the data and deliver it as a data stream that matches the individual filters."
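The two delivery modes Rose describes, a live stream and aggregated batches, can be sketched roughly as follows. This is an illustration under stated assumptions, not DataSift's interface: the post dictionaries, the matches predicate, and the stream and batches functions are all hypothetical names.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Post = Dict[str, str]  # hypothetical normalized post record


def stream(posts: Iterable[Post], matches: Callable[[Post], bool]) -> Iterator[Post]:
    """Yield each matching post as soon as it arrives (real-time delivery)."""
    for post in posts:
        if matches(post):
            yield post


def batches(
    posts: Iterable[Post], matches: Callable[[Post], bool], size: int
) -> Iterator[List[Post]]:
    """Group matching posts into lists of up to `size` (batch delivery)."""
    batch: List[Post] = []
    for post in stream(posts, matches):
        batch.append(post)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Deliver whatever is left once the input ends.
        yield batch


incoming = [
    {"topic": "brakes"},
    {"topic": "weather"},
    {"topic": "brakes"},
    {"topic": "brakes"},
]


def wants_brakes(post: Post) -> bool:
    # A customer-defined filter, reduced here to a single topic check.
    return post["topic"] == "brakes"


for group in batches(incoming, wants_brakes, size=2):
    print(len(group), "posts delivered")
```

Both modes apply the same filter; they differ only in when the matching posts are handed over.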

DataSift processes more than 2 billion posts per day and has more than four petabytes of historical data in storage. It collects data from Twitter, Facebook, Google+, Tumblr, YouTube, Instagram, Wikipedia, Jive, bit.ly, WordPress, LexisNexis, and hundreds of other sites around the world, according to Rose.

"Social data is just so voluminous today," he says. "We take it and apply very simple filters that you create.

"Our ability to categorize all that information at a nuclear level is unique," Rose continues. "[Focus] really lets you filter through all the noise to get to the topics you care about like never before."

"As social data becomes integrated into an ever-growing ecosystem of application developers and agencies, the imperative for these companies is to focus on creating groundbreaking insights, not infrastructure," said Nick Halstead, CEO and founder of DataSift, in a statement. "With the launch of FOCUS, we're demonstrating our commitment to doing the heavy lifting associated with preparing big, social data for analysis, and enabling developers to get to insights faster."

The company earlier this year partnered with Alteryx to expand its analytics capabilities. Other partnerships for data analytics include Tableau, Splunk, and Informatica.



Related Articles

Xerox Is Working to Simplify Social Data Mining

Xerox researchers are teaching computers to identify and route social media sentiment data to the humans who can best respond.

LeadSift Releases Self Serve Platform for SMBs

The LeadSift Self Serve platform analyzes millions of Twitter conversations in real time to find opportunities.

DataSift Releases VEDO Intent for Social Media Data Extraction

VEDO Intent uses machine learning to access and derive insight from social media.

Falcon Social Partners with DataSift

Partnership gives customers access to aggregated and anonymized Facebook topic data.

NetBase Partners with DataSift

The partnership brings brands access to Facebook topic data.