Natural Language Understanding Grows Up
Thanks to Apple's voice assistant, Siri, natural language understanding has become the buzzword du jour not only in the enterprise, but in the consumer market as well.
Interest in natural language understanding (NLU) exploded even before Siri arrived, when IBM's Watson supercomputer appeared on Jeopardy! last year, competing and ultimately holding its own against human contestants until almost the very end. According to IBM, Watson applied advanced natural language processing, information retrieval, knowledge representation and reasoning, and machine learning technologies to the field of open-domain question answering.
"The trends seem to be quite clear," says Ilya Bukshteyn, senior director of marketing, sales and solutions, at Microsoft Tellme. "When you look at the kinds of technologies that consumers are snapping up and buying in record numbers, whether it's Kinect or Apple products, it's very clear that natural interaction (language) done right is very, very compelling. You don't want to be the last company not offering a natural experience in your category."
Dan Miller, senior analyst and founder, Opus Research, says that NLU should allow people to speak naturally and have a reasonable expectation that a machine on the other end is going to understand their intent.
"Accurate recognition is key to cloud-based resources that understand intent," he says. "What's happened in the last year, as [demonstrated by] Watson, [is that] computer systems that can claim to understand what people are saying and accurately render words have become more reliable. It's getting better and better. It's never going to be perfect, but now in enough cases, such as with Siri, there's a feeling that you can put language on a front end of highly popular devices, and that's never happened before."
While the gee-whiz factor is hard to overlook in the consumer market, opinion on how well NLU works in the business market is divided. Many experts say that the technology is too expensive and has a long way to go, while others point to a spate of products mature enough to operate as money makers. Even with the deployment of commercial offerings, NLU continues to evolve.
IBM partnered with Nuance Communications to combine IBM's Deep Question Answering, Natural Language Processing, and Machine Learning capabilities with Nuance's speech recognition and Clinical Language Understanding solutions. Under the agreement, the companies are jointly investing in a five-year research project designed to advance next-generation natural speech technologies, which will be commercialized by Nuance. (Nuance is the same company that has been the rumored partner of Apple in developing Siri, although neither company will confirm this.)
What Is NLU?
NLU is the ability of users to interact with any system or device in a conversational manner without being constrained to a fixed set of responses.
"What NLU does is understand a string of words or utterances," explains Daniel Hong, lead analyst at Ovum. "NLU takes into consideration statistical language and semantic language and combines the two. The engine that powers NLU has to be able to understand a sequence of words and process it to determine what the intent is behind the caller."
What NLU is not is speech recognition.
"Sometimes people don't realize that there's a big difference between speech recognition and NLU and they confuse them," says Roberto Pieraccini, former CTO of SpeechCycle and now the director of The International Computer Science Institute.
As an example, he explains the difference between Siri, which uses natural language, and Google Voice Search, which uses speech recognition.
"In the case of words that are spoken into a text box (like Google Voice Search), it does not mean that the machine understood what you said, but can translate sounds into words. But if you go to Siri, the meaning of the words can be understood, because there is a second part of the application, which is natural language understanding."
Dena Skrbina, senior director of solutions marketing of the Enterprise division at Nuance, says that NLU is more than just collecting information; it's about determining intent.
Natural language also needs to flow smoothly. "A system that constantly asks for confirmations creates a disjointed conversation that callers tend to reject," she says. "However, systems that can handle corrections and verifications by dynamically embedding the confirmations in the next prompt are more engaging, leading to better automation rates."
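The difference Skrbina describes can be sketched in a few lines of Python. This is only an illustration, not Nuance's implementation: the function names, the prompt wording, and the rule that a reply starting with "No," is a correction are all assumptions made for the example.

```python
def explicit_prompts(value, next_question):
    """Confirm in a separate turn, then ask the next question (two turns)."""
    return [f"You said {value}. Is that correct?", next_question]


def embedded_prompt(value, next_question):
    """Fold the confirmation into the next question (one turn)."""
    return f"Okay, {value}. {next_question}"


def handle_reply(slots, pending_slot, reply):
    """Treat a reply such as "No, Austin" as a correction of the pending slot.

    Returns True if a correction was applied, False if the reply should be
    interpreted as the answer to the question that was just asked.
    The "No," convention is a simplifying assumption for this sketch.
    """
    text = reply.strip()
    if text.lower().startswith("no,"):
        slots[pending_slot] = text[3:].strip()
        return True
    return False


# Hypothetical travel dialogue: the system heard "Boston" as the destination.
slots = {"destination": "Boston"}
prompt = embedded_prompt(slots["destination"], "What day are you traveling?")
corrected = handle_reply(slots, "destination", "No, Austin")
```

With the embedded form, a caller who was heard correctly simply answers the next question, while a caller who was misheard can correct the system in the same turn.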
AT&T has been focused on NLU for support in the IVR for the past 20 years and has been actively deploying natural language applications in the IVR for 13 years. The business unit that provides natural language solutions is AT&T VoiceTone, using the AT&T WATSON speech and natural language engine. The solution tries to understand the intent of a caller through machine learning algorithms.
"We take large data and train automated systems to learn from the data," says assistant vice president Mazin Gilbert, of AT&&T Intelligent Systems Research. "Our algorithms in the AT&T WATSON engine allow us to learn from the variability in data; people speak about the same issue, the same intent, in thousands of different ways. Also, these algorithms have to be robust to accents, dialects, background noise, and devices that are used."
The second thing natural language does is extract information, Gilbert says.
"We take information the caller is providing and extract it….If it's a billing problem, we want to know your name, address, etc."
Once the intent and information are extracted, based on AT&T's dialogue technology and how the system is designed, a company can send callers to a specialized agent or complete the automation.
"Some systems may not fully automate because it could be part of the design, or it could be that the complexity of the request is hard and it's best to send them to a specialized agent," Gilbert says. "Routing the request to a specialized agent is an important action of the system, as it helps that agent address that issue, and not go to just any agent."
What NLU Can Do For You
Natural language understanding is meant to attack a basic problem of call centers, extended call times, by handling calls automatically, and to reap significant savings in the process.
In one case, customer experience management solutions provider SpeechCycle worked with a national broadband service provider that used a legacy touch-tone IVR application to handle 40 to 60 million calls per year. The complex menu system often took customers up to a minute to navigate, and a misroute rate of 25 percent was estimated to cost the company millions of dollars in agent retransfers.
After using SpeechCycle's NLU Phone Portal solution, the company was able to handle four million calls per month, with the system able to understand more than 280 call reasons. Average routing time was reduced to 35 seconds, a 50 percent improvement, and automation rates in downstream IVR applications climbed by 22 percent, to 35 percent.
"The SpeechCycle natural language processing capability is high definition, meaning that it enables a high degree of accuracy in understanding the specific reason for a call," says Scott Kolman, senior vice president of marketing at SpeechCycle. "Our systems can understand over 280 distinct reasons people call a service provider for customer service. By quickly determining the reason for the call, our applications are able to act on this understanding by directing them to the appropriate treatment, be it information such as an FAQ or information about their bill, a step-by-step troubleshooting application, or to an agent specifically trained to address their issue. The [results are] the ability to automate calls previously requiring agent involvement and greater customer satisfaction as the caller's issue is resolved quickly and efficiently."
Nuance offers several solutions for the IVR using NLU. The core technology for understanding natural responses to open questions (such as "How may I help you today?") is called SpeakFreely. Its technology involves taking a collection of responses to the open question, analyzing each to attribute a meaning, and then defining an appropriate application response. An IVR can respond to unique requests that have not previously been encountered by using SpeakFreely for NLU.
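The workflow described here (collect responses to the open question, attribute a meaning to each, then define an application response) resembles a standard text classification setup. The sketch below is not SpeakFreely: it uses a small naive Bayes classifier as a stand-in technique, and the sample utterances, meaning labels, and response names are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class MeaningClassifier:
    """Naive Bayes over words with add-one smoothing (a stand-in technique)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # meaning -> word -> count
        self.label_counts = Counter()            # meaning -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1 and 2: collected responses, each annotated with a meaning (invented).
COLLECTED = [
    ("i want to pay my bill", "billing"),
    ("there is a wrong charge on my bill", "billing"),
    ("my internet is not working", "tech_support"),
    ("the modem keeps dropping the connection", "tech_support"),
    ("i want to cancel my service", "cancel"),
    ("please close my account", "cancel"),
]

# Step 3: an application response defined for each meaning (hypothetical names).
RESPONSES = {
    "billing": "play_balance_and_offer_payment",
    "tech_support": "start_troubleshooting",
    "cancel": "transfer_to_retention_agent",
}

classifier = MeaningClassifier()
classifier.train(COLLECTED)

# A phrasing that does not appear in the collected data.
meaning = classifier.classify("why is my bill so high")
action = RESPONSES[meaning]
```

Because the classifier scores words rather than matching whole sentences, a request phrased in a way it has never seen can still be mapped to one of the known meanings, which is the behavior the paragraph above attributes to NLU in the IVR.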
SpeakFreely is used in the Nuance Call Steering Portal (NCSP), a Web-based portal used to create, deploy, and optimize NLU call steering solutions. The company describes NCSP as enabling someone without a Ph.D. in speech science to bring NLU to the masses faster without breaking the bank. Nuance has packaged the technology from more than 125 NLU solutions in 17 languages into a guided graphical interface so that its customers can deploy their own NLU solutions. This means that improved caller satisfaction, increased automation, and reduced agent misroute rates are now available with reasonable ROI to enterprises outside the Fortune 500, the company says.
One of Nuance's long-time customers, Amtrak, uses an IVR deployment containing SpeakFreely and Call Steering to power "Julie," the company's automated customer service representative. Julie handles about 20 million callers a year, or roughly 50,000 calls a day, though during peak travel times, she may handle as many as 95,000 callers a day. She is able to recognize 45,000 cities, up from 1,000 when she was first launched. She completely handles an average of 25 percent of all calls placed to the 800 number, or approximately five million calls a year. Of those who attempt to use the self-service system (that is, those who don't hang up the phone or ask immediately to be transferred to an agent), Julie fully handles, on average, 54 percent of calls. According to Nuance, Julie completes more calls in a day than one human Amtrak agent handles in a year.
AT&T's Gilbert says that the company believes there is a basic premise that customer service can be improved by not forcing customers to follow menus. "We wanted to reverse that and let the technology understand what the customer is saying," he says.
The other obvious premise is to reduce the length of the customer call.
"As opposed to following numerous menus in the IVR and getting lost, natural language–enabled customers can articulate their problem and [allow] the system to understand it much more quickly," he says.
AT&T helped electronics giant Panasonic overcome those problems in 2005, as noted in an article in the New York Times. According to the paper, callers took an average of two-and-a-half minutes to go through the IVR menu, and 40 percent simply hung up. After implementing AT&T's NLU solutions, by 2010, Panasonic was able to resolve a million more customer problems a year, with 1.6 million fewer calls than in 2005. The cost of resolving customer issues fell by 50 percent.
Gilbert likens NLU to babies.
"Babies don't know what's right and wrong," he says. "They don't know what instructions are. We teach them that. When they do something right, we pat them on the back; when they do something wrong, we correct them. It's the same process here. We take data and train these systems and monitor them and correct them. We call it reinforcement learning."
Barriers to Adoption
While the advances made in NLU are good news, there's bad news too. There is a significant barrier to widespread adoption of NLU, and that barrier is cost. Licensing the technology, implementing it, and maintaining it are prohibitively expensive for many companies.
"The reason NLU is expensive to implement today is because you either have to pay one of very few companies to use their technology, or you have to invest in a lot of research science," Microsoft's Bukshteyn says. "It's an expensive proposition from an investment perspective."
Another factor preventing widespread adoption is that NLU tends to work better in verticals where open-ended questions have natural constraints, such as utilities and travel.
"The more categories you have, the more different kind of users you have, the harder it is to categorize what they're saying," says Deborah Dahl, principal of Conversational Technologies, and chair of the World Wide Web Consortium Multimodal Interaction Working Group. "If you have something like an airline, where most of the callers are used to the system and have a clear idea of what they want, the system's going to work better because what the caller will say is more precise."
Some industry experts also believe that companies don't need a full-blown NLU engine, but can incorporate some of the technology into solutions they already have in place.
Natural language is not the be-all and end-all in customer service and, in many cases, is overapplied, believes Andy Middleton, a business consultant at Performance Technology Partners. The best solution may be directed dialogue (which prompts users to say certain phrases), natural language, or a combination of the two, with one serving as the primary modality and the other as the secondary, he says. The best modality depends on your callers, processes, and balance between caller satisfaction and cost.
"In the past, when speech was first applied, people envisioned the end of touch tone, but today touch tone is alive and well, and speech is simply one of several potential solutions," Middleton says. "Natural language versus directed dialogue is no different. Natural language is simply another potential solution. Silver bullets are few and far between."
A Change Is Going to Come
Microsoft and other larger entities like Nuance and AT&T are working to level the playing field, and to offer NLU at a lower cost and broader scale.
"Over the past two years," says Skrbina, "Nuance has significantly reduced both cost and effort to deploy natural language by decreasing the amount of caller data required…and developing tools to automatically employ best practices and assist in the maintenance of the applications. In many instances, [they are] a fraction of what [they] had been in the past."
"At Microsoft, our goal is to enable natural interaction very broadly and not require deep speech capabilities and a bunch of research," Bukshteyn says. "What we're looking to do…is to democratize that, meaning provide a cloud platform where we've done significant investment upfront, and the research and science and developers can build on that to deliver natural language specific to their application."
Gilbert says that AT&T is also trying to cut down obstacles by offering its new cloud-based enabler platform that will have speech capabilities and can be used to more rapidly create natural language customer care and sales applications for contact centers. And instead of just building thousands of applications, AT&T expects to be able to build hundreds of millions of applications.
"We've been working on going from very unique, very expensive applications," he says. "We've been trying to move to the next level in how you scale this business. There are many new drivers that weren't there ten or fifteen years ago.
"The barrier to entry is going to be reduced quite significantly because now you're not talking about proprietary database access and design. You're going to have more standard-based APIs. We're working on basic speech and natural language enablers for creating these specific applications so it becomes more like a front end. You can just use the API and create any mashup applications you want."
While NLU has made great strides, there is still room for improvement. Mobile and cloud are expected to continue to drive interest and lower cost, hopefully allowing more companies to board the NLU train.
Staff writer Michele Masterson can be reached at firstname.lastname@example.org.