Many call centers have already made the move to convert from touch-tone IVR systems to speech recognition systems, and some applications have delivered the benefits that their promoters promised--I've seen several successful implementations firsthand. Unfortunately, I also see the other side of the coin--speech applications that have failed to deliver on the promise of speech recognition.
However, the situation isn't hopeless for call centers that operate speech-enabled IVRs. Some of the problems businesses encounter with these less-than-effective deployments can be remedied relatively easily. Even those that exhibit more substantial problems can often be dramatically improved.
For those considering speech-rec technologies in their call centers, the first step is to understand that it is not safe to assume the technology will approach 100 percent accuracy in recognizing human utterances in a deployed speech-enabled application. So, what can those considering a move into speech realistically expect?
Many variables can affect a recognizer's accuracy, and results vary widely, but most telephony-based speech-enabled systems achieve around 82 percent to 84 percent overall accuracy. This means that the average user will experience one or two recognition failures for every 10 utterances. Taken alone, an accuracy rate in the low-to-mid 80 percent range might seem acceptable, but most users find it unacceptable once the cumulative effect of recognition failures is considered. In a nutshell, just two or three recognition errors during an IVR interaction can evoke a seemingly unwarranted emotional response on the user's part--it doesn't take many mistakes to get a user angry.
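To make the cumulative effect concrete, here is a rough back-of-the-envelope sketch. It assumes that each utterance is recognized independently with the same accuracy, which is a simplification of my own for illustration (real errors tend to cluster by caller, accent and line quality), and the function names are hypothetical.

```python
def expected_failures(accuracy: float, utterances: int) -> float:
    """Average number of misrecognized utterances in one call."""
    return (1.0 - accuracy) * utterances


def p_at_least_one_failure(accuracy: float, utterances: int) -> float:
    """Chance that a call hits one or more misrecognitions.

    Assumes independent, identically accurate utterances (a simplification).
    """
    return 1.0 - accuracy ** utterances


if __name__ == "__main__":
    for accuracy in (0.82, 0.84):
        print(
            f"accuracy={accuracy:.2f}: "
            f"expected failures in 10 utterances = {expected_failures(accuracy, 10):.1f}, "
            f"chance of at least one = {p_at_least_one_failure(accuracy, 10):.0%}"
        )
```

Under that independence assumption, a 10-utterance call at 82 percent to 84 percent accuracy has roughly an 83 percent to 86 percent chance of containing at least one recognition failure, which is why a per-utterance figure that sounds tolerable can still feel unreliable to callers.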
Unfortunately, some of the most severe usability problems of speech-enabled systems result from speech recognition failures. The solution is a hybrid architecture--it is now possible to remove the experience of recognition failure from a speech-enabled IVR entirely.
Voice solutions provider Spoken Communications supports a hybrid architecture in which the speech recognizer is backed up by a human who can monitor four or more interactions at once. For most of the interaction the human guide, or assistant, does nothing, because most of the time the recognizer is likely to get things right. However, the guide is alerted in cases where the recognizer is uncertain of what the user said.
Unbeknownst to the user, the guide quickly listens to the utterance and indicates what was meant through a GUI that feeds back into the speech-enabled application. The recognizer thus takes most of the recognition work off humans, while a human steps in when needed to clarify what was said and quickly move the user forward in the application without the unpleasantness of recognition failures.
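The routing logic described above can be sketched in a few lines. This is only an illustration of the general idea, not Spoken Communications' implementation: the `RecognitionResult` type, the `CONFIDENCE_THRESHOLD` value and the `ask_guide` callback are all hypothetical names I have assumed, and the sketch further assumes the recognizer reports a confidence score between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RecognitionResult:
    text: str          # the recognizer's best hypothesis
    confidence: float  # assumed to range from 0.0 (no idea) to 1.0 (certain)


# Hypothetical cutoff; a real deployment would tune this against call data.
CONFIDENCE_THRESHOLD = 0.6


def resolve_utterance(
    result: RecognitionResult,
    audio: bytes,
    ask_guide: Callable[[bytes], str],
) -> Tuple[str, bool]:
    """Return (interpreted text, whether a human guide was consulted)."""
    if result.confidence >= CONFIDENCE_THRESHOLD:
        # The recognizer is confident, so the guide stays idle.
        return result.text, False
    # The recognizer is uncertain, so the guide listens and decides.
    return ask_guide(audio), True


if __name__ == "__main__":
    def fake_guide(audio: bytes) -> str:
        # Stand-in for the human guide's selection in the GUI.
        return "billing"

    print(resolve_utterance(RecognitionResult("billing", 0.93), b"", fake_guide))
    print(resolve_utterance(RecognitionResult("building", 0.31), b"", fake_guide))
```

Because the guide is consulted only below the threshold, one person can plausibly cover several calls at once, and the caller hears a correct response rather than a reprompt.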
Users of such systems speak highly of speech recognition because the technology appears to always be right. The value of a speech-enabled IVR is not likely to be realized as long as users are annoyed and frustrated by speech recognition errors. Hybrid architectures afford the best solution currently available for maximizing the value of speech recognition apps.
Walter Rolandi is a doctoral-level human factors professional specializing in voice-user interfaces. He can be reached at firstname.lastname@example.org or 1-803-252-9995.