Speech Recognition Is Not Speech Understanding
‘Maybe we got lost in translation/ Maybe I asked for too much’
—Taylor Swift (‘All Too Well’)
Back in 2008 I took part in a one-of-a-kind, wondrous dinner in Paris. About a dozen of us, Americans and Canadians, crammed ourselves into an upstairs private room of a mid-level Parisian bistro near the Arc de Triomphe for a business dinner. The night turned raucous as several of the attendees broke into boisterous song for seemingly no reason. After the delicious sole meunière was served, they did it again. And again. And again. Selections from Carousel and Oklahoma! were certainly part of the repertoire, but as we were many bottles of excellent French wine into the night, I can’t remember any more of the impromptu song cycle. Now that I think on it, the link between the wine and the raucousness could move beyond correlation territory into full-blown causation.
Although I took many photos that night, only one really stands out in my memory. Standing behind one of my merry band of pranksters, clutching a collection of wine lists and menus to his chest like the shield a medieval knight hopes will save him, is a truly exasperated Parisian waiter. His suit is beautifully tailored, but his head is cocked at an angle that screams, “Can you not see how I am oppressed by these deranged foreigners?” It is only a static photo taken with a cheap digital camera, yet I swear I can see les yeux du serveur rolling every time I glance at it. His right hand is, I’m sure unconsciously, held like a child pretending his fingers are a gun. It looks like he cannot decide whom he’d rather shoot: himself or one of us.
The waiter, of course, spoke totally fluent English. He understood everything we said. Well, sort of. This was the collision point of three different cultures. (Yes, three. My Canadian friends and colleagues would drown me in Lake Ontario if I lumped their culture in with that of the United States. They’d be polite and apologetic about drowning me, sure, but dead I’d be just the same.) The waiter understood the words spoken, but he clearly did not understand the culture that buoyed us that evening. For the science fiction geeks among us, he did not grok us.
I think about that waiter a lot these days. Technology vendors are telling companies that through the magic of machine translation, they can communicate with customers and prospects the world over. Contact center outsourcers and multinational companies are experimenting with tools that provide real-time webchat translation, as well as translations for service tickets and web self-service knowledge documents. Virtual agent vendors claim they can take the English-language dialogue a company constructs and translate it seamlessly into German, Urdu, Xhosa, Hungarian, and Thai.
When I hear those claims, that maddened waiter comes to mind. You can understand all of my words and still not understand what I mean. Chatbots have been proving that true for several years now! Stripping the cultural context out of communication has real consequences. I won’t say that it is impossible to provide empathetic and emotionally satisfying customer interactions across cultural lines, but it is certainly much harder.
Maybe we’ll get used to this Tower of Babel-esque demi-understanding. After all, with voice over IP and poor cellular connections, we’ve all seemingly acclimated to degraded audio quality in many of our phone conversations. But before you dive into any machine-translation project for customer interactions, at least spare a thought or two for that photograph of my perturbed waiter.
Ian Jacobs is a principal analyst at Forrester Research.