Voice Biometrics Are Not 100 Percent Foolproof, but Steadily Improving

In most cases, a score of 95 percent is good enough and sometimes even exceptional. But cybersecurity is an area where those lofty numbers fall short of the goal of keeping 100 percent of possible intruders out of enterprise systems. In that quest, voice biometrics solutions offer businesses a better way to ward off hackers than traditional passwords but fall short of their ultimate desire: a completely foolproof system.

Interest in voice biometrics is rising because traditional security systems do a mediocre job of keeping confidential information safe. The frequency and complexity of cyberattacks have been rising, and the first lines of defense, such as passwords, are routinely breached. In fact, cybercrime inflicted damages totaling $6 trillion globally in 2021, and that number is expected to reach $10.5 trillion in 2025, according to Cybersecurity Ventures.

One reason for the growing problem is the sheer volume of information that companies collect and store nowadays. The world created 64.2 zettabytes (1 ZB equals 1 trillion gigabytes) of data in 2020, and that number is expected to grow to 180.7 zettabytes in 2025, a compound annual growth rate (CAGR) of 23 percent, IDC predicts.

Most businesses have trouble protecting all of that sensitive data. In response, the security industry has been shifting its focus away from passwords to other forms of authentication. Vendors started by pairing passwords with a second form of identification, a process dubbed multifactor authentication, which comes in three types, according to Gartner.

The first technique linked passwords with personal information, like a user’s hometown. But with so much information online, hackers quickly caught up with that option. As a result, a growing number of companies moved to a second type of authentication, credentials curated by the organization. Common options are a token or passcode sent to users when they attempt to access a resource.

The third alternative, biometrics, focuses on something intrinsic to the individual and, therefore, is difficult to mimic. This approach has been gaining traction. In 2021, an increasing number of large enterprises and cloud contact center platforms started to implement voice biometrics for both authentication and fraud detection, according to Matt Smallman, founder of SymNex Consulting.

Voice biometrics software works by breaking down a person’s input audio into an individual sound spectrum. These solutions have two designs. Active authentication voice biometrics requires that customers repeat a series of preset passphrases to create a unique voiceprint. Passive authentication analyzes a conversation and creates a unique voiceprint for each caller, and when that person begins speaking, the system analyzes each sound and uses preprogrammed algorithms to match the provided sound with a preloaded signature.
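
The matching step in both designs can be illustrated with a minimal sketch. Assume the system has already converted audio into fixed-length feature vectors (the actual spectral analysis is vendor-specific and far more involved); authentication then reduces to comparing the caller's vector against the enrolled voiceprint. The vectors, the threshold, and the function names below are all hypothetical, chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Compare two fixed-length voiceprint vectors (higher = more alike)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical enrolled voiceprint and two incoming callers' vectors.
enrolled = [0.9, 0.1, 0.4, 0.7]
same_caller = [0.85, 0.15, 0.38, 0.72]   # close to the enrolled print
impostor = [0.1, 0.9, 0.7, 0.05]         # very different spectrum

THRESHOLD = 0.95  # tuned by the vendor; purely illustrative here

def authenticate(voiceprint, sample, threshold=THRESHOLD):
    return cosine_similarity(voiceprint, sample) >= threshold

print(authenticate(enrolled, same_caller))  # True
print(authenticate(enrolled, impostor))     # False
```

In a real deployment, the threshold embodies a trade-off between false accepts (letting an impostor in) and false rejects (locking a customer out), which is why vendors describe their systems in statistical rather than absolute terms.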

Voice biometrics is appealing because it is simple to use. The person does not have to remember an obscure sequence of numbers, capital letters, and special characters. Also, technology-based solutions are more effective at identifying bogus transactions than humans. As a result, the voice biometrics market is expected to reach $5.9 billion in 2028, growing at a CAGR of 22.3 percent during that timeframe, according to market research firm Reports and Data.


No matter which type of system is used, the reality is that no authentication method is entirely foolproof today, and probably never will be. The potential shortcomings of voice input start with the underlying technology, which relies on two levels of speech recognition. The first identifies each word that the person speaks. Through the years, these solutions have improved, and they now offer accuracy rates higher than 90 percent.

In addition, voice biometrics solutions identify who the caller is. Underscoring the complexity of the task, work on such solutions has been under way since the turn of the millennium. “We began working on voice biometric technology more than 20 years ago,” says Brett Beranek, Nuance Communications’ vice president and general manager for the security and biometrics line of business. “A large company had a ring of criminals that would call in with different schemes. They wanted to identify the individuals and keep them out of their systems.”

Naturally, companies’ main concern before introducing voice biometrics into their operations is whether systems really operate as advertised. Vendors think so, and the market maturity illustrates a growing level of confidence in these systems.

“I would place the use of voice biometrics in the mature product category,” says Dan Miller, lead analyst and founder of Opus Research. “The core engines for matching stored [voiceprints] to captured utterances are extremely accurate.”

But the bad guys are busily at work trying to find ways to circumvent this—as well as any other—security check. As a result, vendors and criminals are in a constant game of leapfrog where the latter comes up with a new ruse and the former responds with the necessary system enhancements.

Here are a few ways that systems might fall short in identifying an incoming caller correctly:

Short Utterances

Validating callers requires that they input enough data so the system can create unique voiceprints. “The biggest innovations of the last few years have been in reducing the length of audio needed for authentication so that it can be used alongside natural language self-service solutions to create conversational self-service and call steering without the typical awkward security steps,” says SymNex Consulting’s Smallman.

Systems have improved to the point that they can achieve a 99 percent accuracy rate with as little as half a second of audio, according to Nuance’s Beranek.

Yet criminals try to avoid detection by giving one-word answers or remaining silent, leaving the system with too little clean audio to work with. The effect is similar to a partial fingerprint, which lacks sufficient markings to make an identification.
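
One common countermeasure is simply to refuse to score a caller until enough net speech has accumulated. The sketch below assumes the platform supplies speech segments as (start, end) timestamps with silence already trimmed; the half-second floor echoes the figure Beranek cites, but the names and structure are hypothetical.

```python
MIN_AUDIO_SECONDS = 0.5  # illustrative floor for attempting a voiceprint match

def enough_audio(speech_segments):
    """Sum the caller's net speech time (in seconds) across segments,
    ignoring silence, and decide whether a match can be attempted."""
    total = sum(end - start for start, end in speech_segments)
    return total >= MIN_AUDIO_SECONDS

# A cooperative caller vs. one giving a single clipped, one-word answer.
cooperative = [(0.0, 0.8), (1.2, 2.5)]
evasive = [(0.0, 0.3)]

print(enough_audio(cooperative))  # True: plenty of speech to score
print(enough_audio(evasive))      # False: keep the caller talking first
```

When the check fails, a system or agent would typically prompt the caller with additional questions rather than attempt a match on insufficient data.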

Too Much Noise

Historically, speech recognition was not applicable in all environments. These solutions often required quiet areas because any outside noise interfered with authentication. Noise-canceling solutions have made it possible for these systems to deal with such problems much better than in the past. “I turn the TV or speaker up really loud whenever I do a demo, so customers understand how much background noise voice biometric systems can tolerate,” Beranek points out.

Also, crooks try to confuse the system by inputting more than one voice during enrollment or authentication.

Line quality is a related issue. Voice biometric systems do not work with all network connections. Low network quality hinders the system’s ability to collect input or verify a voiceprint.

Voiceprints Can Be Moving Targets

People’s voices do not remain static during their lives, and those changes can affect voice biometric system reliability. Typically, voices change most dramatically in adolescence, and that is the period where vendors struggle the most. However, they are confident that their systems correctly identify someone throughout adult life.

In addition, colds, sore throats, or changes in voice, accents, or speech patterns impact some systems. “If a person has laryngitis, then the voice biometric system will have trouble matching the voiceprint,” Beranek concedes.

When a Caller’s Voice Is Disguised

Compounding the challenges, fraudsters have developed a few novel ways to try to trick the system. The first is synthetic speech. Here, they ironically rely on voice technology to make human-sounding counterfeits. In response, vendors have been tuning their AI algorithms so they are smart enough to distinguish an artificially generated string of audio from genuine input.

The University of Eastern Finland found another vulnerability, one more low-tech than high-tech. Here, human impersonators—namely, skilled professionals from the entertainment or other industries who have years of experience re-creating voice characteristics and speech behavioral patterns—mimic another person’s speech. The pros modify their voices to fake their ages, sounding like an elderly person or a child.

In addition, a new ruse, a form of voice phishing, is emerging. It centers on public figures, like high-ranking company officials. In this case, the bad guys call into the company, pretend to be a top executive, and ask, for example, for a quick financial transaction, with the funds going to their bogus accounts. To guard against this ploy, a company would have to deploy voice biometric authentication for all of its incoming calls. For now, these solutions are typically deployed only in contact centers and company help desks and focus more on verifying outsiders than employees.

How many problems the various ruses cause companies is debatable. “Despite its media popularity, the ‘deepfake’ threats are far more theoretical than practical,” claims SymNex Consulting’s Smallman. “In fact, I am doubtful that even some of the most publicized examples actually use this technology.”


Another plus is that new tools are emerging. Industry innovation in AI and anti-spoofing has given vendors a few new ways to stymie the criminals.

Liveness detection is one area of interest. These products were designed specifically to fight spoofed biometric data. They try to determine whether voice biometric input comes from a live person or from a machine, and the algorithms are becoming more proficient at separating one from the other.
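
In practice, a liveness score is typically combined with the voiceprint match score, so a caller must clear both bars. The sketch below assumes the two scores arrive from upstream models as values between 0 and 1; the thresholds and function name are hypothetical.

```python
def accept_caller(match_score, liveness_score,
                  match_threshold=0.9, liveness_threshold=0.8):
    """Accept only when the voice both matches the enrolled print AND
    appears to come from a live human rather than playback or synthesis."""
    return (match_score >= match_threshold
            and liveness_score >= liveness_threshold)

# A genuine caller passes both checks; a high-quality replay of the real
# customer's voice matches the print but fails the liveness check.
print(accept_caller(0.97, 0.95))  # True: live, matching voice
print(accept_caller(0.97, 0.30))  # False: matching but likely spoofed
print(accept_caller(0.40, 0.95))  # False: live but the wrong person
```

Requiring both scores is what makes replayed or synthesized audio of the genuine customer insufficient on its own.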

Continuous authentication is another fledgling concept. Typically, a company or contact center verifies the caller’s identity at the entry point. From then onwards, the assumption is that the person online is a legitimate user.

A similar deduction was made when computer system security emerged decades ago. At the time, vendors thought once they verified someone as they entered the network, no further security safeguards were needed. But the bad guys thrived because once they were past the initial security check, little to nothing stopped them from sifting through the cornucopia of confidential information that the company had created.

Recently, the computer industry recognized the limitations with that approach and developed a new mind-set. A zero-trust model recognizes that additional security is needed beyond the initial check to truly secure information. As a result, security checks are enacted at each stop along a transaction.

Continuous authentication takes a similar approach with voice biometrics. It is designed to help businesses ensure that once someone is authenticated, no one else takes control over the interaction. Continuous authentication changes the perspective of authentication from a one-time event to an ongoing process. This system constantly checks that the person on the call is indeed the same person who was authenticated perhaps just a few seconds ago.
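
The ongoing-process idea can be sketched as a loop that re-scores the speaker over successive audio windows and revokes trust if the voice stops matching. Everything here is hypothetical: the window scores would come from the same matching engine used at entry, and the thresholds and tolerance for isolated dips would be tuned per deployment.

```python
def continuously_verify(window_scores, threshold=0.9, max_failures=2):
    """Re-check the speaker over successive audio windows during a call.
    Flag the session once enough consecutive windows fall below the
    threshold, which may mean another person has taken over the line."""
    failures = 0
    for i, score in enumerate(window_scores):
        if score < threshold:
            failures += 1
            if failures >= max_failures:
                return ("flagged", i)  # window index where trust was revoked
        else:
            failures = 0  # a single noisy window is forgiven
    return ("trusted", None)

# Scores every few seconds: the caller verifies, then the voice changes.
print(continuously_verify([0.96, 0.94, 0.95, 0.41, 0.38, 0.35]))
# ('flagged', 4)
print(continuously_verify([0.96, 0.94, 0.88, 0.95]))
# ('trusted', None) -- one dip is tolerated
```

Tolerating a single low-scoring window keeps momentary noise or a cough from ending a legitimate call, while two consecutive misses trigger re-verification.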

The quest for voice biometrics suppliers to reach 100 percent authentication seems quixotic because of the technology shortcomings, its dynamic nature, and hackers’ resolve in finding ways to compromise these systems. So what can a company do to fully protect its sensitive data?

First, companies must acknowledge not only voice biometrics’ weaknesses but also its strengths. The security option is more effective for corporations and less cumbersome for users than legacy options, like password systems, so deploying it makes sense.

But these systems, as well as other options, probably will never become 100 percent effective. “One thing we’ve learned over the years is that it is never wise to rely on only a single authentication factor, and that applies to voice,” notes Opus Research’s Miller. “In the context of a multifactor authentication approach, voice is extremely effective, but it is statistical and probabilistic. When a system is not confident enough to accept a claimed identity, other factors or processes need to be applied.”
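
Miller’s point about statistical, probabilistic matching maps naturally onto a three-way decision: accept outright when confidence is high, reject outright when it is very low, and otherwise step up to another factor. The thresholds and the passcode fallback below are illustrative assumptions, not any vendor’s actual policy.

```python
def decide(voice_score, high=0.95, low=0.60):
    """Three-way decision typical of probabilistic authentication:
    accept outright, reject outright, or invoke another factor."""
    if voice_score >= high:
        return "accept"
    if voice_score < low:
        return "reject"
    return "step_up"  # e.g., send a one-time passcode as a second factor

print(decide(0.98))  # accept
print(decide(0.75))  # step_up: not confident enough on voice alone
print(decide(0.20))  # reject
```

The middle band is where multifactor design does its work: rather than forcing a binary call on ambiguous audio, the system buys certainty with a second, independent check.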

In essence, the decision as to when, how, and how much to rely on voice biometrics as a security check comes down to how much risk a company is willing to accept. Systems today have attained high levels of success but are not fully foolproof. In a growing number of cases, accuracy rates are high enough, and use cases compelling enough, to warrant adoption. However, users must recognize that there will always be at least a chance that hackers will compromise the system, and they should architect their security processes with that understanding.

Paul Korzeniowski is a freelance writer who specializes in technology issues. He has been covering speech technology issues for more than two decades, is based in Sudbury, Mass., and can be reached at paulkorzen@aol.com or on Twitter @PaulKorzeniowski.
