Voice command is an interface that is seeping into everyday use, and there seems to be high demand for it. In the build-up to Christmas, Amazon announced that its Echo had sold out online in the US as consumers clamoured to get their hands on the voice-activated speakers.
From a business perspective, we’ve seen banks lead the way in this area. In early 2016, HSBC and First Direct launched voice recognition services to protect themselves against the growing threat of fraud. More recently, Santander has implemented its SmartBank app, an application which in its second phase will allow customers to make payments, report lost cards and set up accounts, all through voice activation.
This has played a part in high-profile technology investor Mary Meeker dedicating a large proportion of her annual report on the State of the Internet to the lift-off of the voice interface. Voice UI may have been around for decades, but the improved accuracy of the technology has raised its profile and increased consumer usage. However, for it to become commonplace across all industries, and for more banks to implement it, voice biometrics need to be seamless, robust and secure.
A seamless user experience
It’s unfortunate that a large proportion of security systems tend to be obtrusive. Take, for example, the airport screening lines that must be navigated in order to board a flight. They are an added frustration because the vast majority of passengers are harmless, yet everyone has to pay the security toll so that the few malicious ones can be detected. Perhaps the biggest promise of biometrics is seamless security. This is especially true of voice biometrics, since voice command is increasingly becoming an embedded part of consumer life.
Voice biometric systems require users to go through an enrolment process so that the system can effectively ‘learn’ the unique ‘thumbprint’ of their voice. Enrolment can be active or passive. Active enrolment must be used with systems that are text-dependent. It is a manual process in which the user is required to speak an agreed-upon phrase. Beyond enrolment, this phrase must also be spoken every time the user is authenticated. Text-dependent systems therefore fail to make security unobtrusive.
Passive enrolment occurs during the user’s normal interaction with the system. It is used by voice biometric systems that are text-independent. These systems are able to learn the user’s voice during normal speech and do not require a specific phrase. As a result, text-independent systems are truly unobtrusive because they are almost completely invisible to the user.
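To make the idea of passive enrolment concrete, here is a minimal sketch in Python. It assumes that some upstream component (not shown) has already turned each stretch of ordinary speech into a fixed-length list of numbers. The class name, the 30-second minimum and the simple averaging are all illustrative assumptions, not a description of any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PassiveEnrolment:
    """Collects speech from ordinary interactions until a voiceprint can be formed.

    Hypothetical sketch: real systems use far richer models than an average.
    """

    min_speech_seconds: float = 30.0  # assumed requirement, for illustration only
    speech_seconds: float = 0.0
    utterances: int = 0
    feature_sum: list[float] = field(default_factory=list)

    def add_utterance(self, features: list[float], duration_seconds: float) -> None:
        """Fold one utterance's features into the running total."""
        if not self.feature_sum:
            self.feature_sum = [0.0] * len(features)
        if len(features) != len(self.feature_sum):
            raise ValueError("feature vectors must all have the same length")
        self.feature_sum = [s + f for s, f in zip(self.feature_sum, features)]
        self.speech_seconds += duration_seconds
        self.utterances += 1

    @property
    def is_enrolled(self) -> bool:
        """True once enough speech has been heard; no fixed phrase is needed."""
        return self.speech_seconds >= self.min_speech_seconds

    def voiceprint(self) -> list[float]:
        """Average of all utterance features collected so far."""
        if not self.is_enrolled:
            raise RuntimeError("not enough speech collected yet")
        return [s / self.utterances for s in self.feature_sum]
```

The user never takes an explicit enrolment step here: each normal interaction calls `add_utterance`, and the voiceprint becomes available once enough speech has accumulated.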
Make it robust
Voice biometrics must be robust to noise, to the channel and to voice changes over time. Background noise, for instance, is a common occurrence on many phone calls. It consists of the non-speech sounds picked up by the microphone, such as the ping of a microwave, the whir of an air conditioner, a radio playing or children talking. If a voice biometric system is not robust to this noise, authentication can fail when the background noise differs from what was present during enrolment.
Given that the human voice changes as we age, voice biometric authentication that cannot understand and adapt to this will also fail. In order for voice biometric systems to last, they must continually learn and adapt to the user’s ever-changing voice. However, they must be careful not to adapt too easily and learn from the voice of an imposter.
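One way to picture this balance is a guarded update rule: the stored voiceprint drifts slowly towards new speech, but only when the match is strong enough that an imposter is unlikely. The sketch below is an assumption-laden illustration in Python; the function name, the 0.9 update threshold and the 0.1 learning rate are hypothetical values chosen for the example.

```python
def adapt_voiceprint(voiceprint, new_features, score,
                     update_threshold=0.9, learning_rate=0.1):
    """Return an updated voiceprint, or an unchanged copy if the match is too weak.

    `score` is assumed to be a similarity between 0 and 1 produced elsewhere.
    The update threshold is deliberately stricter than a typical accept
    threshold, so borderline (possibly imposter) speech is never learned from.
    """
    if len(voiceprint) != len(new_features):
        raise ValueError("voiceprint and features must have the same length")
    if score < update_threshold:
        return list(voiceprint)  # refuse to learn from a doubtful match
    # Move a small step towards the new speech so gradual ageing is tracked.
    return [(1 - learning_rate) * v + learning_rate * f
            for v, f in zip(voiceprint, new_features)]
```

A small learning rate means no single call can reshape the voiceprint, while a long run of confident matches lets it follow a voice that changes gradually.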
Security is key
A question at the forefront of consumers’ minds will be what to do if their voice biometrics are compromised. The beauty of this type of authentication is that, unlike a password, your voice cannot be changed. However, there are a number of ways that an attacker can steal a user’s voice. For example, websites such as YouTube host recorded audio that can be used for this purpose.
An attacker could also use speech synthesis to impersonate a user. Given enough audio, modern systems can build a voice that sounds very similar to the person being modelled. This cloned voice could be used not only during authentication but also to carry on a conversation with the agent or system, potentially accessing or compromising more sensitive data.
Lastly, the attributes that are extracted from the user’s voice for enrolment and authentication can be stolen. They typically consist of a list of floating-point numbers, calculated when a user’s speech is analysed. For instance, if the attributes are extracted on a user’s device, an attacker could steal them from that device. The stolen attributes would then be submitted for authentication in place of those derived from the attacker’s own voice.
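A short sketch shows why stolen attributes are so dangerous. It assumes, purely for illustration, that authentication compares two lists of floating-point numbers using cosine similarity against a fixed threshold; the numbers, the threshold and the function names below are invented for the example and do not come from any real system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length attribute vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def authenticate(enrolled, presented, threshold=0.8):
    """Accept when the presented attributes are close enough to the enrolled ones."""
    return cosine_similarity(enrolled, presented) >= threshold


# Made-up attribute vectors, for illustration only.
enrolled_user = [0.12, -0.53, 0.88, 0.07]
attacker_own_voice = [-0.40, 0.22, 0.10, 0.91]
stolen_attributes = list(enrolled_user)  # copied from a compromised device

rejected = authenticate(enrolled_user, attacker_own_voice)  # attacker's real voice
accepted = authenticate(enrolled_user, stolen_attributes)   # replayed stolen numbers
```

In this sketch the attacker’s own voice is rejected, but the stolen attributes are accepted because they are numerically identical to what was enrolled. The matcher never hears any audio in the second case, which is why where and how the attributes are extracted matters.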
Can biometrics go it alone?
Biometrics will only be trusted and widely adopted if they become seamless, robust and secure. However, when introducing new technologies, it is essential that all angles are covered. Without multiple layers of security covering all channels, from face-to-face verification, to online and over the telephone, fraudsters are able to manipulate particular points of exposure. For instance, biometrics is one way to solve the credential recovery issue, but it cannot detect fraud on its own.
A more advanced technology that builds on some of the security validation foundations of voice biometrics, while introducing multi-factor authentication, is Phoneprinting™. It identifies specific attributes of each call, such as the location the call is coming from, the device, whether it is a mobile or a landline, and whether the phone has been used to call the company before. Combined, these signals can help detect fraudulent activity before it becomes an issue, helping to keep businesses secure.
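The general principle of layering several independent checks can be sketched in a few lines of Python. This is a generic illustration of multi-factor reasoning, not a description of how Phoneprinting™ works: the field names, the simple failure count and the review threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallSignals:
    """Hypothetical per-call checks; each is True when nothing looks unusual."""

    voice_match: bool                  # voice biometric accepted the caller
    location_matches_history: bool     # call origin is consistent with past calls
    device_type_matches_history: bool  # e.g. mobile vs landline as seen before
    number_seen_before: bool           # phone has called the company previously


def failed_checks(signals: CallSignals) -> int:
    """Count how many of the independent checks failed."""
    checks = [
        signals.voice_match,
        signals.location_matches_history,
        signals.device_type_matches_history,
        signals.number_seen_before,
    ]
    return sum(1 for passed in checks if not passed)


def needs_review(signals: CallSignals, max_failures: int = 1) -> bool:
    """Flag the call when more checks fail than the assumed tolerance allows."""
    return failed_checks(signals) > max_failures
```

Under these assumptions, a call whose voice matches but which arrives from an unfamiliar location, on an unfamiliar type of device and from a number never seen before would still be flagged, which is the point of not relying on the voice alone.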
Matt Peachey is the vice president and international general manager of Pindrop.