Speaker recognition techniques are useful in three main areas:
authentication, surveillance and forensic speaker recognition.
Speaker Recognition for Authentication
Speaker recognition for authentication allows
users to identify themselves using nothing but their voices.
This can be much more convenient than traditional means of
authentication, which require the user to carry a key or remember
a PIN.
There are several distinct approaches to using the human voice
for authentication, i.e. there are different kinds of
speaker recognition systems for authentication purposes:
Single pass phrase system
A single pass phrase system lets the user choose a phrase that
is uttered during enrollment as well as for authentication.
Text dependent speaker recognition techniques can therefore
be used, which has the advantage that good recognition accuracy can
be achieved with very little speech data in both training and
testing. To mount a replay attack on a system of this kind,
an intruder needs a recording of the correct pass phrase uttered
by the corresponding enrolled user of the system. If the user
keeps the pass phrase secret, such a recording is difficult
for the intruder to obtain unless he is able
to "steal" the user's voice during the training or authentication
process.
An example of a single pass phrase system is the "BioID" system
from HumanScan GmbH. This authentication solution further
improves security against replay attacks and overall recognition
accuracy by combining speaker authentication with other biometric
traits such as face, iris and lip movement.
Text prompt system
A text prompt system requires the user to utter a specific text
that is generated individually for each authentication attempt. For
example, a sequence of digits from "zero" to "nine" may be used.
It is also conceivable to generate arbitrary phrases for the
person being authenticated to speak.
Depending on the kind of prompt, the speaker recognition technique
may be either text dependent or text independent.
The use of prompts makes replay attacks more difficult, as a
recording of a single pass phrase, as above, is not sufficient to
gain illegal access to the system.
The disadvantage, however, is that longer speech signals have to
be collected during training as well as during the authentication
process, making the system less convenient than the single pass
phrase approach.
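The per-attempt prompt generation described above can be sketched
as follows. This is a minimal illustration in Python, not the method
of any particular product: the names DIGIT_WORDS,
generate_digit_prompt and prompt_matches, as well as the default
prompt length of six digits, are assumptions made for this example.
The standard library's secrets module is used so that prompts are
hard to predict.

```python
import secrets

# Spoken digits "zero" to "nine", as mentioned in the text.
DIGIT_WORDS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]


def generate_digit_prompt(length=6):
    """Return a fresh random digit sequence for one authentication attempt.

    The default length of 6 is an assumption, not a recommendation.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [secrets.choice(DIGIT_WORDS) for _ in range(length)]


def prompt_matches(prompt, recognized_words):
    """Check that the words a speech recognizer returned equal the prompt.

    This only verifies the spoken text; verifying the speaker's identity
    is a separate step that is not shown here.
    """
    return list(prompt) == [word.lower() for word in recognized_words]
```

Because a new sequence is drawn for every attempt, a recording of an
earlier session will usually not contain the digits in the requested
order.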
Speaker verification integrated within a dialog system
If biometric authentication is desired in combination with a
dialog system that performs automatic speech recognition,
a third kind of speaker authentication system may be the most useful.
In this case, the utterances that the user makes in order
to provide information to the system can be used for
authentication as well. For example, in a telephone banking
application handled by an automatic
dialog system, the caller may have to provide account information
such as the account number. The speech recognition part of the system first
recognizes the number that has been spoken. The same speech
signal is then used by the speaker authentication part of the
system to check whether the caller's voice characteristics
match the biometric template of the account holder.
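The two-stage flow described above (recognize the account number,
then verify the speaker on the same signal) can be sketched as
follows. This is only an illustration under stated assumptions: the
callbacks recognize_digits and extract_voice_features are
hypothetical stand-ins for a real speech recognizer and feature
extractor, voices are assumed to be represented as fixed-length
feature vectors compared by cosine similarity, and the threshold of
0.8 is an arbitrary placeholder.

```python
import math
from dataclasses import dataclass


@dataclass
class AuthResult:
    account_number: str
    accepted: bool
    score: float


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def authenticate_caller(utterance, recognize_digits, extract_voice_features,
                        templates, threshold=0.8):
    """Use one utterance for both speech recognition and speaker verification.

    recognize_digits and extract_voice_features are hypothetical callbacks;
    templates maps account numbers to enrolled feature vectors.
    """
    # Step 1: the speech recognition part extracts the claimed account number.
    account_number = recognize_digits(utterance)
    template = templates.get(account_number)
    if template is None:
        return AuthResult(account_number, False, 0.0)
    # Step 2: the same signal is compared with the account holder's template.
    features = extract_voice_features(utterance)
    score = cosine_similarity(features, template)
    return AuthResult(account_number, score >= threshold, score)
```

The caller does not have to speak an extra pass phrase in this
design, since the utterance that carries the account number is reused
for verification.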
Speaker Recognition for Surveillance
Security agencies have several means of collecting information.
One of these is electronic eavesdropping on telephone and
radio conversations. As this produces large quantities
of data, filter mechanisms must be applied in order to find
the relevant information. One such filter may be the
recognition of target speakers who are of interest to
the agency.
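A filter of this kind can be sketched as a simple threshold on
speaker recognition scores. The example below assumes that some
recognizer has already assigned each recording a score per target
speaker; the function name filter_recordings, the score range and the
threshold of 0.7 are assumptions made for illustration only.

```python
def filter_recordings(scores_by_recording, threshold=0.7):
    """Keep recordings whose best target-speaker score reaches the threshold.

    scores_by_recording maps a recording id to a dict of
    {target_speaker_id: score}. The scores are assumed to come from an
    external speaker recognizer, where higher means a better match.
    Returns (recording_id, target_id, score) tuples, best match first.
    """
    hits = []
    for recording_id, target_scores in scores_by_recording.items():
        if not target_scores:
            continue  # nothing was scored for this recording
        best_target, best_score = max(
            target_scores.items(), key=lambda item: item[1]
        )
        if best_score >= threshold:
            hits.append((recording_id, best_target, best_score))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits
```

Only the recordings that pass the filter would then need to be
examined further.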
Forensic Speaker Recognition
Proving the identity of a recorded voice can help
to convict a criminal or exonerate an innocent person in court.
Although this task is probably not performed by a
completely automatic speaker recognition system,
signal processing techniques can nevertheless be of use in
this field.
More information about this kind of application can be
found at the
University of Trier.