Applications

Speaker recognition techniques are useful in three main areas: authentication, surveillance and forensic speaker recognition.

Speaker Recognition for Authentication

Speaker recognition for authentication allows users to identify themselves using nothing but their voices. This can be much more convenient than traditional means of authentication, which require the user to carry a key or remember a PIN. There are several distinct ways of using the human voice for authentication, i.e. there are different kinds of speaker recognition systems for authentication purposes:
  • Single pass phrase system

    A single pass phrase system lets the user choose a phrase that is uttered during enrollment as well as for authentication. Text-dependent speaker recognition techniques can therefore be used, which has the advantage that good recognition accuracy can be achieved with very little speech data in both training and testing. To mount a replay attack on a system of this kind, an intruder needs a recording of the correct pass phrase uttered by the enrolled user. If the user keeps the pass phrase secret, such a recording is difficult for the intruder to obtain unless they are able to "steal" the user's voice during the training or authentication process.

    An example of a single pass phrase system is the "BioID" system from HumanScan GmbH. This authentication solution further improves overall recognition accuracy and security against replay attacks by combining speaker authentication with other biometric traits such as face, iris and lip movement.

  • Text prompt system

    A text prompt system requires the user to utter a specific text that is generated individually for each authentication. For example, a series of digits from "zero" to "nine" may be used, but generating arbitrary phrases to be spoken by the person being authenticated is also conceivable. Depending on the kind of prompt, the speaker recognition technique may be either text dependent or text independent.

    The use of prompts makes replay attacks more difficult, because a recording of a single pass phrase, as in the system above, is not sufficient to gain illegal access. The disadvantage, however, is that longer speech signals have to be collected during training as well as during authentication, which makes the system less convenient than the single pass phrase approach.

  • Speaker verification integrated within a dialog system

    If biometric authentication is desired in combination with a dialog system that performs automatic speech recognition, a third kind of speaker authentication system may be the most useful. In this case, the utterances the user makes in order to provide information to the system can be used for authentication as well. For example, in a telephone banking application handled by an automatic dialog system, the caller may have to provide account information. The speech recognition part of the system first recognizes the number that has been specified. The same speech signal is then used by the speaker authentication part of the system to check whether the voice characteristics match the biometric template of the account holder.
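
The text prompt approach described above can be illustrated with a minimal Python sketch. It is only an outline under stated assumptions: the names generate_prompt, prompt_matches and authenticate are hypothetical, the recognized words are assumed to come from some speech recognizer, and speaker_score is assumed to be a similarity score from some speaker verification model, where higher means a better match. Neither component is implemented here.

```python
import secrets

# Digit words used for the prompt, as in the "zero" to "nine" example.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def generate_prompt(length=6):
    """Return a fresh random digit-word sequence for one authentication attempt."""
    # secrets is used so that the next prompt is hard to predict.
    return [secrets.choice(DIGIT_WORDS) for _ in range(length)]


def prompt_matches(prompt, recognized_words):
    """Check that the recognized words equal the prompted sequence."""
    normalized = [word.strip().lower() for word in recognized_words]
    return normalized == list(prompt)


def authenticate(prompt, recognized_words, speaker_score, threshold=0.5):
    """Accept only if the right text was spoken AND the voice matches.

    speaker_score and threshold are assumptions for illustration; a real
    system would obtain the score from its speaker verification model.
    """
    return prompt_matches(prompt, recognized_words) and speaker_score >= threshold
```

Because a new prompt is drawn for every attempt, a recording of an earlier session will usually fail the text check even if the voice itself would be accepted.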

Speaker Recognition for Surveillance

Security agencies have several means of collecting information. One of these is electronic eavesdropping on telephone and radio conversations. As this produces large quantities of data, filter mechanisms must be applied in order to find the relevant information. One such filter may be the recognition of target speakers who are of interest to the agency.
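
Such a filter could look roughly like the following Python sketch. It assumes that some speaker recognition system has already produced a score for every recording and target speaker. The scores dictionary, the threshold, and the names Match and filter_recordings are hypothetical and serve only to illustrate the filtering step.

```python
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    recording_id: str
    speaker_id: str
    score: float


def filter_recordings(scores: Dict[str, Dict[str, float]],
                      threshold: float) -> List[Match]:
    """Keep recordings whose best-scoring target speaker reaches the threshold.

    scores maps recording id -> {target speaker id -> score}; the scores are
    assumed to come from an external speaker recognition system.
    """
    matches = []
    for recording_id, per_speaker in scores.items():
        if not per_speaker:
            continue  # no target speaker was scored for this recording
        best_speaker = max(per_speaker, key=per_speaker.get)
        best_score = per_speaker[best_speaker]
        if best_score >= threshold:
            matches.append(Match(recording_id, best_speaker, best_score))
    # Most confident matches first, so they can be reviewed first.
    matches.sort(key=lambda m: m.score, reverse=True)
    return matches
```

The threshold trades missed target speakers against the number of irrelevant recordings that pass the filter, so it would have to be tuned on data from the actual system.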

Forensic Speaker Recognition

Proving the identity of a recorded voice can help to convict a criminal or exonerate an innocent person in court. Although this task is probably not performed by a completely automatic speaker recognition system, signal processing techniques can nevertheless be of use in this field.

More information about this kind of application can be found at the University of Trier.