Definitions

Automatic speaker recognition is a classical example of a pattern recognition problem: the goal is to find patterns in real-world sensor data, in this case a recorded speech signal. All pattern recognition problems require a training phase. In a speaker authentication system, for example, valid users of the system need to be enrolled. During the enrollment procedure, the system "learns" the person it is supposed to recognize, which requires speech samples from that user.

During the later recognition process, the system compares another recorded speech signal (called test data) to the training utterance(s). The desired output of the system is the name of one of the training speakers, or a rejection if the test utterance stems from an unknown person.
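
The two phases described above can be illustrated with a minimal sketch. It assumes that every utterance has already been converted into a list of fixed-length feature vectors, that a speaker is modeled simply by the mean of those vectors, and that a distance threshold decides on rejection. The class name ToyRecognizer, its methods, and the parameter reject_distance are hypothetical and chosen only for illustration; practical systems use far richer speaker models.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length feature vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class ToyRecognizer:
    """Hypothetical toy recognizer: one mean vector per enrolled speaker."""

    def __init__(self, reject_distance):
        # Assumption: test data farther than this from every model is rejected.
        self.reject_distance = reject_distance
        self.models = {}

    def enroll(self, name, training_vectors):
        """Training phase: 'learn' a speaker from training feature vectors."""
        self.models[name] = mean_vector(training_vectors)

    def recognize(self, test_vectors):
        """Recognition phase: return the closest speaker name, or None."""
        if not self.models:
            return None
        test_model = mean_vector(test_vectors)
        name, best = min(
            ((n, math.dist(m, test_model)) for n, m in self.models.items()),
            key=lambda pair: pair[1],
        )
        return name if best <= self.reject_distance else None


recognizer = ToyRecognizer(reject_distance=1.0)
recognizer.enroll("alice", [[1.0, 2.0], [1.2, 1.8]])
recognizer.enroll("bob", [[5.0, 5.0], [5.2, 4.8]])

known = recognizer.recognize([[1.1, 2.1]])    # close to the model of "alice"
unknown = recognizer.recognize([[9.0, 9.0]])  # far from every model
```

Here the first test utterance is attributed to an enrolled speaker, while the second is rejected because it lies too far from all stored models.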


Several distinctions characterize the problems within the field of speaker recognition:
  • Open Set - Closed Set

    These terms refer to the set of trained speakers of the system. If the system is given the information that every possible test utterance belongs to one of the speakers it has learned, we have a "closed set" of training speakers. If a test utterance may originate from a person who has not been presented to the system before, we speak of an "open set" of speakers. The system should be able to make a rejection in this case.

  • Identification - Verification

    A speaker identification system gets a test utterance as input. The task of the system is to find out which of the training speakers made the test utterance. So, the output of the system is the name of the training speaker, or possibly a rejection if the utterance has been made by an unknown person.

    A speaker verification system receives two inputs: the speech signal to be verified and the name of the trained speaker whose identity is claimed. The expected result is a yes-or-no decision: acceptance of the test utterance if it originates from the claimed speaker, or rejection otherwise. A verification system answers the question: "Are you who you claim to be?"

  • Text Dependent - Text Independent Recognition

    A text independent speaker recognition system does not have any information about the content of training and test utterances. In contrast, a text dependent system relies on the restriction that the text spoken in training is identical to the text of the test utterance.
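
The difference between closed-set identification, open-set identification, and verification can be sketched as three decision rules. The sketch assumes that the system has already computed a similarity score between the test utterance and each trained speaker (higher meaning more similar); the function names, the example scores, and the threshold values are hypothetical. Whether a system is text dependent or text independent affects how such scores are computed, not the decision rules themselves.

```python
def identify_closed_set(scores):
    """Closed set: the speaker is known to be enrolled, so never reject."""
    return max(scores, key=scores.get)


def identify_open_set(scores, threshold):
    """Open set: return the best match, or None (rejection) if it is too weak."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def verify(scores, claimed_speaker, threshold):
    """Verification: yes-or-no decision about one claimed identity."""
    # A claim for a speaker who was never enrolled is always rejected.
    return scores.get(claimed_speaker, float("-inf")) >= threshold


# Hypothetical similarity scores of one test utterance against three speakers.
scores = {"alice": 0.82, "bob": 0.35, "carol": 0.41}

closed_result = identify_closed_set(scores)           # best-scoring speaker
open_result = identify_open_set(scores, 0.9)          # rejected: 0.82 < 0.9
accepted = verify(scores, "alice", threshold=0.7)     # claim accepted
rejected = verify(scores, "bob", threshold=0.7)       # claim rejected
```

Identification compares the test utterance against all trained speakers, whereas verification only needs the score of the claimed speaker.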


The term pairs open / closed set and identification / verification also apply to other biometric authentication methods, such as iris, face, or fingerprint recognition. The distinction between text dependent and text independent recognition, by contrast, is specific to speaker recognition systems.