Automatic speaker recognition is a classic example of a pattern
recognition problem: in general, the goal is to detect patterns
in real-world sensor data.
Every pattern recognition problem requires a training phase.
In a speaker authentication system, for example,
valid users of the system need to be enrolled.
During the enrollment procedure, the system "learns" the person it is supposed
to recognize. This training phase requires speech samples of the user.
During the later recognition process, the system compares another recorded
speech signal (called test data) to the training utterance(s).
The desired output of the system is the name of one of the training
speakers, or a rejection if the test utterance stems from an unknown
person.
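The enrollment and recognition phases described above can be sketched in a
few lines of Python. This is a minimal illustration under strong assumptions,
not a description of any particular system: each utterance is assumed to be
already reduced to a fixed-length feature vector, a speaker model is simply
the mean of the training vectors, and the comparison uses Euclidean distance.
The names enroll and closest_speaker and all numeric values are hypothetical.

```python
import math

def enroll(training_vectors):
    """Build a speaker model as the mean of the training feature vectors."""
    count = len(training_vectors)
    dim = len(training_vectors[0])
    return [sum(vec[i] for vec in training_vectors) / count for i in range(dim)]

def closest_speaker(models, test_vector):
    """Return (name, distance) of the enrolled model nearest to the test vector."""
    name = min(models, key=lambda n: math.dist(models[n], test_vector))
    return name, math.dist(models[name], test_vector)

# Toy feature vectors (assumed for illustration, not real speech features).
models = {
    "alice": enroll([[1.0, 2.0], [1.2, 1.8]]),
    "bob": enroll([[4.0, 0.5], [3.8, 0.7]]),
}
name, dist = closest_speaker(models, [1.1, 2.1])
```

Here name holds the enrolled speaker whose model is closest to the test
vector. Whether the system may also reject the utterance is a separate
decision that this sketch leaves out.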
Several distinctions are commonly made within the field of speaker recognition:
Open Set - Closed Set
These terms refer to the set of speakers the system has been trained on.
If the system can assume that every possible test utterance belongs to
one of the speakers it has learned, we have a "closed set" of training
speakers. If a test utterance may originate from a person who has not
been presented to the system before, we speak of an "open set" of
speakers. In this case, the system should be able to reject the
utterance.
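The difference between the two settings can be illustrated with a small
Python sketch. It assumes that the system has already computed one distance
per enrolled speaker (smaller means more similar) and that a fixed rejection
threshold is available for the open-set case. The function name identify,
the threshold, and the scores are hypothetical and chosen only for
illustration.

```python
def identify(distances, reject_threshold=None):
    """Pick the best-matching speaker from a dict of name -> distance.

    reject_threshold=None models a closed set: the best match is always returned.
    Otherwise (open set), None is returned when even the best match is too far.
    """
    best = min(distances, key=distances.get)
    if reject_threshold is not None and distances[best] > reject_threshold:
        return None  # rejection: utterance attributed to an unknown speaker
    return best

# Hypothetical distances for one test utterance.
distances = {"alice": 2.7, "bob": 3.1}
closed_result = identify(distances)                       # closed set
open_result = identify(distances, reject_threshold=1.0)   # open set
```

With these assumed values, the closed-set call is forced to name the nearest
speaker even though the match is poor, while the open-set call rejects the
utterance.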
Identification - Verification
A speaker identification system gets a test utterance
as input. The task of the system is to find out which of the
training speakers made the test utterance. So, the output of
the system is the name of the training speaker, or possibly
a rejection if the utterance has been made by an unknown person.
A verification system, in contrast, receives both the speech signal
to be verified and the name of the trained speaker whose identity is
claimed. The expected result is a yes-or-no decision: acceptance of
the test utterance if it originates from the claimed speaker, or
rejection otherwise.
A verification system answers the question: "Are you who you claim
to be?"
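Verification can be sketched as a function that takes the claimed name in
addition to the test data and returns a yes-or-no decision. As before, this
is only an illustration under assumptions: speaker models and test
utterances are taken to be fixed-length feature vectors, similarity is
measured by Euclidean distance, and the function name verify, the threshold,
and the numeric values are hypothetical.

```python
import math

def verify(models, claimed_name, test_vector, threshold):
    """Return True if the test vector is close enough to the claimed speaker's model."""
    if claimed_name not in models:
        return False  # no enrolled model exists for the claimed identity
    return math.dist(models[claimed_name], test_vector) <= threshold

# Assumed speaker models and test vector, for illustration only.
models = {"alice": [1.1, 1.9], "bob": [3.9, 0.6]}
accepted = verify(models, "alice", [1.0, 2.0], threshold=0.5)
rejected = verify(models, "bob", [1.0, 2.0], threshold=0.5)
```

Unlike identification, only the model of the claimed speaker is consulted,
so the decision does not depend on how many other speakers are enrolled.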
Text Dependent - Text Independent Recognition
A text independent speaker recognition system does not have any
information about the content of training and test utterances.
In contrast, a text dependent system relies on the restriction
that the text spoken in training is identical to the text of the
test utterance.
While the pairs open / closed set and identification / verification
also apply to other biometric authentication methods, such as
iris, face, or fingerprint recognition, the distinction between text dependent
and text independent recognition is specific to speaker recognition systems.