Speaker identification is becoming an increasingly popular technology in today's society. Besides being cost effective and producing a strong return on investment in the defined business cases, speaker identification lends itself to a variety of uses and implementations, ranging from corridor security to safer driving to increased productivity. By examining the technology and companies that drive today's voice recognition and identification systems, we can survey current implementations and predict future trends.
In this paper, a one-dimensional discrete cosine transform (DCT) is used as a feature extractor to reduce signal redundancy and transfer the sampled human speech signal from the time domain to the frequency domain. Only the subset of coefficients with large magnitudes is selected; these coefficients preserve the most important information of the speech signal and are sufficient to recognize the original signal. The selected coefficients are then normalized globally and fed to a multilayer momentum backpropagation neural network for classification. A very high recognition rate can be achieved using a very small number of coefficients, which are enough to reflect the characteristics of the speaker's voice.
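The feature-extraction stage described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the coefficient count `n_coeffs` and the interpretation of "normalized globally" (dividing by the largest magnitude across the whole feature set) are assumptions.

```python
import numpy as np
from scipy.fft import dct


def extract_features(signal, n_coeffs=40):
    """Apply a 1-D DCT and keep the largest-magnitude coefficients."""
    # DCT-II transfers the sampled speech signal from time to frequency domain.
    coeffs = dct(signal, type=2, norm='ortho')
    # Select only the n_coeffs coefficients with the largest magnitude;
    # these carry the most important information of the speech signal.
    idx = np.argsort(np.abs(coeffs))[::-1][:n_coeffs]
    idx.sort()  # preserve the original ordering of the retained coefficients
    return coeffs[idx]


def normalize_globally(feature_matrix):
    """Scale all coefficients by the single largest magnitude in the set."""
    return feature_matrix / np.max(np.abs(feature_matrix))
```

The normalized feature vectors produced here would then serve as the input layer of the classification network.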
An artificial neural network (ANN) is trained to classify the voices of eight speakers; five voice samples per speaker are used in the training phase. The network is then tested with five further samples from the same speakers. During training, several parameters are varied: the number of selected coefficients, the number of hidden nodes, and the value of the momentum parameter. In the testing phase, the identification performance is computed for each value of these parameters.
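A multilayer network with momentum backpropagation updates each weight by combining the current gradient step with a fraction of the previous step, which smooths learning and speeds convergence. The sketch below illustrates this update rule for a small two-layer network; the layer sizes, activation choice (tanh), learning rate, and momentum value are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


class MomentumMLP:
    """Two-layer network trained with momentum backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, momentum=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
        self.W2 = rng.standard_normal((n_hidden, n_out)) * 0.1
        # Previous weight changes, reused by the momentum term.
        self.vW1 = np.zeros_like(self.W1)
        self.vW2 = np.zeros_like(self.W2)
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        self.h = np.tanh(x @ self.W1)
        return np.tanh(self.h @ self.W2)

    def train_step(self, x, target):
        y = self.forward(x)
        # Backpropagate squared-error gradients through both layers.
        delta_out = (y - target) * (1 - y ** 2)
        delta_hid = (delta_out @ self.W2.T) * (1 - self.h ** 2)
        gW2 = np.outer(self.h, delta_out)
        gW1 = np.outer(x, delta_hid)
        # Momentum update: new step = momentum * previous step - lr * gradient.
        self.vW2 = self.momentum * self.vW2 - self.lr * gW2
        self.vW1 = self.momentum * self.vW1 - self.lr * gW1
        self.W2 += self.vW2
        self.W1 += self.vW1
        return 0.5 * np.sum((y - target) ** 2)
```

In the paper's setup the input size would equal the number of selected DCT coefficients and the output layer would have one node per speaker; varying the hidden-layer size and the momentum value corresponds to the parameter sweep described above.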