Statistical and wavelet approaches for speaker identification

number: 
780
English
Degree: 
Author: 
Yasir Abdul-Mehdi Taleb
Supervisor: 
Dr. Abdul-Karim A-R. Kadhim
year: 
2002
Abstract:

Speaker identification applications are becoming increasingly popular because of the growing demand for interactive services over the telephone network and the Internet. For example, telephone and Internet banking require higher levels of security. In this thesis, closed-set text-independent speaker identification was examined using two different approaches combined with dimensionality reduction techniques. The first approach is statistical and uses features based on linear prediction (Linear Prediction Coding, Partial Correlation, Log-Area-Ratio, Cepstrum, and Mel Frequency Cepstral Coefficients). The Auto-Regressive linear prediction model was adopted in this approach. The second approach is a phoneme-based wavelet approach that uses the energy distribution in a wavelet filter bank as features for identification. The use of phonemes allows the isolation of specific acoustic features and events related to the phoneme used. Different structures were used for the filter bank, e.g. the Wavelet Packet Transform and the Discrete Wavelet Transform. Different wavelets from the Daubechies wavelet family (db4, db6, db8, and db10) were used for the filter bank. The performance of the statistical approach was tested using random speech passages produced by 108 speakers (52 male/56 female) from the IViL7 speech corpus. The second approach was tested using short segments extracted from the vowels /a/, /i/, l\\l produced by 20 speakers (10 male/10 female) from the Normal folder in the MATLAB Speech Toolbox Database. For the statistical approach, an identification rate of 100% was obtained for up to 30 speakers. This rate is also obtained for a higher number of speakers (up to 36) when Linear Discriminant Analysis is used. For all 108 speakers, a rate of 88% was obtained both with and without Linear Discriminant Analysis. For the wavelet approach, an identification rate of 100% was obtained with a small number of speakers (below 5) under the best conditions.
For 20 speakers the rate is about 80%, and it increases to 85% using a proposed adaptive feature subset selection technique, named Band Selection Wavelet Packet Transform (BSWPT).
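
To illustrate the linear-prediction features named in the abstract, the following is a minimal sketch, not the thesis implementation. It assumes the common autocorrelation method with the Levinson-Durbin recursion, the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and one of several log-area-ratio conventions in use. All function names (autocorrelation, levinson_durbin, log_area_ratios, frame_features) are hypothetical and chosen for illustration only.

```python
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the autoregressive normal equations by Levinson-Durbin.

    Returns (a, ks, err): the predictor polynomial a[0..order] with
    a[0] == 1, the reflection (partial correlation) coefficients, and
    the final prediction error energy.
    Assumed convention: e[n] = x[n] + sum_j a[j] * x[n - j].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, ks, err


def log_area_ratios(ks):
    """Log-area ratios from reflection coefficients (requires |k| < 1).

    Assumed convention: LAR = log((1 - k) / (1 + k)); other texts use
    the reciprocal, which only flips the sign.
    """
    return [math.log((1.0 - k) / (1.0 + k)) for k in ks]


def frame_features(frame, order):
    """Collect LPC, partial correlation, and LAR features for one frame."""
    r = autocorrelation(frame, order)
    a, ks, err = levinson_durbin(r, order)
    return {"lpc": a[1:], "parcor": ks, "lar": log_area_ratios(ks), "error": err}
```

In a setup like the one described, such per-frame feature vectors would be computed over many frames per speaker and then passed to a dimensionality reduction step and a classifier; those stages are not shown here.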