Compression of residual speech signals using wavelet transform

Number: 1247
Language: English
Degree:
Author: Sarmad Saad Ali Al-Baghdady
Supervisor: Dr. Manal J. Al-Kindi
Year: 2004

Abstract: Speech communication has been a major research area in signal processing since the first third of the last century. As digital telephony and digital communications have advanced in efficiency and services, speech coding techniques have followed, either to match the new features and conditions or to make use of them. The growing number of services and users calls for new coding algorithms that produce lower bit rates and so reduce the transmission bandwidth. At these lower rates speech quality is sacrificed and degradation appears, high-quality speech communication is no longer achievable at reasonable rates, and coding may have to rely on the speech information that is most strongly tied to perception in the human brain. The dominant coders in this field exploit the human speech production mechanism and are known as linear predictive (LP) coders. This work presents a new speech compression philosophy that models human speech production, hearing and perception in order to code speech signals optimally on both acoustical and perceptual bases. These modeling concepts developed independently over the history of speech processing and of biological mathematics (biomathematics). This work found that the semi-flat-spectrum residual signal of linear prediction can be compressed according to the time-varying model of the speech signal represented by the vocal tract resonance frequencies (formants). This operation was studied further, and simulations showed that the LP residual can be compressed adaptively according to the masking phenomenon observed from psychoacoustic and critical-band analysis of the speech signals.
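The abstract does not state how the LP residual is obtained. As a minimal sketch, assuming the standard autocorrelation method solved with the Levinson-Durbin recursion (a common choice in LP coders, not confirmed as the thesis's own procedure), the residual of one frame can be computed as below. The function names, the predictor order and the test tone are illustrative assumptions.

```python
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[k] = sum_n x[n] * x[n-k] of one frame, k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion for the autocorrelation normal equations.

    Returns (a, err): a[k-1] weights x[n-k] in the prediction
    x_hat[n] = sum_k a[k-1] * x[n-k], and err is the minimum error energy.
    """
    a = [0.0] * (order + 1)  # a[0] unused so indices match the usual notation
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # i-th reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lp_residual(x, a):
    """Prediction error e[n] = x[n] - sum_k a[k-1] * x[n-k]; samples before the frame count as 0."""
    p = len(a)
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, min(p, n) + 1))
            for n in range(len(x))]
```

For example, a 400 Hz tone sampled at 8 kHz is predicted almost exactly by an order-2 predictor, so `lp_residual(frame, levinson_durbin(autocorrelation(frame, 2), 2)[0])` carries only a tiny fraction of the frame's energy. For real speech at 8 kHz an order of about 10 is typical; the order used in the thesis is not given in the abstract.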
Before this compression, the prediction residual signal is transformed using a perfect-reconstruction QMF filter bank that implements a powerful compression transform, the wavelet transform, and it was shown that this transform can be used to compress the prediction residual. New adaptive quantization approaches are proposed for both dynamic-range adaptation and step-size adaptation, through a proposed adaptive statistical bit allocation that operates the adaptive quantizer optimally on the non-compressed transform coefficients. The use of wavelets as a spectral analysis tool was also investigated and proved successful in that role. A wide range of speakers and speech material was used to evaluate the performance of the new approach at sampling frequencies of 8, 10 and 16 kHz; the input signals were quantized with a 16-bit uniform quantizer to ensure the best possible quality of the input speech.
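The abstract names neither the wavelet nor the bit-allocation formula. As a minimal sketch, assuming the Haar filter pair (the simplest orthonormal perfect-reconstruction QMF bank) and the classical log-variance bit-allocation rule in place of the thesis's adaptive statistical allocation, a residual can be decomposed dyadically and each subband quantized with a uniform quantizer whose range follows the subband peak (dynamic-range adaptation) and whose step follows the allocated bits (step-size adaptation). All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis(x):
    """One stage of the orthonormal Haar QMF bank: filter and downsample by 2."""
    if len(x) % 2:
        raise ValueError("length must be even")
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high


def haar_synthesis(low, high):
    """Inverse stage: upsample and filter; reproduces the analysis input exactly."""
    out = []
    for s, d in zip(low, high):
        out.append((s + d) / SQRT2)
        out.append((s - d) / SQRT2)
    return out


def wavelet_decompose(x, levels):
    """Dyadic decomposition (len(x) must be divisible by 2**levels).

    Returns [approx, detail_L, ..., detail_1], coarsest band first.
    """
    details, approx = [], list(x)
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)
    return [approx] + details[::-1]


def wavelet_reconstruct(bands):
    """Invert wavelet_decompose, coarsest band first."""
    approx = bands[0]
    for d in bands[1:]:
        approx = haar_synthesis(approx, d)
    return approx


def allocate_bits(bands, avg_bits):
    """Assumed classical rule: b_k = avg_bits + 0.5*log2(var_k / geometric-mean var), clipped at 0."""
    eps = 1e-12  # avoids log2(0) for silent bands
    variances = [sum(c * c for c in b) / len(b) + eps for b in bands]
    total = sum(len(b) for b in bands)
    # Geometric mean of the variances, weighted by the coefficient count of each band.
    log_gm = sum(len(b) * math.log2(v) for b, v in zip(bands, variances)) / total
    return [max(0, round(avg_bits + 0.5 * (math.log2(v) - log_gm))) for v in variances]


def quantize_band(band, bits):
    """Uniform mid-rise quantizer over [-peak, peak] with 2**bits levels.

    The peak would be sent as side information; bits == 0 drops the band.
    """
    peak = max(abs(c) for c in band)
    if bits == 0 or peak == 0.0:
        return [0.0] * len(band)
    levels = 2 ** bits
    step = 2.0 * peak / levels
    out = []
    for c in band:
        idx = min(levels - 1, int(math.floor((c + peak) / step)))
        out.append(-peak + (idx + 0.5) * step)  # reconstruct at the cell midpoint
    return out


def code_residual(residual, levels=3, avg_bits=4):
    """Decompose, allocate bits, quantize each band, and reconstruct."""
    bands = wavelet_decompose(residual, levels)
    bits = allocate_bits(bands, avg_bits)
    coded = [quantize_band(b, nb) for b, nb in zip(bands, bits)]
    return wavelet_reconstruct(coded), bits
```

Without quantization, `wavelet_reconstruct(wavelet_decompose(x, 3))` returns `x` up to rounding error, which is the perfect-reconstruction property; all coding loss comes from `quantize_band`. The sketch omits the formant- and masking-driven adaptation described in the first paragraph, since the abstract gives no formula for it.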