Time-domain beamforming and convolutive blind source separation: Application to hands-free speech input in the car environment

Author: Julien Bourgeois

Month/Year: December 6, 2007

Supervisor: Prof. Minker

Description:

The thesis addresses the problem of speech enhancement in the context of seamless speech input with concurrent speakers. Applications include hands-free phone calls as well as more advanced functions such as automatic dialog systems for in-vehicle navigation assistance. Before speech can be recognized as a sequence of words, a necessary preprocessing step is to remove perturbations from the speech signal. This thesis focuses on separating the desired signal from interfering speech.
Classical beamforming methods rely on prior knowledge of the position of the target speaker. They require a control mechanism to prevent cancellation of the target signal. The filter adaptation is generally interrupted during periods of double-talk, which limits performance when speakers are active simultaneously. Moreover, the decision thresholds may be difficult to tune. In this thesis, we address the separation problem using uninterrupted adaptive algorithms, which adapt continuously without requiring any decision threshold.
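As an illustration of how a classical beamformer uses the known target position, the following is a minimal sketch of a delay-and-sum beamformer. It is a textbook example and is not taken from the thesis. The function name `delay_and_sum`, the integer sample delays, and the offline (non-causal) alignment are all assumptions made for brevity; in practice the delays would be derived from the array geometry and the speaker position.

```python
def delay_and_sum(channels, delays):
    """Average the microphone channels after compensating steering delays.

    channels: list of equal-length sample lists, one per microphone.
    delays:   assumed integer arrival delay (in samples) of the target
              at each microphone, known from the target position.
    """
    n = len(channels[0])
    output = []
    for t in range(n):
        acc = 0.0
        for samples, d in zip(channels, delays):
            idx = t + d  # advance the channel by its delay (offline alignment)
            if 0 <= idx < n:
                acc += samples[idx]
        output.append(acc / len(channels))
    return output


def simulate_array(source, delays):
    """Hypothetical array: each microphone sees the source delayed by d samples."""
    n = len(source)
    return [[source[t - d] if t - d >= 0 else 0.0 for t in range(n)]
            for d in delays]
```

With the correct delays the target adds coherently across channels, while signals arriving from other directions are misaligned and partially averaged out.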
In the first part of the thesis, we propose a modification of the widely used NLMS algorithm, termed Implicit LMS (ILMS), which implicitly includes an adaptation control and does not require any threshold. Experimental evaluations reveal that ILMS substantially mitigates target signal cancellation with a distributed microphone array. However, in the more difficult case of a compact microphone array, it does not sufficiently reduce target signal cancellation. In this case, more sophisticated blind source separation (BSS) techniques seem necessary.
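For orientation, the following is a minimal sketch of the standard NLMS update used as an adaptive interference canceller. It is the conventional textbook algorithm, not the ILMS modification proposed in the thesis, whose update rule is not given here. The names `nlms_cancel` and `demo`, the filter length, the step size, and the three-tap acoustic path are hypothetical values chosen for illustration.

```python
import random


def nlms_cancel(primary, reference, num_taps=8, step=0.5, eps=1e-8):
    """Standard NLMS interference canceller (not the thesis's ILMS).

    primary:   microphone signal containing the interference.
    reference: signal correlated with the interference.
    Returns the error signal (primary minus estimated interference)
    and the final filter weights.
    """
    weights = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        estimate = sum(w * b for w, b in zip(weights, buf))
        error = d - estimate
        norm = sum(b * b for b in buf) + eps  # input power normalization
        weights = [w + step * error * b / norm for w, b in zip(weights, buf)]
        output.append(error)
    return output, weights


def demo(seed=0, n=4000):
    """Interference-only scenario with an assumed 3-tap acoustic path."""
    rng = random.Random(seed)
    interferer = [rng.gauss(0.0, 1.0) for _ in range(n)]
    path = [0.6, -0.3, 0.1]  # hypothetical path to the primary microphone
    primary = [sum(h * interferer[t - k] for k, h in enumerate(path) if t - k >= 0)
               for t in range(n)]
    return nlms_cancel(primary, interferer)
```

In this demo only the interferer is active, so the filter is free to converge to the path. When the target speaker is also present in both signals, the same unconstrained update starts to cancel the target, which is why conventional systems halt adaptation during double-talk.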
The second part is dedicated to BSS algorithms. Second-order statistics BSS algorithms are presented following the time-domain approach by Buchner et al. [22]. These algorithms are then extended along two axes. Firstly, the concept of ‘partial blind source separation’ (PBSS) is introduced to apply natural gradient BSS algorithms with more microphones than sources. At a moderate computational cost, PBSS flexibly exploits all microphone signals and provides multiple interferer references. Secondly, we propose self-closed update rules for the separation of ‘causal’ and ‘acausal’ systems. These update rules emerge as the most robust ones in an experimental comparison. Emphasis is also placed on the theoretical study of BSS, highlighting the role of the causality of the mixing system. The global convergence is discussed in a simplified case, and an analysis of the local stability gives an upper bound on the amount of cross-talk.
In the last part, we combine the beamforming and BSS approaches. The car interior serves as a privileged test environment. While geometrical prior information may increase the start-up performance, we find that the performance gain after the initial convergence is limited. Using an adaptive interference canceller as a post-processing stage leads to higher interference suppression, but also to higher cancellation of the desired signal. Fortunately, the cancellation of the desired signal may be kept moderate by adequately combining BSS with the ILMS algorithm and geometrical prior information. The cancellation of the desired signal is then uncritical, and the resulting algorithms may be used as an efficient front-end for automatic speech recognition.