Multi-Microphone Speech Enhancement
Sharon Gannot
Bar-Ilan University, Israel
1:30pm Wednesday, 14 October 2015, ITE 325B, UMBC
Microphone array algorithms emerged in the early 1990s as viable solutions to speech processing problems. However, the adaptation of beamforming methods to speech processing is still an open issue. Many difficulties arise from the characteristics of the speech signal and the acoustic environment. The speech signal is wide-band and non-stationary. Very long room impulse responses (RIRs), several thousand taps long, result from multiple reflections of the sound off objects in the enclosure. Moreover, due to the inevitable movements of both sources (speakers) and receivers (microphones), the room impulse responses become time-varying.
In this talk, we will focus on spatial processors, a.k.a. beamformers, based on the linearly constrained minimum variance (LCMV) criterion and its special case, the minimum variance distortionless response (MVDR) beamformer. We show that classical beamformers that take into account only angular information (as reflected by the so-called beam pattern) are too simplistic to fully address the intricate propagation regime of the sound source in a reverberant environment. We will therefore reformulate the LCMV beamformer in the short-time Fourier transform (STFT) domain and replace the free-field steering vector with the entire acoustic transfer function (ATF). The corresponding relative transfer function (RTF) will then be introduced, and its applicability to the design of beamformers in reverberant environments will be discussed. We will then elaborate on several blind RTF estimation techniques, e.g., based on subspace analysis, that enable the implementation of all the necessary beamformer blocks. Several applications of the powerful LCMV beamformer, e.g., speech enhancement, extraction of desired speakers in environments with multiple competing speakers, and binaural processing, will then be presented.
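As a rough illustration of the criterion named above (not the speaker's implementation), the textbook closed-form LCMV solution for a single STFT frequency bin can be sketched in NumPy. The function names, the random noise covariance, and the random RTF vectors are all hypothetical stand-ins chosen for this example; in practice the covariance and RTFs would be estimated from the microphone signals.

```python
import numpy as np

def lcmv_weights(noise_cov, constraints, responses):
    """Textbook LCMV weights for one frequency bin.

    Minimizes w^H Phi w subject to C^H w = g, where Phi is the (M, M)
    noise covariance, C is the (M, K) constraint matrix whose columns are
    RTF vectors, and g is the (K,) vector of desired responses.
    """
    phi_inv_c = np.linalg.solve(noise_cov, constraints)   # Phi^{-1} C
    gram = constraints.conj().T @ phi_inv_c                # C^H Phi^{-1} C
    return phi_inv_c @ np.linalg.solve(gram, responses)    # Phi^{-1} C (C^H Phi^{-1} C)^{-1} g

def mvdr_weights(noise_cov, rtf):
    """MVDR as the single-constraint special case: unit response toward rtf."""
    return lcmv_weights(noise_cov, rtf[:, None], np.array([1.0]))

# Hypothetical data for M = 4 microphones (assumptions, not measured values).
rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
noise_cov = A @ A.conj().T + M * np.eye(M)   # Hermitian positive definite

def random_rtf():
    h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    return h / h[0]                          # relative to microphone 0

rtf_desired = random_rtf()
rtf_interferer = random_rtf()

# MVDR: pass the desired source undistorted, minimize noise power.
w_mvdr = mvdr_weights(noise_cov, rtf_desired)

# LCMV: pass the desired source, place a null on the competing speaker.
C = np.stack([rtf_desired, rtf_interferer], axis=1)
w_lcmv = lcmv_weights(noise_cov, C, np.array([1.0, 0.0]))
```

The beamformer output for a bin would then be `np.vdot(w, x)`, i.e. w^H x, for a microphone snapshot `x`. Because the RTF is normalized to microphone 0, the distortionless constraint reproduces the desired speech as received at that reference microphone.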
We will conclude the talk with an overview of the emerging field of distributed algorithms for ad hoc microphone arrays, and discuss the advantages and challenges they raise. The presentation will be accompanied by audio clips demonstrating the capabilities of the introduced schemes.
Sharon Gannot received his B.Sc. degree (summa cum laude) from the Technion-Israel Institute of Technology, Haifa, Israel, in 1986, and the M.Sc. (cum laude) and Ph.D. degrees from Tel-Aviv University, Israel, in 1995 and 2000, respectively, all in Electrical Engineering. In 2001 he held a post-doctoral position at the Department of Electrical Engineering (ESAT-SISTA) at K.U.Leuven, Belgium. In 2002-2003 he held a research and teaching position at the Faculty of Electrical Engineering, Technion-Israel Institute of Technology, Haifa, Israel. Currently, he is a Full Professor at the Faculty of Engineering, Bar-Ilan University, Israel, where he heads the Speech and Signal Processing laboratory and the Signal Processing Track. Prof. Gannot is the recipient of the Bar-Ilan University Outstanding Lecturer Award for 2010 and 2014. Prof. Gannot served as an Associate Editor of the EURASIP Journal on Advances in Signal Processing in 2003-2012, and as an Editor of several special issues on Multi-microphone Speech Processing of the same journal. He has also served as a Guest Editor of the Elsevier Speech Communication and Signal Processing journals. Prof. Gannot served as an Associate Editor of IEEE Transactions on Audio, Speech, and Language Processing in 2009-2013. Currently, he is a Senior Area Chair of the same journal. He also serves as a reviewer for many IEEE journals and conferences. Prof. Gannot has been a member of the Audio and Acoustic Signal Processing (AASP) technical committee of the IEEE since January 2010. He has also been a member of the Technical and Steering committee of the International Workshop on Acoustic Signal Enhancement (IWAENC) since 2005. He was the general co-chair of IWAENC held in Tel-Aviv, Israel, in August 2010. Prof. Gannot served as the general co-chair of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, in October 2013. Prof. Gannot was selected (with colleagues) to present tutorial sessions at ICASSP 2012, EUSIPCO 2012, ICASSP 2013 and EUSIPCO 2013. His research interests include multi-microphone speech processing, specifically distributed algorithms for ad hoc microphone arrays for noise reduction and speaker separation; machine learning methods in speech processing; dereverberation; single-microphone speech enhancement; and speaker localization and tracking.