2pSC4. Active audition for humanoid robots that can listen to three simultaneous talkers.

Session: Tuesday Afternoon, Apr 29

Author: Hiroshi G. Okuno
Location: Dept. of Intelligence Sci. and Technol., Grad. School of Informatics, Kyoto Univ., Sakyo, Kyoto 606-8501, Japan, okuno@i.kyoto-u.ac.jp
Author: Kazuhiro Nakadai
Location: Kitano Symbiotic Systems Project, JST, M-31 6-31-15 Jingumae, Shibuya, Tokyo 150-0001, Japan


The direction-pass filter (DPF) separates sounds originating from a particular direction using a pair of microphones, one embedded in each ear of a humanoid robot. The DPF first extracts harmonic structures from each channel, finds corresponding pairs in the left and right channels, and then calculates their interaural phase difference (IPD) and interaural intensity difference (IID). The IPD and IID values are matched against reference data, obtained either from head-related transfer functions (HRTFs) or from the geometrical relation between the microphones, to determine the sound source direction. The direction obtained by face detection may also be used as a candidate direction. Finally, all subbands arriving from the selected direction are collected, and a waveform is synthesized by inverse FFT. The pass range used for collecting subbands depends on the direction: it is narrow (10 deg) at the center and wide (30 deg) at the periphery. This property is called the "auditory fovea," and the DPF actively exploits it to improve the performance of sound source separation. In addition, the humanoid actively turns its head toward the speaker to listen better. The real-time DPF is implemented by distributed processing on five PCs. Preliminary experiments on active audition for speech recognition of three simultaneous utterances of digits in a normal room are also reported. [Work supported by JSPS.]
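
The subband selection described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a free-field, far-field two-microphone model for the geometrical IPD reference, a hypothetical ear spacing of 0.18 m, a pass range that is a total width interpolated linearly between 10 deg at the center and 30 deg at 90 deg, and frequencies low enough that the IPD is unambiguous. All function names and constants are illustrative. Harmonic extraction, IID matching, the FFT analysis, and the inverse-FFT resynthesis are omitted.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
MIC_DISTANCE = 0.18     # m, hypothetical ear spacing (not from the abstract)


def expected_ipd(freq_hz, theta_deg):
    """IPD in radians predicted by a free-field, far-field model (assumption)."""
    delay = MIC_DISTANCE * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return 2.0 * math.pi * freq_hz * delay


def wrap_phase(phi):
    """Wrap a phase difference into the interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def estimate_direction(freq_hz, measured_ipd, candidates_deg):
    """Return the candidate direction whose predicted IPD best matches."""
    return min(
        candidates_deg,
        key=lambda th: abs(wrap_phase(measured_ipd - expected_ipd(freq_hz, th))),
    )


def pass_range_deg(theta_deg):
    """Total pass range: 10 deg at the center, 30 deg at +/-90 deg.

    The linear interpolation in between is an assumption of this sketch.
    """
    return 10.0 + 20.0 * min(abs(theta_deg), 90.0) / 90.0


def select_subbands(subbands, target_deg, candidates_deg):
    """Return indices of (freq_hz, measured_ipd) subbands inside the pass range."""
    half_width = pass_range_deg(target_deg) / 2.0
    passed = []
    for i, (freq_hz, ipd) in enumerate(subbands):
        direction = estimate_direction(freq_hz, ipd, candidates_deg)
        if abs(direction - target_deg) <= half_width:
            passed.append(i)
    return passed
```

With subbands taken from sources at 0, 60, and -60 deg, calling `select_subbands(subbands, 0, candidates)` keeps only the subbands near the front, while a target at 60 deg uses the wider peripheral pass range. The collected subbands would then be passed to an inverse FFT to resynthesize the separated waveform.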