2aSC17. Effects of synthesis fidelity on vowel identification: Role of spectral change and voicing source.

Session: Tuesday Morning, Dec 04


Author: Peter F. Assmann
Location: School of Human Development and Callier Ctr. for Community Disord., The Univ. of Texas at Dallas, Box 830688, Richardson, TX 75083
Author: William F. Katz
Location: School of Human Development and Callier Ctr. for Community Disord., The Univ. of Texas at Dallas, Box 830688, Richardson, TX 75083

Abstract:

Recent studies have shown that synthesized versions of American English vowels are more accurately identified when natural time-varying changes in the formant frequencies are preserved rather than flattened. A limitation of these experiments is that vowels generated with cascade formant synthesis are generally less accurately identified than natural vowels. To overcome this limitation, a high-quality analysis-synthesis system was used to reexamine the effects of spectral change. Using this new technique, synthesized versions of 12 American English vowels spoken by adults and children were identified as accurately as natural vowels. Two experiments confirmed the beneficial effects of preserving the time-varying changes in the formants, both in vowels synthesized with pulsed excitation (as in voiced speech) and in vowels synthesized with noise excitation (as in whispered speech), and verified that identification accuracy does not decline when the fundamental frequency is held constant. However, in contrast to earlier findings, (i) whispered vowels were identified as accurately as the voiced versions, and (ii) the benefits of time-varying spectral change were greater than previously found with cascade formant synthesis. The findings are consistent with recent studies showing that the measured effects of spectral and temporal manipulations in vowels can vary as a function of synthesis fidelity.