LingLang Lunch (10/19/2016): Matt Masapollo (Brown University)

On the nature of the natural referent vowel bias

Considerable research on cross-language speech perception has shown that perceivers (both adult and infant) are universally biased toward the extremes of articulatory/acoustic vowel space (peripheral in F1/F2 vowel space; Polka & Bohn, 2003, 2011). Much of the evidence for this bias comes from studies showing that perceivers consistently discriminate vowels in an asymmetric manner. More precisely, perceivers perform better at detecting a change from a relatively less peripheral vowel (e.g., /e/) to a relatively more peripheral vowel (e.g., /i/) than at detecting the same change presented in the reverse direction. Although the existence of this perceptual phenomenon (i.e., the natural referent vowel [NRV] bias) is well established, the processes that underlie it remain poorly understood. One account of the NRV bias, which derives from the Dispersion–Focalization Theory (Schwartz et al., 2005), is that extreme vocalic articulations give rise to acoustic vowel signals that exhibit increased spectral salience due to formant frequency convergence, or “focalization.” In this talk, I will present a series of experiments aimed at assessing whether adult perceivers are indeed sensitive to differences in formant proximity while discriminating vowel stimuli that fall within a given category, and, if so, whether that sensitivity is attributable to general properties of auditory processing or to phonetic processes that extract articulatory information available across sensory modalities. In Experiment 1, English- and French-speaking perceivers showed directional asymmetries consistent with the focalization account as they attempted to discriminate synthetic /u/ variants that systematically differed in their peripherality and hence in their degree of formant proximity (between F1 and F2). In Experiment 2, similar directional effects were found when English- and French-speaking perceivers attempted to discriminate natural /u/ productions that differed in their articulatory peripherality, when only acoustic-phonetic or only visual-phonetic information was present. Experiment 3 investigated whether and how the integration of acoustic and visual speech cues influences the effects documented in Experiment 2. When acoustic and visual cues were phonetically congruent, an NRV bias was observed. In contrast, when acoustic and visual cues were phonetically incongruent, the bias was disrupted, confirming that both sensory channels shape this bias in bimodal auditory-visual vowel perception. Collectively, these findings suggest that perceivers are universally biased to attend to extreme vocalic gestures specified optically, in terms of articulatory kinematic patterns, as well as acoustically, in terms of formant convergence patterns. A complete understanding of this bias is not only important for theories of speech perception, but also provides a critical basis for the study of phonetic development and of the perceptual factors that may constrain vowel inventories across languages.