Tag Archives: Speech perception

LingLang Lunch (4/2/2014): Sheila Blumstein (Brown University)

Variability and Invariance in Speech and Lexical Processing: Evidence from Aphasia and Functional Neuroimaging

The processes underlying both speaking and understanding appear to be effortless and seamless. And yet speech input is highly variable, the lexical form of a word shares its sound shape with many other words in the lexicon, and a given word often has multiple meanings. The goal of this research is to examine how the neural system is, on the one hand, sensitive to the variability in speech and lexical processing and, on the other, able to resolve that variability. To this end, we will review recent research investigating how the perceptual system resolves variability in selecting the appropriate word from its competitors and in determining what category a sound belongs to, e.g. [d] or [t], and how different acoustic features of sounds, e.g. [d-t] vs. [s-z], map onto a common abstract feature, e.g. voicing. We will then examine how higher-level information sources, such as semantic and conceptual information, are used in perceiving degraded speech. The implications of these findings will be considered for models of the functional and neural architecture of language.

LingLang Lunch (10/1/2014): Sara Guediche (Brown University)

Flexible and adaptive processes in speech perception

The perception of speech depends on mapping a highly variable and complex acoustic signal onto meaningful sounds and words. Yet listeners perform this task with seemingly little effort. Accurate perception relies on integrating the acoustic speech signal with other sources of information derived from the context; identical sounds (e.g., ambiguous phonetic categories) can be heard differently depending on the context (e.g., lexical information). Perception is not only flexible enough to accommodate distortions in the speech signal but can also adapt, with exposure, to systematic distortions and deviations in the acoustic signal; for example, a speaker with a strong foreign accent who is initially unintelligible can become better understood over time. How does perception maintain such flexible and adaptive processing without affecting stable long-term speech representations? I will present a few studies in which we examined the influence of different sources of information on perception and adaptive plasticity in order to gain insight into this question.

LingLang Lunch (3/9/2016): Emily Myers (University of Connecticut)

Non-Native Speech Sound Learning: Studies of Sleep, Brain, and Behavior

Speech perception is subject to critical/sensitive period effects, such that acquisition of non-native (L2) speech sounds is far more difficult in adulthood than in childhood. Although adults can be trained to perceive differences among speech sounds that are not part of their native language, success is (1) variable across individuals, (2) variable across the specific sounds to be learned, and (3) not guaranteed to generalize to untrained instances. Any theory of L2 speech perception must explain these three phenomena. Accounts of the L2 speech learning process have drawn from traditions in linguistics, psychology, and neuroscience, yet a full description of the barriers to perceptual learning of L2 sounds remains elusive. New evidence from our lab suggests that training on non-native speech produces plastic effects in the brain regions involved in native-language perception, and that consolidation during sleep plays a large role in the degree to which training is maintained and generalizes to new talkers. Further, similar mechanisms may be at play when listeners learn to perceive non-standard tokens in the context of accented speech. Taken together, these findings suggest that speech perception is more plastic than critical period accounts would predict, and that individual variability in brain structure and sleep behavior may predict some of the variability in ultimate L2 sound acquisition success.

LingLang Lunch (10/19/2016): Matt Masapollo (Brown University)

On the nature of the natural referent vowel bias

Considerable research on cross-language speech perception has shown that perceivers (both adult and infant) are universally biased toward the extremes of articulatory/acoustic vowel space (peripheral in F1/F2 vowel space; Polka & Bohn, 2003, 2011). Much of the evidence for this bias comes from studies showing that perceivers consistently discriminate vowels in an asymmetric manner. More precisely, perceivers perform better at detecting a change from a relatively less peripheral vowel (e.g., /e/) to a relatively more peripheral vowel (e.g., /i/) than at detecting the same change presented in the reverse direction. Although the existence of this perceptual phenomenon (i.e., the natural referent vowel [NRV] bias) is well established, the processes that underlie it remain poorly understood. One account of the NRV bias, which derives from the Dispersion–Focalization Theory (Schwartz et al., 2005), is that extreme vocalic articulations give rise to acoustic vowel signals that exhibit increased spectral salience due to formant frequency convergence, or “focalization.” In this talk, I will present a series of experiments aimed at assessing whether adult perceivers are indeed sensitive to differences in formant proximity while discriminating vowel stimuli that fall within a given category, and, if so, whether that sensitivity is attributable to general properties of auditory processing or to phonetic processes that extract articulatory information available across sensory modalities. In Experiment 1, English- and French-speaking perceivers showed directional asymmetries consistent with the focalization account as they attempted to discriminate synthetic /u/ variants that systematically differed in their peripherality, and hence in their degree of formant proximity (between F1 and F2). In Experiment 2, similar directional effects were found when English- and French-speaking perceivers attempted to discriminate natural /u/ productions that differed in their articulatory peripherality when only acoustic-phonetic or only visual-phonetic information was present. Experiment 3 investigated whether and how the integration of acoustic and visual speech cues influences the effects documented in Experiment 2. When acoustic and visual cues were phonetically congruent, an NRV bias was observed. In contrast, when acoustic and visual cues were phonetically incongruent, this bias was disrupted, confirming that both sensory channels shape this bias in bimodal auditory-visual vowel perception. Collectively, these findings suggest that perceivers are universally biased to attend to extreme vocalic gestures specified optically, in terms of articulatory kinematic patterns, as well as acoustically, in terms of formant convergence patterns. A complete understanding of this bias is not only important for speech perception theories, but also provides a critical basis for studying phonetic development and the perceptual factors that may constrain vowel inventories across languages.
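
To make the focalization idea concrete, the short Python sketch below computes a toy focalization score for two hypothetical /u/ variants: formant frequencies are converted to the Bark scale and scored by the inverse squared F1–F2 distance, so a more peripheral /u/, whose first two formants sit closer together, scores higher. The formant values, and the use of only the F1–F2 term, are simplifying assumptions for illustration; they are not the experiment's stimuli or the full Dispersion–Focalization cost function of Schwartz et al. (2005).

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmüller (1990) Hz-to-Bark conversion."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def focalization_score(f1_hz: float, f2_hz: float) -> float:
    """Toy focalization term: the closer F1 and F2 sit on the Bark scale,
    the larger the score (only the F1-F2 term of the full model is used)."""
    d = hz_to_bark(f2_hz) - hz_to_bark(f1_hz)
    return 1.0 / (d * d)

# Hypothetical formant values for two /u/ variants (illustrative only).
variants = {
    "less peripheral /u/": (350.0, 950.0),   # (F1, F2) in Hz
    "more peripheral /u/": (280.0, 700.0),
}

for label, (f1, f2) in variants.items():
    d_bark = hz_to_bark(f2) - hz_to_bark(f1)
    print(f"{label}: F2-F1 = {d_bark:.2f} Bark, "
          f"focalization ~ {focalization_score(f1, f2):.3f}")
```

The point of the sketch is only that, for /u/, a more extreme articulation pulls F2 down toward F1, which is what the focalization account takes to increase the vowel's spectral salience.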

LingLang Lunch (Lite) (3/7/2018): Kasia Hitczenko (University of Maryland)

Kasia Hitczenko is a graduate student from the University of Maryland. Her research focuses on infants’ acquisition of categories in language.

How to use context to disambiguate overlapping categories: The test case of Japanese vowel length

Infants learn the sound categories of their language and adults successfully process the sounds they hear, even though sound categories often overlap in their acoustics. Most researchers agree that listeners use context to disambiguate overlapping categories. However, they differ in their ideas about how context is used. One idea is that listeners normalize out the systematic effects of context from the acoustics of a sound. Another idea is that contextual information is itself an informative cue to category membership, providing top-down disambiguating information. Both ideas have been studied extensively, but mostly with synthesized or carefully controlled lab speech. In this talk, we contrast the efficacy of these two strategies on spontaneous speech by applying them to the test case of Japanese vowel length. We find that normalizing out contextual variability from the acoustics does not improve categorization, but that using context in a top-down fashion improves it substantially. This calls into question the role of normalization in phonetic acquisition and processing and suggests that approaches that make use of top-down contextual information are the more promising ones to pursue.
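
As a concrete illustration of the two strategies, here is a small Python simulation (a toy setup of my own, not the talk's model or data). Vowel durations for short and long categories overlap heavily, and a contextual variable, here a stand-in for speech rate, both stretches durations and correlates with category. The "normalization" classifier regresses the contextual effect out of duration and classifies on the residual; the "top-down" classifier leaves the acoustics untouched and instead treats context as an additional cue alongside duration. All numbers (category means, slope, standard deviation) are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000

# Toy generative assumptions (illustrative, not measurements from the talk):
# a contextual variable both stretches durations and predicts category.
rate = rng.uniform(-1.0, 1.0, N)                  # stand-in for local speech rate
p_long = 1.0 / (1.0 + np.exp(-1.5 * rate))        # context also favors one category
is_long = rng.random(N) < p_long
dur = rng.normal(np.where(is_long, 130.0, 70.0) + 40.0 * rate, 35.0)  # duration in ms

def gauss_ll(x, mu, sd):
    # Gaussian log-likelihood up to a constant that cancels in comparisons.
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd)

def fit(x, mask):
    return x[mask].mean(), x[mask].std()

# Strategy 1: normalization -- regress the contextual effect out of duration,
# then classify on the residual alone (context is discarded afterwards).
slope, intercept = np.polyfit(rate, dur, 1)
resid = dur - (slope * rate + intercept)
(mu_l, sd_l), (mu_s, sd_s) = fit(resid, is_long), fit(resid, ~is_long)
pred_norm = gauss_ll(resid, mu_l, sd_l) > gauss_ll(resid, mu_s, sd_s)

# Strategy 2: top-down context -- keep the raw acoustics and treat context as an
# extra cue: a naive-Bayes score over (duration, rate) plus a class prior.
def class_score(mask):
    mu_d, sd_d = fit(dur, mask)
    mu_r, sd_r = fit(rate, mask)
    return gauss_ll(dur, mu_d, sd_d) + gauss_ll(rate, mu_r, sd_r) + np.log(mask.mean())

pred_top = class_score(is_long) > class_score(~is_long)

print("normalization accuracy:", (pred_norm == is_long).mean())
print("top-down accuracy:     ", (pred_top == is_long).mean())
```

Treating context as a cue rather than as something to subtract out is what distinguishes the top-down approach; which strategy actually helps on real spontaneous speech is the empirical question the talk addresses, and the simulation makes no claim about that.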