Tuesday, September 17, 2013

Gaskell and Marslen-Wilson: "Lexical Ambiguity Resolution and Spoken Word Recognition" (2001)

If you pronounce the phrase
  • worn building
and you do this relatively quickly, there is a good chance that it comes out as something close to
  • ['wɔɹm'bɪldɪŋ]
instead of
  • ['wɔɹn'bɪldɪŋ].
This phenomenon is known as consonant assimilation. In this particular case, it happens because the peripheral consonant /b/ at the beginning of building makes it harder to pronounce a coronal consonant such as the /n/ in worn than another peripheral consonant such as the /m/ in warm. Your lips simply have to move less to say things like em-beh than to say things like en-beh.

The lip position used for an [m] is close to that used for a [b]
(picture from an online book by Michael Gasser)

In principle, this means that the sound string ['wɔɹm'bɪldɪŋ] involves an ambiguity for the hearer: Was the intended original message worn building or warm building? Both are plausible given the observed signal because they are both occasionally pronounced in the same way.

Assimilation effects can thus — in certain, relatively rare, cases — add another decoding problem to the already quite substantial ambiguity of words like warm.

Noisy Channel Hearing

This raises a question about the psycholinguistics of hearing: What would happen if we plugged a sound ambiguity like this into an experimental paradigm designed to track the process of word sense selection? Would people perhaps show traces of an active inference from sound to word, and competition between various hypotheses?

This is the question investigated by this paper by Gareth Gaskell and William Marslen-Wilson. Their idea is to play back an ambiguous sound string to their subjects, and then check if they are faster at recognizing a word on a computer screen if that word could have been the source of the sound string before consonant assimilation. For instance:
  • Voice: The ceremony was held in June and the sunny weather added to the air of celebration. An article about the bribe made the [Screen: bride] local paper.
  • Voice: The conditions in the outback were difficult for driving. In the intense heat, the mug cracked up [Screen: mud] completely.
  • Voice: We were impressed by her stylish delivery and intonation. Jane finished off the seam beautifully. [Screen: scene]
These test sentences are then compared to another condition in which the phonetics of the sentences do not warrant any backwards inference to a different sound form:
  • Voice: The ceremony was held in June and the sunny weather added to the air of celebration. An article about the bribe turned up [Screen: bride] in the local paper.
  • Voice: The conditions in the outback were difficult for driving. In the intense heat, the mug turned to [Screen: mud] dust.
  • Voice: We were impressed by her stylish delivery and intonation. Jane finished off the seam deftly. [Screen: scene]
These sentences cannot have come about by assimilation effects, so there is no basis for a reconstructive inference. For instance, bribe turned is not easier to pronounce than bride turned, so there is no reason to hypothesize that bribe is a distorted form of bride.
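
The contrast between the two conditions can be sketched as a toy lookup. This is only an illustration, not the authors' model: it assumes a heavily simplified rule in which a word-final coronal (/n/, /d/, /t/) may surface with the place of articulation of a following labial or velar consonant. The function name candidate_sources and the consonant tables are hypothetical, and sounds are written as single letters rather than full transcriptions.

```python
# Toy sketch of place assimilation (simplified, hypothetical tables).
LABIALS = {"p", "b", "m"}
VELARS = {"k", "g"}

# Heard consonant -> coronal it could have been before assimilation.
LABIAL_FROM_CORONAL = {"m": "n", "b": "d", "p": "t"}
VELAR_FROM_CORONAL = {"ŋ": "n", "g": "d", "k": "t"}


def candidate_sources(final, following):
    """Return the consonants the speaker may have intended at the end of a word,
    given the heard final consonant and the first consonant of the next word."""
    candidates = [final]  # the heard consonant is always a candidate
    if following in LABIALS and final in LABIAL_FROM_CORONAL:
        candidates.append(LABIAL_FROM_CORONAL[final])
    elif following in VELARS and final in VELAR_FROM_CORONAL:
        candidates.append(VELAR_FROM_CORONAL[final])
    return candidates


# "mug cracked": [g] before [k] could be an assimilated /d/ (mud).
print(candidate_sources("g", "k"))
# "mug turned": [g] before [t] gives no reason to reconstruct /d/.
print(candidate_sources("g", "t"))
```

In the first call the hearer has two hypotheses to choose between, in the second only one, which is the difference the two sets of test sentences are designed to isolate.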

In the Face of Overwhelming Evidence

The main result of the whole paper is that there is indeed a significant difference between the cases where the phonological context supports an inference (e.g., mug cracked) and the cases where it doesn't (e.g., mug turned). This is, however, only the case if the discursive context also strongly suggests the same reconstructive inference (Experiment 3).

It's also worth noting that the effects are tiny. On average, subjects took 522 milliseconds to recognize the phonologically warranted form (e.g., mud from mug cracked) and 537 milliseconds to recognize the phonologically unwarranted form (e.g., mud from mug turned). This is a difference of 15 milliseconds, or a drop of 2.8% in decision time. It's statistically significant, but it's not big.
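
For anyone who wants to check the arithmetic, here is the calculation spelled out. The two reaction times are the ones quoted above; the variable names are just illustrative.

```python
# Mean recognition times quoted above, in milliseconds.
warranted_ms = 522    # e.g., mud after "mug cracked"
unwarranted_ms = 537  # e.g., mud after "mug turned"

difference_ms = unwarranted_ms - warranted_ms
# Drop relative to the slower (unwarranted) condition.
relative_drop = difference_ms / unwarranted_ms

print(f"{difference_ms} ms, {relative_drop:.1%}")  # prints "15 ms, 2.8%"
```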

They also found that if you remove the discursive bias from the materials, this effect disappears (Experiments 1 and 2). There is, for instance, no priming effect in the following sentence:
  • Voice: An article about the bribe turned up [Screen: bride] in the local paper.
It is thus only when both the discursive and the phonological context support the inference that it leaves a measurable trace.

It is conceivable that there is an activation effect in the other case as well, but that it is simply so minuscule that we can't see it. But at any rate, this finding makes sense if we think about the inference as a kind of naive Bayes collection of evidence in favor of a hypothesis.
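
To make the naive Bayes reading concrete, here is a minimal sketch. It is my own illustration, not a model from the paper: every probability below is invented, and the names posterior, phon_assimilating, phon_neutral, ctx_biased and ctx_neutral are hypothetical. The point is only that two weak, independent cues multiply, so the hypothesis mud gets a sizeable posterior only when both cues favor it.

```python
def posterior(prior, cues):
    """Naive Bayes: multiply the prior by each cue's likelihood, then normalize.
    The cues are treated as conditionally independent given the intended word."""
    scores = {}
    for word, p in prior.items():
        score = p
        for cue in cues:
            score *= cue[word]
        scores[word] = score
    total = sum(scores.values())
    return {word: s / total for word, s in scores.items()}


# All numbers are made up for illustration.
prior = {"mud": 0.5, "mug": 0.5}

# P(hearing "mug" | intended word) in each phonological context.
phon_assimilating = {"mud": 0.2, "mug": 0.9}   # before "cracked"
phon_neutral = {"mud": 0.01, "mug": 0.9}       # before "turned"

# P(preceding discourse | intended word).
ctx_biased = {"mud": 0.8, "mug": 0.2}    # outback, driving, heat
ctx_neutral = {"mud": 0.5, "mug": 0.5}   # no biasing lead-in

both = posterior(prior, [phon_assimilating, ctx_biased])["mud"]
phon_only = posterior(prior, [phon_assimilating, ctx_neutral])["mud"]
ctx_only = posterior(prior, [phon_neutral, ctx_biased])["mud"]

for label, p in [("both cues", both), ("phonology only", phon_only), ("context only", ctx_only)]:
    print(f"{label}: P(mud) = {p:.3f}")
```

With these invented numbers, the posterior for mud is far higher when both cues point the same way than when only one does, which is at least compatible with an effect that only becomes measurable in Experiment 3.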
