Thursday, February 2, 2012

Cacciari and Tabossi: "The Comprehension of Idioms" (1988)

In this paper, Cristina Cacciari and Patrizia Tabossi argue that our idiomatic reading of a phrase like shoot the breeze is triggered by a "key." The process by which we understand such phrases can consequently differ depending on the position of the key.

Although they don't give any examples, they probably have strings like take the bull..., a silver spoon..., and make a fool... in mind when they talk about "keys."

Experimental Evidence

They support this theory with three priming experiments. These experiments show that:
  • When the key is highly predictive of the idiom, and it occurs early in the phrase, the idiom is quickly and automatically activated, but the literal word meanings are not.
  • When the key is more ambiguous or occurs late (or both), the opposite occurs: The idiom is not activated, but the word meanings are. However, this holds only when the decision task that tests the activation levels is posed immediately after the stimulus.
  • If one inserts a 300 ms delay before the decision task, though, both the idiom and the word meanings become active. This seems to suggest that the initial ambiguity associated with the late/weak key is resolved after the delay.

Recognizing Idioms From "Keys"

Cacciari and Tabossi don't formalize their notion of a "key," but it could plausibly be formalized in statistical terms. To illustrate, take the following idiomatic sentence:
  • It is best to take the bull by the horns
This sentence contains a number of incomplete left-segments:
  1. it ...
  2. it is ...
  3. it is best ...
  4. it is best to ... (etc.)
As we consider longer and longer left-segments, it becomes more and more probable that these incomplete sentences will be completed as the full sentence it is best to take the bull by the horns. This probability can be estimated by comparing the number of occurrences of each segment with the number of occurrences of the whole sentence.
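A minimal Python sketch of this estimate, reading it as the conditional probability P(whole sentence | segment) = count(whole) / count(segment). The hit counts are invented placeholders, not the Google figures behind the graph, and the function names are hypothetical:

```python
# Sketch: how strongly does each left-segment predict the full idiom?
# NOTE: HYPOTHETICAL_COUNTS are invented placeholders, not real Google hit counts.

SENTENCE = "it is best to take the bull by the horns".split()

# One invented hit count per left-segment ("it", "it is", ..., full sentence).
HYPOTHETICAL_COUNTS = [
    1_000_000_000, 500_000_000, 2_000_000, 1_500_000, 100_000,
    40_000, 1_000, 950, 940, 900,
]


def left_segments(words):
    """Return all left-segments: words[:1], words[:2], ..., words[:n]."""
    return [" ".join(words[:i]) for i in range(1, len(words) + 1)]


def completion_probability(counts, segment, full):
    """Estimate P(full sentence | segment) as count(full) / count(segment)."""
    return counts[full] / counts[segment]


counts = dict(zip(left_segments(SENTENCE), HYPOTHETICAL_COUNTS))
full = " ".join(SENTENCE)

# Print the last word of each segment next to its completion probability.
for segment in left_segments(SENTENCE):
    p = completion_probability(counts, segment, full)
    print(f"{segment.split()[-1]:>6}  {p:.6f}")
```

With real data, the counts would come from a corpus or search engine; the probability necessarily reaches 1.0 at the final word, since the full sentence always completes itself.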

By thus comparing the predictive strength of the left-segments, we can get a sense of where the uncertainty tips into certainty:

The numbers behind this graph are based on Google searches. For instance, the bar above the word bull shows the number of hits for the whole sentence divided by the number of hits for it is best to take the bull. The 'keyness' is just the difference between this statistic for the relevant word and its neighbor to the left.

As the picture clearly shows, the big jump occurs with the word bull. So we can expect an average English speaker to switch into "idiom mode" upon reaching that point in the sentence.
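The keyness computation and the search for the biggest jump can be sketched in a few lines of Python. The probabilities below are invented for illustration (they are not read off the graph), and treating the first word as compared against 0 is an assumption, since it has no left neighbor:

```python
# Sketch of "keyness": how much each word raises the probability of the idiom
# relative to the word to its left. PROBS values are invented for illustration.

WORDS = "it is best to take the bull by the horns".split()

# Hypothetical P(full sentence | left-segment ending at each word).
PROBS = [0.000001, 0.000002, 0.0005, 0.0006, 0.009, 0.02, 0.9, 0.95, 0.96, 1.0]


def keyness(probs):
    """Difference between each word's statistic and its left neighbour's.

    Assumption: the first word is compared against 0, since it has no left neighbour.
    """
    scores = []
    previous = 0.0
    for p in probs:
        scores.append(p - previous)
        previous = p
    return scores


def find_key(words, probs):
    """Return the word at which the probability jumps the most."""
    scores = keyness(probs)
    best = max(range(len(words)), key=lambda i: scores[i])
    return words[best]


print(find_key(WORDS, PROBS))  # → bull
```

Taking the single largest difference is only one way to pick the "key"; a threshold on the jump would be a reasonable alternative.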

Conditional Probability Is Not Redundancy

Note that this is different from the redundancy of the possible completions of each left-segment.

That would be another relevant statistic to base the concept of "keyness" on, but it would also require an estimate of the number of completions of each of these left-segments and their probabilities.

That's not a completely crazy thing to try to estimate based on some kind of language model, but it does require some corpus data that I don't have at hand right now.
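Without that corpus data, the difference can at least be illustrated with invented distributions. This Python sketch assumes "redundancy" means Shannon's relative redundancy, 1 − H/log2(n), over the n possible completions of a left-segment. Both distributions are hypothetical: the idiom has the same conditional probability (0.5) in each, yet the redundancy differs because the remaining probability mass is spread differently:

```python
import math

# Sketch contrasting conditional probability with redundancy.
# Both distributions are invented: each lists hypothetical probabilities of the
# possible completions of some left-segment, with the idiom's completion first.


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def redundancy(probs):
    """Relative redundancy 1 - H / log2(n): 0 if uniform, 1 if fully predictable."""
    n = len(probs)
    if n < 2:
        return 1.0  # a single possible completion leaves no uncertainty
    return 1.0 - entropy(probs) / math.log2(n)


TWO_WAY = [0.5, 0.5]          # idiom vs. one literal alternative
SPREAD = [0.5] + [0.1] * 5    # idiom vs. five literal alternatives

for name, dist in [("two-way", TWO_WAY), ("spread", SPREAD)]:
    # The idiom's conditional probability is 0.5 in both cases,
    # but the redundancy of the completion distribution is not.
    print(name, "P(idiom) =", dist[0], "redundancy =", round(redundancy(dist), 3))
```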

Opaque Idioms

I want to list some of the Italian idioms that Cacciari and Tabossi provide, because they are such great examples of how much uncertainty we face when interpreting unfamiliar idioms. Here are their most striking items:

  • He was born with the shirt (= born with a silver spoon in his mouth)
  • That would have done him the skin (= killed him)
  • He made a hole in the water (= did not succeed)
  • She did a job with the feet (= badly)
  • The project has gone to the mountain (= failed)
  • The assets have been given bottom (= been depleted)
  • She had the moon (= was in a bad mood)
  • He was at the green (= broke)
  • He made himself in four to succeed (= tried hard)
  • She was left of salt (= struck dumb)

