Wednesday, March 20, 2013

Bernardo: "Expected Information as Expected Utility" (1979)

Following a proposal by Dennis Lindley (1956), this paper suggests that the appropriate measure of the expected value of an experiment X with respect to a target parameter Θ is the mutual information between the two, that is,
I(X;Θ)  =  H(Θ) – H(Θ | X)  =  H(X) + H(Θ) – H(X,Θ).
Bernardo calls this the "expected useful information" contained in the experiment (§2).
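
Just to make this concrete, here is a minimal Python sketch (with a made-up joint distribution, not anything from the paper) that computes the mutual information directly from the entropies:

    import math

    # Made-up joint distribution p(x, theta) over a 2x2 outcome space.
    joint = {
        ('x0', 't0'): 0.4, ('x0', 't1'): 0.1,
        ('x1', 't0'): 0.1, ('x1', 't1'): 0.4,
    }

    def entropy(dist):
        """Shannon entropy (in nats) of a dict of probabilities."""
        return -sum(p * math.log(p) for p in dist.values() if p > 0)

    # Marginal distributions of X and Theta.
    p_x, p_t = {}, {}
    for (x, t), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_t[t] = p_t.get(t, 0) + p

    # I(X;Theta) = H(X) + H(Theta) - H(X,Theta) = H(Theta) - H(Theta|X).
    print(entropy(p_x) + entropy(p_t) - entropy(joint))  # ~0.193 nats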

Proper Scoring Rules

The paper also contains a uniqueness theorem about so-called proper scoring rules (§3–4).

A "scoring rule" is a scheme for rewarding an agent (the "scientist") who reports probability distributions to you. It may depend on the distribution and on the actual observed outcome. For instance, a feasible rule is to pay the scientist p(x) dollars for the density function p if in the event that x occurred.

That function, however, would under many common rationality assumptions give the scientist an incentive to misreport his or her actual probability estimates. We consequently define a "proper" scoring rule as one that is hack-proof in the sense that the best course of action under that rule is to report your actual probability estimates.
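
To see the problem in numbers, here is a tiny sketch (with an arbitrary three-outcome belief distribution, chosen only for illustration): under the pay-p(x) rule, the expected payoff for reporting q while believing p is Σ p(x) q(x), and a degenerate report beats the honest one:

    # Arbitrary beliefs over three outcomes, for illustration only.
    p = [0.6, 0.3, 0.1]

    def expected_payoff(beliefs, report):
        """Expected reward under the rule 'pay report[x] dollars when x occurs'."""
        return sum(b * r for b, r in zip(beliefs, report))

    print(expected_payoff(p, p))          # honest report: 0.46
    print(expected_payoff(p, [1, 0, 0]))  # all-in on the likeliest outcome: 0.60

So the rule rewards overconfidence rather than honesty.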

An example of a proper scoring rule is log p(x) (or –log p(x), if scores are construed as penalties to be minimized rather than rewards to be maximized), but apparently, there are others. Bernardo refers to Robert Buehler and I. J. Good's papers in Foundations of Statistical Inference (1971) for further examples. Unfortunately, that book seems to be a bit difficult to get hold of.

Nice and Proper Scoring Rules

The theorem that Bernardo proves is the following: The only proper scoring rules that are both smooth and local (as defined below) are functions of the logarithmic form
u(p,x)  =  a log p(x) + b(x)
where a is a positive constant, and b is a real-valued function on the sample space.

As a corollary, the scientist's optimal expected payoff is –a H(X) + B, where H is the Shannon entropy and B is the average value of the function b under the scientist's subjective probabilities. It also follows that the optimal course of action for the scientist under this scheme is to provide the maximum amount of information that is consistent with his or her beliefs.
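
As a quick numerical sanity check (a sketch with arbitrary choices of a, b, and beliefs p, not taken from the paper), a grid scan confirms both that the honest report maximizes the expected logarithmic score and that the optimum equals –a H(X) + B:

    import math

    a = 2.0
    b = [0.5, -0.3]          # arbitrary bonus b(x) per outcome
    p = [0.7, 0.3]           # the scientist's actual beliefs

    def expected_score(q):
        """Expected value of a*log q(x) + b(x) under the true beliefs p."""
        return sum(pi * (a * math.log(qi) + bi) for pi, qi, bi in zip(p, q, b))

    # Scan reports q = (w, 1 - w) on a grid; the maximum sits at w = 0.7.
    best_w = max((w / 1000 for w in range(1, 1000)),
                 key=lambda w: expected_score([w, 1 - w]))
    print(best_w)  # 0.7 (up to grid resolution)

    # Corollary: the optimal expected payoff is -a*H(X) + B.
    H = -sum(pi * math.log(pi) for pi in p)
    B = sum(pi * bi for pi, bi in zip(p, b))
    print(expected_score(p), -a * H + B)  # the two agree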

So what does "smooth" and "local" mean?

Bernardo doesn't define "smooth," but usually in real analysis, a smooth function is one that can be differentiated indefinitely often. However, Bernardo refers to the mathematical physics textbook by Harold and Bertha Jeffreys (1972) for a definition. I don't know whether they use the word the same way.

A scoring rule u is "local" if the reward that the scientist receives in the event of x depends only on x and on the probability that he or she assigned to x. In other words, a local scoring rule u can be rewritten in terms of a function v whose first argument is a probability rather than a probability distribution:
u(p,x)  =  v(w,x),
where w = q(x), the reported probability of x (which does not necessarily equal the actual subjective probability p).
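
For contrast, consider the quadratic (Brier) rule, a standard textbook example of a proper but non-local rule; this is my example, not necessarily one Bernardo discusses. In the sketch below, the log score at x looks only at the single number q[x], while the quadratic score also sums over the rest of the reported vector:

    import math

    def log_score(q, x):
        """Local: depends only on the probability q[x] assigned to the outcome x."""
        return math.log(q[x])

    def quadratic_score(q, x):
        """Proper but not local: the sum ranges over the whole reported vector q."""
        return 2 * q[x] - sum(qi * qi for qi in q)

    q = [0.2, 0.5, 0.3]
    print(log_score(q, 1))        # changing q[0] or q[2] would not affect this
    print(quadratic_score(q, 1))  # ... but it would affect this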

How To Prove This Theorem

I haven't thought too hard about the proof, but here's the gist that I got out of it: First, you use the method of Lagrange multipliers to show that when the reported distribution q is optimal, the derivative of the Lagrangian
∫ v(q(x), x) p(x) dx  –  λ ( ∫ q(x) dx – 1 )
with respect to the value w = q(x) is zero for every x. You then conclude that q = p fulfills this condition, since u was assumed to be a proper scoring rule. You then have a differential equation on your hands, and you go on to discover that its only solutions are of the postulated form.
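
In other words, at the honest optimum the stationarity condition reduces to the differential equation v_w(w,x) = λ/w, whose solutions are exactly the logarithmic family. A quick symbolic check of that last step (my own sketch, using sympy, which of course is not part of the paper):

    import sympy as sp

    w = sp.symbols('w', positive=True)
    lam = sp.symbols('lambda', positive=True)
    v = sp.Function('v')

    # Solve v'(w) = lambda / w, the stationarity condition at the honest report.
    print(sp.dsolve(sp.Eq(v(w).diff(w), lam / w), v(w)))
    # Eq(v(w), C1 + lambda*log(w))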

Thursday, March 14, 2013

Sereno, O'Donnell, and Rayner: "Eye Movements and Lexical Ambiguity Resolution" (2006)

In the literature on word comprehension, some studies have found that people usually take quite a long time looking at an ambiguous word if it occurs in a context that strongly favors one of its less frequent meanings.

This paper raises the issue of whether this is mainly because of a clash between the high contextual fit and the low frequency, or mainly because of the frequency alone.

The Needle-in-a-Haystack Effect

A context preceding a word can either be neutral or biased, and a meaning of an ambiguous word can either be dominant (more frequent) or subordinate (less frequent). When a biased context favors the subordinate meaning, it is called a subordinate-biasing context.

The subordinate-bias effect is the phenomenon that people spend more time looking at an ambiguous word in a subordinate-biasing context than they take looking at an unambiguous word in the same context — given that the two words have the same frequency.

For instance, the word port can mean either "harbor" or "sweet wine," but the former is much more frequent than the latter. In this case, the subordinate-bias effect is that people take longer to read the sentence
  • I decided to drink a glass of port
than the sentence
  • I decided to drink a glass of beer
This is true even though the words port and beer have almost equal frequencies (in the BNC, there are 3691 vs. 3179 occurrences of port vs. beer, respectively).

Balanced Meaning Frequencies = Balanced Reading Time

The question is whether these absolute word frequencies are the right thing to count, and Sereno, O'Donnell, and Rayner argue that they aren't. Instead, they suggest that it would be more fair to compare the sentence
  • I decided to drink a glass of port
to the sentence
  • I decided to drink a glass of rum
This is because port occurs in the meaning "sweet wine" approximately as often as the word rum occurs in absolute terms, i.e., much more rarely than beer. (A casual inspection of the frequencies of the phrases drink port/rum and a glass of port/rum seems to confirm the close match.)

What the Measurements Say

This means that you get three relevant conditions:
  1. one in which the target word is ambiguous, and in which its intended meaning is not the most frequent one;
  2. one in which the target word is unambiguous and has the same absolute frequency as the ambiguous word;
  3. and one in which the target word is unambiguous and has the same absolute frequency as the intended meaning of the ambiguous word.
Each of these is then associated with an average reading time:

[Table of mean reading times per condition, not reproduced here.]

It's not like the effect is overwhelming, but here's what you see: The easiest thing to read is a high-frequency word with only a single meaning (middle row); the most difficult thing to read is a low-frequency word with only a single meaning (top row).

In between these two, in terms of reading time, you find the ambiguous word whose meaning was consistent with the context but whose absolute frequency was higher.

Why are Ambiguous Words Easier?

In the conclusion of the paper, Sereno, O'Donnell, and Rayner speculate a bit about the possible causes of this "reverse subordinate-bias effect," but they don't seem to find an explanation they are happy with (p. 345).

It seems to me that one would have to look closer at the sentences to find the correct answer. For instance, consider the following incomplete sentence:
  • She spent hours organizing the information on the computer into a _________
If you had to bet, how much money would you put on table, paper, and graph, respectively? If you would put more money on table than on graph, that probably also means that you were already anticipating seeing the word table in its "figure" meaning when your eyes reached the blank at the end of the sentence.

If people in general have such informed expectations, then that would explain why they are faster at retrieving the correct meaning of the anticipated word than they are at comprehending an unexpected word. But checking whether this is in fact the case would require a more careful information-theoretic study of the materials used in the experiment.
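
One natural quantity for such a study would be the surprisal –log p(word | context) of each candidate completion. Here is a minimal sketch with invented cloze probabilities (the numbers below are hypothetical, not from the paper's materials):

    import math

    # Hypothetical completion probabilities for
    # "...organizing the information on the computer into a ___".
    cloze = {'table': 0.50, 'graph': 0.30, 'paper': 0.20}

    # Surprisal in bits: strongly expected words carry little surprisal,
    # which could translate into shorter reading times.
    for word, prob in cloze.items():
        print(word, round(-math.log2(prob), 2))  # table 1.0, graph 1.74, paper 2.32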

Eviatar and Just: "Brain correlates of discourse processing" (2006)

This paper shows that three different kinds of text snippets lead to three different patterns of brain activity. This is interpreted as showing how literal, metaphorical, and ironic language is processed.

However, a closer look at the experimental materials shows that these labels should be approached with some caution. The literal statements are not all "literal" by the standards of cognitive metaphor theory, and the metaphorical statements are in many cases not as conventional as they are claimed to be.

Are the Literal Sentences Literal?

Here are some examples of text snippets that Eviatar and Just categorized as "literal," with underscores added by me:
  • Betsy and Mary were on the basketball team. Mary scored a lot of points in the game. Betsy said, “Mary is a _great_ player.”
  • Harry waited in line for 3 h to see the movie. He enjoyed himself. He said, “That was _worth_ waiting for.”
  • George promised to be quiet in the library. He sat in a corner looking at a book. His dad said, “Thanks for _keeping_ your promise.”
  • Laura was out sick for a week. Johnny called her every day. Laura said, “Thanks for _worrying_ about me.”
  • Betty and Laura were in the same class. Laura finished her homework before Betty. Laura said, “You sure are a _slow worker_.”
Several of these should not be categorized as literal according to the standards of cognitive metaphor theory.

For instance, great is typically taken as an example of a metaphor with size as its source domain. Similarly, worth, keeping, and perhaps worrying are here used in senses that are not their most "basic" ones. Further, slow worker should probably be categorized as a metonymy.

Are the Metaphorical Sentences Conventional?

Here are some examples that they consider to be "frozen" metaphors (p. 2350):
  • In the morning John came to work early. He started to work right away at a fast pace. His boss said, “John is a hurricane.”
  • Mary got straight A’s on her report card. Her parents were proud of her. They said, “You are as sharp as a razor.”
  • Susie helped her mom when her brother got sick. She took good care of him. Her mom said, “You are an angel from heaven.”
  • Donna was always late for everything. Today she made it home on time for supper. Her dad said, “You have turned over a new leaf.”
  • George went to Betty’s birthday party. Fifty people crowded into her small apartment. He said, “I feel like a sardine.”
  • Betty and Laura were in the same class. Laura finished her homework before Betty. Laura said, “You work like a snail.”
No doubt these examples constitute metaphors; it's only that they are a very different kind of metaphor from unemployment is growing or I'll handle the press. One is very overt and almost cries out for attention, while the other is quiet and discreet.

Some Factors Causing the Metaphors to Grab Attention

We can hardly expect a sentence like
  • You are an angel from heaven
to be processed the same way as
  • The chocolate cake was divine
The resonance of angel with heaven, the quotation marks in the original story, and the syntactic clumsiness of the sentences all contribute to provoking a very vivid mental picture in the reader. This would probably not be the case for the divine chocolate cake.

Other sentences from the materials provoke such images because they are not really conventional, or occur in an abnormal form. For instance, the common "sardine" metaphor virtually always occurs in the plural form like sardines and never in the form like a sardine.

Lastly, because sentences of the form You are X or John is X have such semantically weak subjects, they draw a lot of attention to the metaphor, which becomes their topic. Compare for instance
  • He was a tallish man with a mind as sharp as a razor. (BNC)
  • You are a razor.
To my intuition, the razor in the first sentence recedes very much into the background, while in the second sentence, it is put forward as the explicit topic of the sentence. The reasons are probably both syntactic (information is spread unevenly over constituents) and semantic (the first sentence contains more competing content words).

Are the Metaphors Metaphorical?

One last thing that I should mention is that at least one of the "metaphorical" examples could be interpreted as literal:
  • Ken was worried about having his hair cut. When the barber finished, Ken’s ears stuck out. He said, “You’ve turned me into a clown.”
Without a more specific theory of what "metaphor" means, this is a very problematic borderline case.

So What?

This doesn't mean that the data that Eviatar and Just have collected are useless, or that they should be discarded. But it does mean that, once again, the notion of "metaphor" is so vague and contested that it can't simply be transplanted from one field to another without problems.

In particular, linguists should be careful not to take evidence from brain scans at face value; we need to look carefully into the details of what the neuroscience actually shows, and be explicit about what our own semantic theory actually says.

In this case, that means the following: First, some sentences provoke mental imagery more strongly than others; this is a product of several interacting factors, including word frequencies and resonance effects. And second, it is not at all clear what the relation between mental imagery and meaning is, and this relation cannot be made clear unless we come clean about what we think linguistic meaning is.