Thursday, May 23, 2013

McGlone: "What is the explanatory value of conceptual metaphor" (2007)

Matthew McGlone is one of the staunchest critics of cognitive metaphor theory, and in this 2007 paper, he reiterates a number of the arguments he has given in the past.

These arguments are roughly the following:

1

If our knowledge of abstract objects (like social relationships) were really derived from our knowledge of concrete objects (like heat, distance, and containers), then we would not be able to see the difference between the two (p. 114). This is a problem also pointed out by Gregory Murphy (1996).

More generally, the stronger the claim of cognitive metaphor theory becomes, the weirder it becomes that we can distinguish, say, theories from actual buildings. In an interesting spin on this, McGlone considers the sentence

  • My recent trip to L.A. was a rollercoaster ride (p. 122)

This is a conceptualization of a journey in terms of a journey—and so, it seems weird that we can recognize it as a metaphor, and that we don't draw literal rollercoaster inferences about the journey from this sentence.

2

He takes Keysar and Bly's 1995 study on false etymologies as evidence that our sense of linguistic coherence may be "illusory" (p. 115). He further supports this claim by citing other folk etymologies that are known to be false.

3

All the very good arguments made in his own 1996 paper on the topic are repeated (p. 117). These relate to paraphrase, reading speed, and recall.

4

The aptness data from Nayak and Gibbs (1990) can be explained as an effect of stylistic coherence rather than cognitive facilitation. Further, when Glucksberg et al. (1993) tested the stories for actual reading facilitation effects, they found none. In this connection, he also considers and rejects the recall data from Albritton et al. (1995) because they failed to control for lexical priming.

5

On a slightly different note, his own experiment in McGlone and Harding (1998) found that people read the ambiguous sentence "the meeting was moved forward" in a way that is consistent with the metaphor used in the preceding discourse (p. 120):
  • We passed the deadline two days ago. The meeting originally scheduled for next Wednesday has been moved forward two days [i.e., to Friday].
  • The deadline passed two days ago. The meeting originally scheduled for next Wednesday has been moved forward two days [i.e., to Monday].
This is consistent with cognitive metaphor theory, but also with the theory that there is a more abstract super-structure which subsumes both time and space. He finds this latter suggestion "epistemologically more plausible" and suggests that perhaps the common features of time and space are "simply more transparent in spatial language than in other linguistic domains" (with a reference to Talmy 1996).

6

Lastly, he cites the reading time study by Keysar et al. (2000) which found that there was no significant difference between the facilitation effect of conventional and non-conventional metaphors.

Friday, May 17, 2013

Ravi and Knight: "Bayesian Inference for Zodiac and Other Homophonic Ciphers" (2011)

A homophonic cipher is a one-to-many encryption scheme — that is, it can substitute a single plaintext letter with several different cipher symbols. The advantage of using such one-to-many mappings (rather than one-to-one substitution ciphers) is that they give a flatter output distribution when the number of cipher symbols per plaintext letter is proportional to the frequency of that letter.

Yet, they are not perfect codes; although the monogram frequencies in the ciphertext can be close to uniform, the skewed distribution of cipher bigrams and trigrams constrains the set of encryption schemes that are likely to have been used. If you can find an intelligent way of navigating the space of possible encryption schemes, such ciphers may thus still be cracked.
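To make the one-to-many idea concrete, here is a toy sketch of my own (the letter frequencies and symbol inventory are invented for illustration, not taken from the paper): each letter gets a number of cipher symbols roughly proportional to its frequency, and each occurrence is encrypted with a randomly chosen symbol.

```python
import random

# Symbols allotted per letter, roughly proportional to letter frequency.
# A frequent letter like "e" gets many symbols, a rare one like "n" few,
# which flattens the monogram distribution of the ciphertext.
freq = {"e": 4, "t": 3, "a": 3, "o": 2, "n": 1}

# Build the one-to-many key: letter -> list of cipher symbols.
key = {}
next_symbol = 0
for letter, n_symbols in freq.items():
    key[letter] = [str(next_symbol + i) for i in range(n_symbols)]
    next_symbol += n_symbols

def encrypt(plaintext, key, rng=random.Random(0)):
    # Each occurrence independently picks one of the letter's symbols,
    # so repeated plaintext letters need not share a cipher symbol.
    return [rng.choice(key[c]) for c in plaintext]

ciphertext = encrypt("atone", key)
```

Decryption, by contrast, is still deterministic: every cipher symbol has exactly one preimage, so inverting the key recovers the plaintext.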

Bayesian Deciphering

This is what Sujith Ravi and Kevin Knight do in this paper. They have a quite straightforward model of the English language and a not quite so straightforward model of homophonic encryption, and they use these two models to compute the posterior probability of a plaintext string given the cipher string. However, since the space of possible encryption schemes is astronomical, they need to select their hypotheses in a clever way.

The technique they apply to accomplish this is Gibbs sampling — that is, they repeatedly resample one dimension of their current position at a time. In the current context, this means conjecturing a new plaintext letter as the preimage of a specific cipher symbol while keeping all other hypotheses constant.

Because the surrounding text is decrypted according to the current hypothesis, different conjectures will have different posterior probabilities, determined by the prior probability of the plaintext strings they correspond to. Walking around the space of encryption schemes this way, the model will spend most of its time at places where the plaintext hypotheses have high probability (i.e., good "Englishness").

Cut-and-Paste Resampling

There is a further technical quirk of their model which I'm not quite sure how they implemented: They state (p. 244) that when they resample the preimage of some cipher symbol, they snip out all the windows containing the relevant cipher symbol and glue them onto the end of the cipher instead.

If I understand this correctly, it means this:

Suppose your cipher is IFMMPSPQME and your current decryption hypothesis is hellewerld. You then resample the cipher symbol P, perhaps selecting o as its preimage (instead of e).

Since P occurs in the two contexts IFM[MPS]PQME and IFMMP[SPQ]ME (brackets marking the window around each occurrence), you thus snip these windows out of the cryptogram, leaving IFM ME (whitespace only included for readability).

Pasting the two cut-outs at the end, you obtain IFM ME MPS SPQ. You then evaluate the posterior probability of this hypothesis by asking your language model for the probability of helldlowwor.

In fact, the idea is a little more complicated than this (and not quite as unreasonable), as the window sizes are determined by possible word boundaries. In the example above, a much larger window might in fact be snipped out, since the algorithm would plausibly recognize some word boundaries around helle and werld, or around hello and world (I don't know exactly how they decide where the word boundaries go).
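If my reading is right, the fixed-window version of the snipping step can be sketched like this (the fixed radius is my simplification; as noted, Ravi and Knight's windows are apparently determined by word boundaries):

```python
def cut_and_paste(cipher, symbol, radius=1):
    # Snip a window of `radius` symbols on each side of every occurrence
    # of `symbol`, then append the windows to the end of the cipher.
    positions = [i for i, s in enumerate(cipher) if s == symbol]
    cut = set()
    windows = []
    for p in positions:
        lo, hi = max(0, p - radius), min(len(cipher), p + radius + 1)
        windows.append(cipher[lo:hi])
        cut.update(range(lo, hi))
    # Overlapping windows are removed once but pasted once each.
    remainder = "".join(s for i, s in enumerate(cipher) if i not in cut)
    return remainder + "".join(windows)

# The worked example from the text: resampling P in IFMMPSPQME.
rearranged = cut_and_paste("IFMMPSPQME", "P")
```

With the preimage hypothesis P → o (and I=h, F=e, M=l, S=w, Q=r, E=d), decrypting the rearranged string reproduces the helldlowwor of the example above.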

The Language and Channel Models

The language model that Ravi and Knight use is slightly unusual in two ways:
  1. It gradually adapts to the bigrams already seen in the plaintext. Letters towards the end of the hypothesized plaintext source are thus selected according to what happened in the beginning of the text (according to the plaintext hypothesis).
  2. It combines a model based on word frequencies (with 90% weight) and a model based on n-gram frequencies (with 10% weight).
With respect to the latter point, I suppose they must be using some kind of quite generous smoothing; the Zodiac cipher they crack has several spelling mistakes and contains 8 non-existing words out of 100.

I also don't know how they decide where to put the word boundaries, but this is a problem that can be solved efficiently with dynamic programming techniques. As they comment (p. 245), the n-gram model is going to do most of the discriminatory work in the beginning (when the encryption hypothesis is still largely random), but as the hypotheses get more and more accurate, the word-based model will start to drive the sampling to a greater extent.
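For what it's worth, here is the standard dynamic-programming solution to the segmentation problem: find the highest-scoring split of an unspaced string under some word-scoring function. The scoring function in the test is a toy lexicon of mine, not whatever word model the paper actually uses.

```python
import math

def segment(text, word_logprob, max_word_len=10):
    # best[i] is the best log-score of any segmentation of text[:i];
    # back[i] records where the last word of that segmentation starts.
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            cand = best[j] + word_logprob(text[j:i])
            if cand > best[i]:
                best[i], back[i] = cand, j
    # Recover the words by following the backpointers from the end.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

This runs in O(n · max_word_len) time, so rescoring a plaintext hypothesis after each Gibbs move stays cheap.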

Internal Training

The adaptive part of the language model is expressed by the fraction on the bottom of page 242. This fraction can be split into two parts,
  1. a bigram model trained on an external corpus;
  2. a bigram model trained on the left-hand part of the plaintext hypothesis.
These two models are given weights a/(a + k) and k/(a + k), respectively, where k counts the occurrences of the preceding letter in the hypothesized plaintext string left of the current position; a is a hyperparameter which Ravi and Knight set to a = 10,000.

Thus, as k increases the corpus model is given progressively less weight, and the cipher model is given progressively more. In other words, the more decrypted data we have, the more we trust the cipher model. Since a is as high as it is, the internal model never gets anywhere near outweighing the external.
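Spelled out as code, the interpolation looks something like this (the function name and the internal cache estimate are my own framing; only the a/(a + k) versus k/(a + k) weighting is from the paper):

```python
def adaptive_bigram_prob(curr, prev, prefix, corpus_prob, a=10000.0):
    # k: occurrences of the preceding letter in the plaintext hypothesis
    # to the left of the current position.
    k = prefix.count(prev)
    # Internal model: relative frequency of the bigram (prev, curr)
    # among the bigrams seen so far in the hypothesis.
    bigram_count = sum(
        1 for x, y in zip(prefix, prefix[1:]) if (x, y) == (prev, curr)
    )
    internal = bigram_count / k if k else 0.0
    # External model gets weight a/(a + k), internal gets k/(a + k).
    external = corpus_prob(curr, prev)
    return (a / (a + k)) * external + (k / (a + k)) * internal
```

Algebraically, this mixture collapses to a single fraction, (a · P_corpus + count(prev, curr)) / (a + k), which is presumably the form of the expression on the bottom of page 242.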

I'm actually not quite sure whether the "left of" part in k makes any difference, since they glue the windows around the resampled cipher symbol onto the end of the cipher anyway. But maybe I'm missing something.

Sunday, May 12, 2013

Foucault: Lectures at the Collège de France, 1973–74 (Lectures 10–12)

In the beginning of his tenth lecture on psychiatry in the 19th century, Foucault suddenly feels the urge to "open a parenthesis here and insert a little history of truth in general" (p. 235). This turns out to be a theory that divides the conception of truth into two distinct historical categories, a procedural, scientific truth, and a mystical, revealed truth.

The Archeology of Knowledge as a Reduction

He explains:
So you have attested truth, the truth of demonstration, and you have the truth-event. We could call this discontinuous truth the truth-thunderbolt, as opposed to the truth-sky that is universally present behind the clouds. We have, then, two series in Western history of truth. The series of constant, constituted, demonstrated, discovered truth, and then a different series of the truth which does not belong to the order of what is, but to the order of what happens, a truth, therefore, which is not given in the form of discovery, but in the form of the event, a truth which is not found but aroused and hunted down: production rather than apophantic. It is not a truth that is given through the mediation of instruments [in a wide sense, presumably], but a truth provoked by rituals, captured by ruses, seized according to occasion. This kind of truth does not call for method, but for strategy. (p. 237)
A paradigmatic example of the truth-event is the confession under torture during the inquisition (p. 240) as opposed to the weighing of evidence against evidence according to certain standards of proof.

He further explains that one of the points of his intellectual project is to
show how this truth-demonstration […] derives in reality from the truth-ritual, truth-event, truth-strategy, and how truth-knowledge is basically only a region and an aspect, albeit one that has become superabundant and assumed gigantic dimensions, but still an aspect or a modality of truth as event and of the technology of this truth-event. (p. 238)
In addition, "Showing that scientific demonstration is basically only a ritual […] is what I would call the archeology of knowledge" (p. 238). So there we have that.

The Hiddenness of Heidegger

In an almost hysterically elliptical remark, Foucault non-quotes Heidegger:
There are those who are in the habit of writing the history of truth in terms of the forgetting of Being, that is to say, when they assert forgetting as the basic category of the history of truth, these people place themselves straightaway within the privileges of established knowledge, that is to say, something like forgetting can only take place on the ground of the assumed knowledge relationship, laid down once and for all. Consequently, I think they only pursue the history of one of the two series I have tried to point out, the series of apophantic truth, of discovered, established, demonstrated truth, and they place themselves within that series. (p. 238)
That's probably the most direct criticism of Heidegger you'll find in the whole corpus of Foucault — but even here, it takes the shape of a somewhat indirect stab at what "some people say." It's almost like he is the One Whose Name We Must Not Speak.

Hysteria as a Counter-Strategy

Just a quick comment on the last two lectures in the series (11 and 12): In these lectures, Foucault is mainly concerned with presenting a theory about the sudden surge in cases of hysteria in the late 19th century.

His hypothesis is that patients, once they realized that doctors relied on them to produce stable somatic symptoms, started to derive pleasure from experiencing this power over the doctor. Of course the doctor was still the one with most of the power, but the neurological patient had an option which the incarcerated madman did not have: the option of choosing what the disease was "about" — for instance, by having a paralysis after a work accident.

Foucault illustrates the strategies of the doctor and the patient, respectively, with the following little script:
"Obey, keep quiet, and your body will speak." So, you want my body to speak! My body will speak, and I really promise you that there will be much more truth than you can imagine in the answers it will give you. Not, certainly, that my body knows more about it than you, but because there is something in your injunctions that you do not formulate but which I can clearly hear; a certain silent injunction to which my body will respond. (p. 305)
The point here is of course that the next logical step is that the patient can start having symptoms that will drag the doctor inadvertently into the sphere of sex. Thus, after the invention of the "neurological body" allowed the doctor to normalize symptoms like psychosomatic paralyses, the hysterics would shift towards an even more radical blurring of the line between the medical and the existential spheres.

As he says on the very last page (p. 323), the doctors now had two options: Admit that hysteria wasn't a disease proper, or admit that sexuality also belonged to the medical sphere. In the long run, the second option won out, as we know. Thus "the hysterics, to their great pleasure, but doubtless to our greater misfortune, gave rise to a medicine of sexuality" (p. 323).

This does seem to shed some further light on the point of the whole course: It could be read as a genealogy of the notion of medical knowledge about sexuality. I don't know what to say exactly to Foucault's hypothesis about hysteria, but then again, it's not like we already have a perfect theory of what it is that went on in those consultations of Charcot back in the late 19th century.

Wednesday, May 1, 2013

The Steen-Gibbs-McGlone debate in Discourse Processes (2011)

In 2011, the journal Discourse Processes had a special issue on cognitive metaphor theory. The issue consists of four papers, a target piece by Raymond Gibbs, a reply by Matthew McGlone, a reply by Gerard Steen, and a rebuttal by Gibbs. The whole thing spans about 50 pages.

What is Cognitive Metaphor Theory?

Gibbs is quite direct in his claims. He openly states that:
Under the CMT view, so-called clichéd expressions, such as "stay the course" and "We're spinning our wheels," are not dead metaphors, but reflect active schemes of metaphorical thought. (p. 532)
However, the picture later muddies a bit because he restricts his claims to primary metaphors (pp. 9–10) and disqualifies explicit similes such as My job is a jail (p. 547). He also entertains the thought, inconsistent with conventional versions of cognitive metaphor theory, that these mappings may be learned through verbal behavior (p. 540).

But in fact, the story gets even muddier than that. In the last section before the conclusion, he floats his dynamical systems proposal for cognitive metaphor theory (p. 551ff). But while it all sounds very good with emergence and self-organization, Gibbs never commits to any particular model. This puts him in the questionable company of a tradition that throws around fancy math terms without actually saying very much.

That's not a nice accusation, but I'm afraid it's unavoidable. Gibbs himself seems to misunderstand fundamental concepts about dynamical systems: For instance, he seems to confuse system states with state spaces (p. 551), to confuse out-of-equilibrium states with chaos (see p. 115 of his 2005 book), and to confuse attractors with their associated basins of attraction (p. 551). Also, talking about "nonlinear interactions" (p. 554) without specifying what the independent variables are is just plain nonsense.

Constrained Comprehension

At any rate, the bottom line is that cognitive mappings aren't after all the basis of metaphor understanding; rather, they are factors that influence understanding, but are neither necessary nor sufficient:
… many conceptual metaphors, along with many other constraining forces, may have partial, probabilistic influence on one's understanding of verbal metaphor. (p. 553)
That seems about right — but of course, we would have to specify what those forces were for this to be an informative statement. Gibbs seems to put everything in the mix, just to be on the safe side:
For instance, some dynamic processes occur over short time spans (e.g., neural firings or momentary thoughts). Other processes unfold over the course of individuals’ lives, and so guide development and change in personality, and interpersonal interactions throughout the lifespan. Dynamic processes also operate on populations over a much longer, evolutionary timeframe. (p. 551)
So just about anything can potentially be a relevant factor in determining comprehension. In his reply to this claim, Steen comments:
This basically brings all possibly relevant parameters of discourse processing together, from neural cognition to cognitive evolution, with all other dimensions of language use and discourse processing in between; and, in principle, allows for each of them to exert some yet to be determined effect on the ongoing discourse process. … If such a dynamic systems theoretical model could generate more precise predictions … it would be a valuable upgrade of CMT. (p. 587)
So it's not like the claim is wrong, it's just that it's only a ragtag collection of vague analogies rather than a theory.

The Case for Cognitive Metaphor Theory

OK, so getting a precise theory out of Gibbs can occasionally be a bit like holding on to wet soap. But he does bring a set of arguments to the table in order to support the claim that cross-modal effects play some sort of role in word comprehension. Here are, schematically, the arguments he cites:
  1. Metaphorical expressions are systematic (p. 532)
  2. Novel metaphors can be explained by means of old mappings (p. 532)
  3. The mappings explain certain facts about etymology (p. 532)
  4. Mappings are shared across cultures (p. 538)
  5. Scientific concepts can be analyzed as metaphors (p. 540)
  6. Verbal framing can influence decision-making (p. 540)
  7. There are mappings in non-verbal thought (pp. 540–41)
  8. Gestures sometimes seem to reflect mappings (p. 541)
  9. There are cross-modal spill-over effects from non-verbal to verbal tasks (pp. 541–542)
  10. People have consistent mental images (p. 544)
  11. People have consistent judgment of underlying mappings (p. 544)
  12. (Quite novel) euphemisms are understood faster when primed metaphorically (p. 545)
  13. Transparent idioms are learned faster than not-so-transparent ones (p. 546)
  14. Domains are similar in an asymmetric way (p. 548)
So that's a lot of stuff, and not all of these arguments are equally good.

The Bad

Some of them are bad just for conceptual reasons: For instance, interviewing people about their mental imagery is just not a very reliable way of producing cognitive evidence. Ironically, Gibbs seems to accept this point when the data goes the other way, as in his comments on McGlone's paraphrase task:
Yet, asking people to verbally paraphrase a novel metaphor may not be the best indicator of the possible underlying presence of conceptual metaphor in interpreting these novel expressions. Given the long-noted difficulties people have in paraphrasing metaphors (Gibbs, 1994), the fact that 41% could provide interpretations that seem to meet some criteria for conceptual metaphor may be a positive finding in favor of CMT. (p. 546–47)
In fact, the quote goes on to question McGlone's materials on the grounds that metaphors of the form A is B or the form A is the B of C are "not typically motivated by single conceptual metaphors" (p. 547), so that they do not count as negative evidence anyway (even though they do, apparently, count as positive).

Other arguments are weak for empirical reasons: For instance, the systematicity which originally motivated cognitive metaphor theory has in fact been hugely overstated, and much of the last 15 years of research has been dedicated to the problems that arise from the fact that there are unsystematic holes in metaphorical mappings. (This parallels the fact that there is no cognitive reason that you can't ask for the time with the question How late is it?) Similar problems hold for the gesture evidence.

The Good

But some pieces of evidence are good. Here's a list of some of the experimental studies that Gibbs cites, and which I think have a lot of merit:
  • Meier and Robinson 2004: Positive words are recognized faster when they're presented at the top of a computer screen.
  • Giessner and Schubert 2007: Subjects judge bosses to have more power when they are depicted higher up on a screen.
  • Williams and Bargh 2008a: Subjects find other people more friendly when they're holding a cup of warm coffee.
  • Williams and Bargh 2008b: Subjects find themselves more socially isolated when they have been asked to measure large distances.
  • Zhong and Liljenquist 2006: Subjects who think about immoral behavior are more likely to accept an antiseptic wipe afterwards.
  • Meier, Robinson, and Clore 2004: Subjects are slower at categorizing words like faith or beggar as positive or negative when the positive are dark or the negative bright.
  • Wilson and Gibbs 2007: Subjects are faster at reading a metaphorical grasp sentence if they have first made a grasping hand movement.
I am not saying all of these studies are methodologically unproblematic (in fact, I am quite skeptical about using the Giessner and Schubert study as evidence of how online processing works). I am just saying that Gibbs is right in saying that they bring some information to the table which should not be ignored.