Thursday, March 27, 2014

Walters: An Introduction to Ergodic Theory (1982), p. 26

Book cover; from Booktopia.
Section 1.4 of Peter Walters' textbook on ergodic theory contains a proof of Poincaré's recurrence theorem. I found it a little difficult to read, so I'll try to paraphrase the proof here using a vocabulary that might be a bit more intuitive.

The Possible is the Necessarily Possible

The theorem states the following: If
  1. X is a probability space
  2. T: X → X is a measure-preserving transformation of X,
  3. E is an event with positive probability,
  4. and x a point in E,
then the series
x, Tx, T²x, T³x, T⁴x, …
will almost certainly pass through E infinitely often. Or: if it happens once, it will happen again.
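
Before getting into the proof, here is a quick numerical illustration of what the theorem promises. This is my own toy example, not anything from Walters' book: a small Python sketch that iterates an irrational rotation of the unit interval (a transformation that preserves Lebesgue measure) and counts how often a point starting in the event E = [0, 0.1) comes back to E. The rotation angle, the event, and the number of steps are arbitrary choices.

    import math

    # Irrational rotation of the unit interval: T(x) = x + alpha (mod 1).
    # This transformation preserves Lebesgue measure, so the recurrence
    # theorem applies to it.
    ALPHA = math.sqrt(2) - 1          # an irrational rotation angle
    E_LOW, E_HIGH = 0.0, 0.1          # the event E = [0, 0.1), of measure 0.1

    def T(x):
        """One application of the measure-preserving transformation."""
        return (x + ALPHA) % 1.0

    def count_visits(x, steps):
        """Count how many of x, Tx, T^2 x, ... land in E within `steps` steps."""
        visits = 0
        for _ in range(steps):
            if E_LOW <= x < E_HIGH:
                visits += 1
            x = T(x)
        return visits

    x0 = 0.05                          # a starting point inside E
    print(count_visits(x0, 10_000))    # roughly 1,000 visits, i.e. about 10% of the steps

That the visits come at a stable frequency of about m(E) is really an ergodic-theorem kind of statement; the recurrence theorem itself only promises that the visits never stop.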

The idea behind the proof is to describe the set R of points that visit E infinitely often as the superior limit of a series of sets. This description can then be used to show that E ∩ R has the same measure as E. This will imply that almost all points in E revisit E infinitely often.

Statement and proof of the theorem; scan from page 26 in Walters' book.

I'll try to spell this proof out in more detail below. My proof is much, much longer than Walters', but hopefully this means that it's much, much more readable.

Late Visitors

Let Rᵢ be the set of points in X that visit E after i or more applications of T. We can then make two observations about the series R₀, R₁, R₂, R₃, …:

First, if j > i and you visit E at time j or later, then you also visit E at time i or later. The Rᵢ's are consequently nested inside each other:
R₀ ⊃ R₁ ⊃ R₂ ⊃ R₃ ⊃ …
Let's use the name R for the limit of this series (that is, the intersection of the sets). R then consists of all the points in X that visit E infinitely often.

The series of sets is downward converging.

Second, Rᵢ contains the points that visit E at time i or later, and the transformation T⁻¹ takes us one step back in time. The set T⁻¹Rᵢ consequently contains the points in X that visit E at time i + 1 or later. Thus
T⁻¹Rᵢ = Rᵢ₊₁.
But since we have assumed that T is measure-preserving, this implies that
m(Rᵢ) = m(T⁻¹Rᵢ) = m(Rᵢ₊₁).
By induction, every set in the series thus has the same measure:
m(R₀) = m(R₁) = m(R₂) = m(R₃) = …
Or to put it differently, the discarded parts R₀ \ R₁, R₁ \ R₂, R₂ \ R₃, etc., are all null sets: since Rᵢ₊₁ ⊂ Rᵢ, we have m(Rᵢ \ Rᵢ₊₁) = m(Rᵢ) − m(Rᵢ₊₁) = 0.

Intersection by Intersection

So we have that
  1. in set-theoretic terms, the Rᵢ's converge to a limit R from above;
  2. but all the Rᵢ's have the same measure.
Let's use these facts to show that m(E ∩ R) = m(E), that is, that we only throw away a null set by intersecting E with R.

The event E and the set R of points that visit E infinitely often.

To prove this, notice first that every point in E visits E after zero applications of T. Thus, E ⊂ R₀, or in other words, E ∩ R₀ = E. Consequently,
m(E ∩ R₀) = m(E).
We now need to extend this base case by a limit argument to show that
m(E ∩ R) = m(E).
But, as we have seen above, the difference between R₀ and R₁ is a null set. Hence, the difference between E ∩ R₀ and E ∩ R₁ is also a null set, so
m(E ∩ R₀) = m(E ∩ R₁).
This argument holds for any i and i + 1. By induction, we thus get
m(E ∩ R₀) = m(E ∩ R₁) = m(E ∩ R₂) = m(E ∩ R₃) = …

A visit to E before but never after time i has probability 0.

Since measures respect monotone limits (continuity from above), this implies that
m(E ∩ R₀) = m(E ∩ R).
But we have already seen that m(E ∩ R₀) = m(E), so
m(E ∩ R) = m(E).

The Wherefore

An informal explanation of what's going on in this proof might be the following:

We are interested in the conditional probability of visiting E infinitely often given that we have visited it once, that is, Pr(R | E). In order to compute this probability, we divide the sample space into an infinite number of cases and discover that, once we condition on E, all but one of them have probability 0.

If you imagine yourself walking along a sample path, your route will fall into one of the following categories:
  • you never visit E;
  • you visit E for the last time at time i = 0;
  • you visit E for the last time at time i = 1;
  • you visit E for the last time at time i = 2;
  • you visit E for the last time at time i = 3;
  • there is no last time — i.e., you visit E infinitely often.
When we condition on E, the first of these cases has probability 0.

In general, the fact that T is measure-preserving guarantees that the event of visiting E at least i times but never an (i + 1)-th time has probability 0; consequently, each of the "last visit at time i" cases above also has probability 0.

We thus have to conclude that the last option — infinitely many visits to E — has the same probability as visiting E once, and thus a conditional probability of 1.

Wednesday, March 26, 2014

von Mises: Probability, Statistics and Truth (1951), ch. 1

Richard von Mises; from Wikipedia.
Richard von Mises was an important proponent of the frequentist philosophy of probability.

In his book Probability, Statistics and Truth, he militates against the use of the word "probability" for anything other than indefinitely repeatable experiments with converging relative frequencies (pp. 10–12). He also compares probabilities to physical constants like the velocity of a molecule (p. 21) and asserts that the law of large numbers is an empirical generalization comparable to physical laws like the conservation of energy (pp. 16, 22, and 26).

Reference Class Relativism

A consequence of this frequentist notion of probability is that specific events do not have probabilities. Only infinite classes of comparable events can have probabilities.

For instance, when your coin comes up heads at 10 o'clock, that's a different event from the coin coming up heads at 11 o'clock in infinitely many ways. Only by choosing which properties of the situation to attend to can you identify the two events as equivalent.

As a kind of argument for this reference class relativism, von Mises asserts that a specific person has a different probability of dying depending on the reference class (e.g., people over 40, men over 40, male smokers over 40, etc.). We thus have to explicitly select the reference class before we can talk about "the" probability.

He comments:
One might suggest that a correct value of the probability of death for Mr. X may be obtained by restricting the collective to which he belongs as far as possible, by taking into consideration more and more of his individual characteristics. There is, however, no end to this process, and if we go further and further into the selection of the members of the collective, we shall be left finally with this individual alone. (p. 18)
Even from a frequentist perspective, I'm not sure this makes sense. The fact that we have narrowed down our reference class so much that there is only a single real person left in it should not change the fact that we still have an intensional definition of the class. In so far as we do, we should be able to apply that definition to the outcome of any sequence of candidates, like an infinite sequence of people or experiments. In reality, it is only data sparsity that keeps us from going "further and further."

So I think von Mises has a theoretical choice to make: Either he must require that reference classes be actually infinite, or he must merely require that they be potentially infinite.

"Randomness" and Insensitivity to Subsequence Selection

Von Mises spends a large part of the lecture elaborating a notion of "randomness" which is intended to capture the difference between sequences that are asymptotically i.i.d. and sequences that are not, but have the same limiting frequencies. He does so by adding the requirement that the limiting frequencies be independent of subsequence selection.

A possibly more intuitive way of stating that definition would be in terms of a Topsøe-style game: A structure-finding player is tasked to pick infinitely many places in a sequence based on past data and is rewarded when the empirical frequencies fail to converge to a given distribution; a structure-hiding player is tasked to select the sequence and is rewarded when the frequencies do converge to the given distribution.

If the structure-hider then introduces any systematic dependence between the experiments, the structure-finder can exploit these regularities to outgamble the structure-hider. Thus, only asymptotically i.i.d. sequences are part of an equilibrium.
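
To make this a bit more concrete, here is a minimal Python sketch of the structure-finder's side of the game. It is my own toy construction, not von Mises' or Topsøe's actual formalism, and the selection rule and sequences are arbitrary choices. The rule uses only past outcomes, as required. Against an i.i.d. coin, the selected subsequence has the same limiting frequency as the whole sequence; against a sequence with a systematic dependence (here, strict alternation), the very same rule extracts a subsequence with a completely different frequency.

    import random

    def select_after_heads(seq):
        """Structure-finder's rule: select position i whenever the outcome
        at position i - 1 was heads (1). Only past data is used."""
        return [x for prev, x in zip(seq, seq[1:]) if prev == 1]

    def frequency(seq):
        return sum(seq) / len(seq) if seq else float("nan")

    random.seed(0)
    n = 100_000

    iid = [random.randint(0, 1) for _ in range(n)]    # i.i.d. fair coin flips
    alternating = [i % 2 for i in range(n)]           # 0, 1, 0, 1, ...: same overall frequency

    # For the i.i.d. sequence, selection cannot shift the frequency away from 1/2.
    print(frequency(iid), frequency(select_after_heads(iid)))

    # For the alternating sequence, the overall frequency is still 1/2, but the
    # selected subsequence consists entirely of 0s: the structure is exposed.
    print(frequency(alternating), frequency(select_after_heads(alternating)))

In von Mises' terminology, the alternating sequence has the right limiting frequency but is not "random", because that frequency is not insensitive to place selection.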

I haven't checked the details, but this game seems to be the same as that suggested by Shafer and Vovk, although (if I remember correctly), they only consider fair (that is, maximum-entropy) i.i.d. coins, not arbitrary biases. But at any rate, coin flipping is, like distributions on a finite set, one of the cases in which there is a maximum entropy distribution even in the absence of an externally given mean.

Tuesday, March 25, 2014

Frankfurt: "Indavertence and Responsibility" (2008)

In his Amherst lecture, Harry Frankfurt defends the unspectacular assertion that we are only morally responsible for things we do on purpose. He does so by distinguishing causal and moral credit:
We are responsible for [things we do inadvertently] as their cause, even though we do not intend them. They accrue to our credit or to our blame, though not to our moral credit or moral blame. (p. 14)
Much of his discussion is spurred by a set of silly thought examples by Thomas Nagel – a person pulling a trigger without intending to fire the gun, etc.

Frankfurt giving the lecture; from amherstlecture.org.

In the course of making his point, Frankfurt presents a rather disconcerting thought example of his own:
Let us suppose, then, that a person is the carrier of a highly contagious and dreadful disease. […] I suppose that the person would naturally be horrified, would feel helplessly discouraged by the evident impossibility of keeping from doing wholesale harm, and might well conclude – even while acknowledging no moral responsibility at all for being so toxic – that the world would be better off without him. The toxicity is by no means his fault; but he certainly cannot pretend that it has nothing to do with him. However he may wish that this were not the case, he is a poisonous creature, who cannot avoid doing dreadful harm. (p. 13; emphases in the original)
Something seems really out of key here.

Sunday, March 23, 2014

Derrida: "Signature Event Context" (1972)


In this essay, Derrida makes a number of strongly Wittgensteinian points about meaning and the use of language.

He notes that a sign always relies on a history of use in so far as it has a meaning, but that this history does not in and of itself constitute an unambiguous precedent. Theories with a strong or even mentalistic concept of “literal meaning” — Derrida discusses Husserl and Austin — thus always have to jump through a lot of hoops in order to make it seem as if it were obvious how this word ought to be used in all future cases.

The Winding Road to Theory


He arrives at this conclusion by a somewhat strange route over a discussion of the concept of “writing.” According to Derrida, there is a classical philosophical theory which singles out writing as being different from speech in that it essentially involves a lot of peculiar absences — the physical absence of the writer, the somewhat vaguely defined role for the reader, and even the possible absence of a communicative intent in the writing itself.

However, he goes on, these features are in fact present in all communication, so we should really count speaking as a kind of “writing” as well if we take this definition literally.

We don't, of course, but the point is well taken: When a sign means something, it is because it echoes something else in the past.

There is therefore an essential tension between the observation that signs have meaning because they conform to a tradition of use, and the assumption that signs express inner, conscious, authentic intentions. Declaring a meeting open or declaring somebody husband and wife always is kind of theatrical, and the attempt in Husserl and Austin to ban all theatrical or “non-serious” uses of language from playing a role in the theory is therefore doomed from the outset.

Why This Post is So (Damn) Long


Book cover; from Wikipedia.
Derrida is such a bad writer that it can be really, really difficult to even parse his sentences, let alone to find out what his point is. Because it is so much work to plow through his layers of nested interjections and weird reverse sentence structures, it always annoys me when people summarize him in broad strokes without commenting on specifics.

So I'll try something different here: I'll go through the text, literally page by page, trying to paraphrase everything he says in readable, English prose. If anybody finds this blasphemous, then I refer to that French philosopher who says that nobody owns the meaning of a text.

I'm following the page numbers as they appear in Limited Inc. Scans of the text are available from several university websites (e.g., here, here, and here).

I've not followed Derrida's headings, but rather divided up the text into some smaller chunks. This is partly to give the argument some structure and partly to give you some breathing space.

The Problem with Communication, Context, and Writing


The Problem of Communication, pp. 1–2


Derrida warms up with some reflections that are quite weakly related to the rest of the essay: We might, he says, be tempted to say that the invention of writing extended spoken communication into a new medium. But this presupposes a concept of "communication," and we cannot necessarily take this for granted.

The word "communication" can refer to the effects of physical forces as well as the effects of meaning. This might suggest that we can think of the concept of linguistic communication as a metaphorical extension of a literal concept of physical communication.

However, Derrida disapproves of this suggestion on the grounds that
  1. he finds the whole idea of "literal meaning" suspect;
  2. he considers it circular to base a theory of meaning on a theory of meaning.

The Problem of Context, pp. 2–3


Derrida thinks that there are indeed some problems with the seemingly unproblematic notion of "communication," and he locates these problems more precisely in the concept of "context."

Politeness is notorious for depending on context in subtle ways. Here, etiquette icon Emily Post
sidesteps the issue by giving a cut-and-dry prescription without any qualification. (From Etiquette, Ch. 28)

He asks:
But are the conditions of a context ever absolutely determinable? … Is there a rigorous and scientific concept of context? (pp. 2–3)
Lest you should think that the answer is yes, he asks an even more leading rhetorical question:
Or does the notion of context not conceal, behind a certain confusion, philosophical presuppositions of a very determinate nature? … I shall try to demonstrate why a context is never absolutely determinable, or rather, why its determination can never be entirely certain or saturated. (p. 3)
According to Derrida, this demonstration will
  1. raise suspicions about the concept of context;
  2. change the way we understand the concept of “writing”; specifically, he will question the idea that writing is a kind of transmission of information.

If Speech is Like Writing, We have a Problem, pp. 3–4


As stated above, Derrida notes that lumping speech and writing together presupposes a unifying concept of communication:
To say that writing extends the field and powers of locutory or gestural communication presupposes, does it not, a sort of homogenous space of communication? (p. 3)
If indeed there is such a homogenous space, then speech and writing should share most features. However, he will first argue that the "classical" theory of writing will claim that they are substantially different, and then that they in fact are quite similar after all.

The "Classical" Theory of Writing


Condillac on Writing, pp. 4–5



In order to sketch what the tradition has to say about writing, Derrida provides a couple of quotes by the French Enlightenment philosopher Étienne Condillac (1714–1780), specifically from his Essay on the Origin of Human Knowledge (1746).

Condillac; from zeno.org.
He then quotes Condillac as saying that
Men in a state of communicating their thoughts by means of sounds, felt the necessity of imagining new signs capable of perpetuating those thoughts and of making them known to persons who are absent. (p. 4)
This passage comes from Part II, Section I, Chapter 13, §127 of Condillac's Essay. In the 2001 translation that I have linked to above, the word "absent" does not occur, but it does in fact occur in the French original.

Derrida thinks of this hypothetical origin of language as an explanation in terms of "economy," that is, practical concerns.

As one might suspect, Condillac thinks that writing is more civilized when it looks like European writing: Thus Greek and Latin letters are the best, Egyptian hieroglyphs intermediate, and pictures are at the bottom.

Comments on Condillac, p. 5–7


Derrida again emphasizes the role of "absence" in Condillac's discussion. He stresses that
  1. it is characteristic of writing that it continues to cause effects even after the departure of the writer;
  2. to Condillac, absence is a gradual thinning out of presence (as in picture > symbol > letter).
Derrida also emphasizes the central role of analogy in Condillac's theory (words are analogous to thoughts, etc.)

He also repeats that Condillac is just one example of this theory, and that others could be given.


The Grand Claim, Part 1


Speech Might Be A Kind of Writing, p. 7


Derrida then proposes two "hypotheses":
  1. All communication presupposes a kind of absence; so if writing is special, it must be because it presupposes an absence of a special kind.
  2. Suppose we find out what this special kind of absence is, and suppose that it turns out to be shared by all other kinds of communication too; then there must be something wrong in our definitions of communication, or writing, or both.
This is a somewhat curious rhetorical somersault: First, Derrida has to sell us the rather unconventional idea that there is a classical theory of writing which defines writing in terms of a special kind of "absence." Then he has to shoot down that theory again.

Writing is Iterable, pp. 7–8


A swastika mosaic excavated from
a late ancient church in contemporary Israel.
If there ever was an "overdetermined" sign,
this symbol must surely be an example.
(Image from Wikipedia.)

So what kind of "absence" is characteristic of writing? Derrida proposes that the key is that writing is intelligible in the absence of an author. This means that it can be cited or read indefinitely, or in his phrase, "iterated."
The possibility of repeating and thus of identifying the marks is implicit in every code, making it into a [grid] that is communicable, transmittable, decipherable, iterable for a third, and hence for every possible user in general. (p. 8)
Here is another way of saying it: If a sign really means something, then other people can use it for their own purposes in other contexts. If they can't, it doesn't really have a meaning:
To write is to produce a mark that will constitute a sort of machine which is productive in turn, and which my future disappearance will not, in principle, hinder in its functioning … (p. 8)

Alleged Consequences of the Iterability Claim, pp. 8–9:


This iterability theory of writing has, according to Derrida, four consequences:
  1. It detaches writing from mentalistic notions like consciousness, intended meaning, etc. The theory is inconsistent with the notion of "communication as communication of consciousnesses" or as a "semantic transport of the desire to mean" (p. 8).
  2. It provokes "the disengagement of all writing from the semantic or hermeneutic horizons which … are riven by writing" (p. 9). What he means is perhaps that iterability is different from "meaning" in some limited, conventional sense.
  3. It detaches writing from "polysemics" (p. 9). Like the previous point, this could mean that the open-endedness of future use and citations is different from ambiguity of the more familiar kind, but I really don't know.
  4. The concept of context becomes very problematic.
He says he will come back to all of these points later, but I don't know what he's referring to.

The Grand Claim, Part 2


The Characteristics of “Writing,” p. 9


At this point, Derrida wants to "demonstrate" that the iterability property is found in other kinds of communication in addition to writing, and, more generally, across "what philosophy would call experience" (p. 9).

Continuing his explanation of what he thinks Condillac is saying, he singles out three properties that writing is supposed to have according to the "classical" theory:
  1. Writing subsists beyond the moment of production and "can give rise to an iteration in the absence … of the empirically determined subject who … emitted or produced it." (p. 9)
  2. Writing "breaks with its context," where context means the moment of production, including the intention of the writer:
    But the sign possesses the characteristic of being readable even if the moment of its production is irrevocably lost and even if I do not know what its alleged author-scriptor consciously intended to say at the moment he wrote it, i.e. abandoned it to its essential drift. (p. 9)
    So once you write a sentence down, you lose control.
  3. These breaks are related to the fact that writing is placed at some distance from the "other elements of the internal contextual chain" (p. 9). Presumably this chain is supposed to consist of things like the writer, the time of writing, the intention, etc. Derrida calls this the "spacing" of writing.
As is probably apparent, this list is really just a repetition of things that he has already said earlier.

The Lincoln memorial, finished 1922, mimics Roman architecture
mimicking Greek architecture; picture from Wikipedia.

All Communication is Writing, p. 10


After having made these remarks about the alleged classical theory, Derrida goes on to ask whether the classical characteristics of writing really are characteristics of all communication:
Are [these characteristics] not to be found in all language, in spoken language for instance, and ultimately in the totality of "experience" … ? (p. 10).
As an example of iterability in spoken language, he notes that we need to be able to recognize a word across "variations of tone, voice, etc.". This means that every new application of the word has to be recognized as an echo or citation of some earlier event. Thus, meaning must involve citation, since
… this unity of the signifying form only constitutes itself by virtue of its iterability, by the possibility of its being repeated in … the absence of a determinate signified or of the intention of actual signification, as well as of all intention of present communication. (p. 10).
These iterability conditions are, says Derrida, really characteristic of writing according to the classical theory. Hence, spoken language is a kind of "writing":
This structural possibility of being weaned from the referent or from the signified (hence from communication and from its context) seems to me to make every mark, including those which are oral, a grapheme … (p. 10).
Again, he generalizes this to experience without going too much into the topic:
And I shall even extend this law to all "experience" in general if it is conceded that there is no experience consisting of pure presence but only of chains of differential marks. (p. 10)
The idea is, presumably, that in so far as experience is mediated or interpreted, it is a kind of writing.


Critiquing the Tradition, Part 1


Husserl on Nonsense, pp. 10–11


Husserl; from the Lancet.
Husserl has a theory of how the sign can be detached from its referent. He proposes, according to Derrida, the following taxonomy:
  1. Signs that have a clear meaning, but no current referent (I say "The sky is blue" while you can't see the sky);
  2. Signs that fail to have a meaning because they are
    1. superficial syntactic symbol manipulation, as in formalistic mathematics;
    2. oxymorons, like "a round square";
    3. word salad, like "a round or," "the green is either," or "abracadabra."
This discussion refers to Volume II of Husserl's Logical Investigations. Specifically, the relevant parts of the text are Investigation I, §15 and Investigation IV, §12.

Derrida on Husserl, p. 12


Derrida notes:
But as "the green is either" or "abracadabra" do not constitute their context by themselves, nothing prevents them from functioning in another context as signifying marks. (p. 12)
As an example, he mentions that the word string "the green is either" is used by Husserl as an explicit example of agrammaticality — so it did after all have a use in language. (Consider also how the sentences "Colorless green ideas sleep furiously" or "All your base are belong to us" have taken on a life of their own and can now be echoed or referenced.)

This illustrates, he says,
the possibility of disengagement and citational graft which belongs to the structure of every mark, spoken or written … (p. 12).
Even more explicitly:
Every sign … can be cited, put between quotation marks; in doing so it can break with every given context, engendering an infinity of new contexts in a manner which is absolutely illimitable. (p. 12)
He goes on to say that a sign which did not have this property of citationality or iterability would not be a sign.

Critiquing the Tradition, Part 2


Things Derrida Likes About Performatives, p. 13


After having discussed Husserl, Derrida moves on to Austin. He wants in particular to talk about the notion of performative speech acts.
This concept, he says, should interest us for the following reasons:
  1. Every proper utterance is in a sense performative.
  2. The concept of performatives is "relatively new."
  3. Performatives do have referents in the usual sense.
  4. The discussion of performatives made Austin reanalyze meaning as a concept of force (and this brings him, says Derrida, closer to Nietzsche).
These four features of performatives undermine the traditional communicative concept of meaning, according to Derrida.

Austin's Blind Angle, p. 14


In spite of this subversive potential of performatives, Austin fails to realize that spoken language has the same "citationality" as writing, and this causes problems for his analysis again and again.

Specifically, he holds on to his mentalistic understanding of meaning. The "total context" that Austin has to keep referring to in his discussion always contains
consciousness, the conscious presence of the intention of the speaking subject in the totality of his speech act. As a result, performative communication becomes once more the communication of an intentional meaning … (p. 14)

Infelicity is Structurally Necessary, p. 15


Austin; from University of Washington.
So Derrida claims that citationality is a precondition of meaning. Hence, a theory which tries to exclude the ritualistic or theatrical aspect of word use will either have to push aside a lot of counterexamples or run into problems.

On one hand, Austin can thus recognize that 
… the possibility of the negative (in this case, infelicities) is in fact a structural possibility, that failure is an essential risk of the operations under consideration; (p. 15)
but on the other hand, he
… excludes that risk as accidental, exterior, one which teaches us nothing about the linguistic phenomenon being considered. (p. 15)
Repeating that point once more, Derrida states that:
  1. Austin recognizes that there are ritualistic aspects to the context of a conventional performative speech act, but not that there are ritualistic aspects to meaning itself. "Ritual," Derrida asserts, is "a structural characteristic of every mark." (p. 15)
  2. Austin does not take the possibility of infelicity seriously enough, and he consequently fails to recognize that it is "in some sense a necessary possibility." (p. 15).

Critiquing the Tradition, Part 3



Serious and Non-Serious Language, pp. 16–17


To illustrate these points further, Derrida quotes a passage from Austin's work in which he says that theatrical or joky language is “parasitic” on the more serious uses of language.

But such theatrical language use is not peripheral, Derrida claims:
For, ultimately, isn't it true that what Austin excludes as anomaly, exception, "non-serious," citation, (on stage, in a poem, or a soliloquy) is the determined modification of a general citationality—or rather, a general iterability—without which there would not even be a "successful" performative? (p. 17)
Do you want me to answer? Or is this a questions-only conversation?

A Private Language Argument, p. 17


At this point one might interject, Derrida says, that “literal” performatives are successfully executed all the time (opening a meeting etc.), so shouldn't he take care of those cases before he starts talking about theatrical deviations?

Not necessarily, Derrida says: Even a private language will have to conform to some internal standard, and even an event that happens only once might implicitly be a version of something else.

The Necessity of Infelicity Again, p. 18–19



In effect, Austin thus depicts "ordinary language" as surrounded by a ditch which it can fall into if things go awry. But according to Derrida, this is a somewhat misleading picture in that the "ditch" is a necessary shadow of meaning.

A possibly infelicitous speech act; by Don Hertzfeldt.
He asks:
Could a performative utterance succeed if its formulation did not repeat a "coded" or iterable utterance, or in other words, if the formula I pronounce in order to open a meeting, launch a ship or a marriage were not identifiable as conforming with an iterable model, if it were not then identifiable in some way as a "citation"? (p. 18)
(Correct answer: No, it couldn't.)

As a consequence:
The "non-serious," the oratio obliqua will no longer be able to be excluded, as Austin wished, from "ordinary" language. And if one maintains that ordinary language, or the ordinary circumstances of language, excludes a general citationality or iterability, does that not mean that the "ordinariness" in question … shelter[s] … the teleological lure of consciousness … ? (p. 18)
(Correct answer: Yes, it does.)

Thus, the concept of "context" itself gets into some problems too, since it is not clear what counts as a theatrical context, and what doesn't:
The concept of … the context thus seems to suffer at this point from the same theoretical and "interested" uncertainty as the concept of the "ordinary," from the same metaphysical origins: the ethical and teleological discourse of consciousness. (p. 18)
To round off, he assures us that his point isn't that consciousness, context, etc. make no difference to meaning, but only that their negative counterparts cannot be excluded from the picture.

Who Really Talks When You Are Talking? pp. 19–20


Derrida's signature, jokingly inserted at the end of the paper.
In the last section, Derrida asks who the "source" is of a highly ritualistic sentence like "I hereby declare the meeting open." Austin himself compares such sentences with signatures, so Derrida picks up that thread.

Signatures are funny, he says, because a signature is expected at once to be authentic and unique to the specific situation, but at the same time also to have a "repeatable, iterable, imitable form."

Being authentic if and only if they are good copies, signatures thus illustrate the contradiction that is built into the mentalistic notion of writing.

A Last Salute


Perspectives and Additional Claims, p. 20–21


On the last page of the essay, Derrida very rapidly throws a couple of rather large claims at the reader, mixed loosely with a summary of his main points:
  1. The concept of writing is gaining ground, so that philosophy increasingly relies on authenticity concepts like "speech, consciousness, meaning, presence, truth, etc."
  2. Writing is difficult to understand from the perspective of the traditional theory.
  3. His project of insisting on the work done by negative concepts (absence, failure, etc.) can be carried further in a larger project of metaphysical criticism.
So that's a dubious claim, a triviality, and a literature reference.

Thursday, March 20, 2014

Derrida and Eagleton on Spectres of Marx

Cover image; from Amazon.
In 1993, Derrida published Spectres of Marx. Six years later, a companion piece came out, containing nine more or less critical essays by a number of authors, and with a final response by Derrida himself.

The most disturbing and notable interaction in this book is Terry Eagleton's criticism of Derrida and Derrida's furious response.

Eagleton accuses Derrida of being politically banal underneath the flamboyant prose. Derrida in turn accuses Eagleton of having learned nothing, not done his homework, and failed to properly read his book. The whole discussion pretty much melts down into vitriol and anger, with the reader being left as the big loser.

Eagleton on Derrida

Eagleton quotes Derrida as claiming that deconstruction secretly was a super-radical form of Marxism all along:
For Specters of Marx doesn't just want to catch up with Marxism; it wants to outleft it by claiming that deconstruction was all along a radicalized version of the creed. 'Deconstruction,' Derrida remarks, 'has never had any sense or interest, in my view at least, except as a radicalization, which is to say also in the tradition of a certain Marxism, in a certain spirit of Marxism'. (p. 84)
But this, Eagleton asserts, is just "a handy piece of historical revisionism" without any support in the work of Derrida or his followers:
Whatever Derrida may now like to think, deconstruction – he must surely know it – has in truth operated as nothing in the least like a radicalized Marxism, but rather as an ersatz form of textual politics […] which seemed to offer the twin benefits of at once outflanking Marxism in its audacious avant-gardism, and generating a sceptical sensibility which pulled the rug out from under anything as drearily undeconstructed as solidarity, organization or calculated political action. (p. 84)
Thus, only by oscillating back and forth between extravagant philosophy and pedestrian social democratic politics can such works seem both excitingly radical and politically attractive. The consequence is a reluctance to touch anything that might crystallize as a mass movement:
And what does Derrida counterpose […]? A 'New International', one 'without status, without title, and without name … without party, without country, without national community …' And, of course, as one gathers elsewhere in the book, without organization, without ontology, without method, without apparatus. […] Spectres of Marxism indeed. (p. 87)
Without any of these commitments, such a political project remains hollow:
What he wants, in effect, is a Marxism without Marxism […]. 'We would be tempted to distinguish this spirit of Marxist critique … at once from Marxism as ontology, philosophical or metaphysical system, as "dialectical materialism", from Marxism as historical materialism or method, and from Marxism incorporated in the apparatuses of party, State, or workers' International.' It would not be difficult to translate this into the tones of a (suitably caricatured) liberal Anglicanism: we must distinguish the spirit of Christianity from such metaphysical baggage as the existence of God, the divinity of Christ, organized religion, the doctrine of the resurrection, the superstition of the Eucharist and the rest. (p. 86)
Such a thoroughly academic form of politics would indeed be quite ridiculous. The question is whether it is fair to characterize Derrida's work this way, or whether it has political potentials that have gone under Eagleton's radar.

Derrida on Eagleton

So we might hope that Derrida had responded directly to Eagleton's critique. Unfortunately, however, his response is too vitriolic to really get into any of the substance of the argument.

Derrida opens his essay by humbly noting that it will be "inadequate" (p. 213), and that
It would be presumptuous of me, arriving after everyone else, in a position at once panoramic and central, to claim the right to the last word […] (p. 214)
But after this little rite, he is done with the formalities. He can then move on to scold "the patented Marxists still prepared to dispense lessons from on high" (p. 221), including, as you might have guessed, Terry Eagleton:
Terry Eagleton is, fortunately, the only (and nearly the last) 'Marxist' of this stripe. He is the only one […] to maintain that imperturbably triumphal tone. One can only rub one's eyes in disbelief and wonder where he finds the inspiration, the haughtiness, the right. Has he learned nothing at all? (pp. 221–22)
Yes, where in the world could he have found the inspiration for such an arrogant tone? In a footnote, Derrida also submits that Eagleton is guilty of
the facile, demagogic, grave error of confusing my work (or even 'deconstruction' in general) with postmodernism [and this error] is indicative […] of a massive failure to read or analyze. This rudimentary misunderstanding might by itself warrant my breaking off all further dialogue until certain 'homework' was done. (pp. 263–64)
So it appears that some people do have property rights over texts after all.

Another footnote almost threatens to boil over with malice and spite:
Eagleton is undoubtedly convinced that, with the finesse, grace and elegance he is universally acknowledged to possess, he has hit upon a title ('Marxism without Marxism') which is a flash of wit, an ironic dart, a witheringly sarcastic critique […] (p. 265)
But the dart missed the mark, Derrida contends:
Every 'good Marxist' knows, however, that nothing is closer to Marx, more faithful to Marx, more, 'Marx', than a 'Marxism without Marxism'. Need we recall here that this Marxism without Marxism was, to begin with, the Marxism of Marx himself, if that name still means anything? (p. 265)
Having thus spent his limited space defending his abstract right to have an opinion, Derrida never gets into the subject of whether that opinion translates into a political project. I would have liked to hear him elaborate on that, though.

Tuesday, March 18, 2014

Lewis: "Humean Supervenience Debugged" (1994)

In this paper, David Lewis wrings his hands at a phenomenon he calls "undermining." He considers a probabilistic model as "undermined" if it assigns a positive probability to a data set that would cause a rational agent to adopt a different model.

"Contradiction!"

To say this in a vocabulary closer to Lewis', suppose that C(F | E) is the posterior probability a rational believer would assign to the event F in light of the evidence E. Suppose further that we are looking at the specific case in which E reveals the actual parameters of the world (e.g., "This coin has bias 0.3"), and F is a possible future which would produce a different subjective belief in our hypothetical observer (e.g., "The empirical frequency will be 0.4").

The question is then: Given E, does the possible future F have zero probability, or positive probability? Without giving any argument, Lewis asserts that
there is some present chance that events would go in such a way as to complete a chancemaking pattern that would make the present chances different from what they actually are. (p. 482)
But he contrasts that with the following "argument":
But F is inconsistent with E, so C(F/E) = 0. Contradiction. (p. 483)
The former of these quotes seems to indicate that he is thinking about a finite sample from the model (consistent with the example he gives on p. 488). The latter argument, on the other hand, seems to assume that he is talking about a limiting frequency from an ergodic process or something like that — unless he seriously believes that empirical frequencies cannot differ from parameter values.

The Super-Objectivist

But this way of putting the argument is of course alien to Lewis. He has no concept of a statistical model, and he thinks that the credence of a rational agent is a unique and well-defined concept that doesn't require any assumptions:
Despite appearances and the odd metaphor, this is not epistemology! You're welcome to spot an analogy, but I insist that I am not talking about how evidence determines what's reasonable to believe about laws and chances. Rather, I'm talking about how nature—the Humean arrangement of qualities—determines what's true about laws and chances. Whether there are any believers living in the lawful and chancy world has nothing to do with it. (pp. 481–82)
This is even stronger and more absurd than classical objectivism. Instead of just discarding certain models as inconsistent with the evidence, Lewis assumes that the evidence suggests a single optimal model of its own accord. He also wants "nature" to do this in a retrospective manner, for no apparent reason, given that he has expelled all subjective observers from the universe.

The Big Flip

Lewis' own solution to the "paradox" is to say that credences should be conditioned on "theories" as well as data — but "theory" doesn't quite mean what it sounds like. This is evident from the example he gives towards the end of the paper.

In this example, he assumes that a coin has exhibited a frequency of 2/3 heads in the past, and he assumes that this means that our hypothetical rational agent estimates its bias to be 2/3.

The "theory" T that he wants us to consider is then that the next 10,002 coin flips exhibit a frequency of exactly 2/3 heads, i.e., 6,668 heads and 3,334 tails. This event has the binomial probability
Pr(T) = B(6,668; 10,002, 2/3).
He then asks us to consider a possible future A in which the next four coin flips come up heads. Still using the parameter estimate of 2/3, this has the binomial probability
Pr(A) = B(4; 4, 2/3).
What is the conditional probability Pr(A | T)? Since the "theory" T did not change the parameter estimate 2/3, one might think that it equals the unconditional probability Pr(A). But for no apparent reason, Lewis decides to take the four coin flips in A from the coin flips in T, producing an amputated event T' with 3 fewer heads and 1 fewer tails. Even more oddly, he computes Pr(A, T') as if the two events were independent even though the observation of either would clearly change the parameter estimate used to compute the conditional probability of the other.

So according to his logic, the "joint probability" of A and T' is then
Pr(A, T') = B(4; 4, 2/3) B(6,665; 9,998, 2/3).
By dividing this by Pr(T), he supposedly finds the "conditional probability" of A given T.

This computation is, of course, completely absurd. If the parameter had been 1/3 instead of 2/3, it would have produced a "probability" larger than 1. So I'm afraid the example isn't doing much good.

Saturday, March 1, 2014

Attneave: Applications of Information Theory to Psychology (1959)

Fred Attneave's book on information theory and psychology is a sober and careful overview of the various ways in which information theory had been applied to psychology (by people like George Miller) by 1959.

Attneave explicitly tries to stay clear of the information theory craze which followed the publication of Shannon's 1948 paper:
Book cover; from Amazon.
Thus presented with a shiny new tool kit and a somewhat esoteric new vocabulary to go with it, more than a few psychologists reacted with an excess of enthusiasm. During the early fifties some of the attempts to apply informational techniques to psychological problems were successful and illuminating, some were pointless, and some were downright bizarre. At present two generalizations may be stated with considerable confidence:
(1) Information theory is not going to provide a ready-made solution to all psychological problems; (2) Employed with intelligence, flexibility, and critical insight, information theory can have great value both in the formulation of certain psychological problems and in the analysis of certain psychological data (pp. v–vi)
Or in other words: Information theory can provide the descriptive statistics, but there is no hiding from the fact that you and you alone are responsible for your model.

Language Only, Please

Chapter 2 of the book is about entropy rates, and about the entropy of English in particular. Attneave talks about various estimation methods, and he discusses Shannon's guessing game and a couple of related studies.

As he sums up the various mathematical estimation tricks, he notes that predictions from statistical tables tend to be more reliable than predictions from human subjects with respect to the first couple of letters of a text. This means that estimates from human predictions will tend to overestimate the unpredictability of the first few letters of a string.
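
For concreteness, here is a small Python sketch of the table-based kind of estimate. It is a toy version of my own, not Attneave's or Shannon's actual procedure, and the sample string and function names are just placeholders: it estimates H_N, the entropy of letter N given the preceding N − 1 letters, from raw n-gram counts in a text.

    import math
    from collections import Counter

    def conditional_entropy(text, n):
        """Estimate H_n, the entropy (in bits) of the n-th letter given the
        preceding n - 1 letters, from raw n-gram counts in `text`."""
        positions = range(len(text) - n + 1)
        ngrams = Counter(text[i:i + n] for i in positions)
        contexts = Counter(text[i:i + n - 1] for i in positions)
        total = sum(ngrams.values())
        h = 0.0
        for gram, count in ngrams.items():
            p_joint = count / total                 # Pr(context, next letter)
            p_cond = count / contexts[gram[:-1]]    # Pr(next letter | context)
            h -= p_joint * math.log2(p_cond)
        return h

    sample = "the cat sat on the mat and the rat sat on the cat " * 50
    for n in range(1, 5):
        print(n, round(conditional_entropy(sample, n), 3))
    # The estimates drop as n grows: more context, less uncertainty.

With a toy sample like this, the higher-order estimates are badly overfitted; on real text, counts of this kind are only trustworthy for small N, and for longer contexts one has to fall back on something like Shannon's guessing game.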

He then comments:
What we are concerned with above is the obvious possibility that calculated values (or rather, brackets) of H_N [= the entropy of letter N given letter 1 through N – 1] will be too high because of the subject's incomplete appreciation of statistical regularities which are objectively present. On the other hand, there is the less obvious possibility that a subject's guesses may, in a certain sense, be too good. Shannon's intent is presumably to study statistical restraints which pertain to language. But a subject given a long sequence of letters which he has probably never encountered before, in that exact pattern, may be expected to base his prediction of the next letter not only upon language statistics, but also upon his general knowledge [p. 40] of the world to which language refers. A possible reply to this criticism is that all but the lowest orders of sequential dependency in language are in any case attributable to natural connections among the referents of words, and that it is entirely legitimate for a human predictor to take advantage of such natural connections to estimate transitional probabilities of language, even when no empirical frequencies corresponding to the probabilities exist. It is nevertheless important to realize that a human predictor is conceivably superior to a hypothetical "ideal predictor" who knows none of the connections between words and their referents, but who (with unlimited computational facilities) has analyzed all the English ever written and discovered all the statistical regularities residing therein. (pp. 39–40; emphases in original)
I'm not sure that was "Shannon's intent." Attneave seems to rely crucially on an objective interpretation of probability as well as an a priori belief in language as an autonomous object.

Just like Laplace's philosophical commitments became obvious when he starting talking in hypothetical terms, it is also the "ideal predictor" in this quote which reveals the philosophy of language that informs Attneave's perspective.