Thursday, May 31, 2012

Janssen: "Independent Choices and the Interpretation of IF Logic" (2002)

In this paper, Theo Janssen argues that Jaakko Hintikka and Gabriel Sandu's notion of independence-friendly logic does not adequately formalize the notion of quantifier independence, at least according to his intuitions about what independence should mean. He has essentially two arguments, of which the first is the stronger.

Dependence Chains

The most serious problem that Janssen points out is that IF logic may require that a choice A is independent from a choice C without ruling out that there is some intermediate choice B such that A depends on B, and B depends on C.

Such cases create some quite strange examples, e.g.:
  • TRUE: ∀x: (x ≠ 2) v (∃u/x: x = u).
  • FALSE: ∀x: (x = 2) v (∃u/x: x = u).
  • TRUE: ∀x: (x ≠ 2) v (∃u/x: x ≠ u).
  • TRUE: ∀x: (x = 2) v (∃u/x: x ≠ u).
The true sentences here are true in spite of the independence of u from x, the reason being that the choice of disjunct is not independent of x. The Verifier can thus circumvent the quantifier independence. For instance, in the first sentence, he can set u := 2 and then pick the left disjunct if and only if x ≠ 2.
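These truth values can be checked mechanically. Below is a minimal sketch, assuming the examples all share the shape ∀x: (left disjunct about x) v (∃u/x: right disjunct about x and u), with a small finite domain standing in for the natural numbers; since u may not depend on x but the choice of disjunct may, the Verifier wins exactly when some fixed u makes one disjunct or the other hold for every x:

```python
DOMAIN = range(4)  # a small finite stand-in for the natural numbers

def verifier_wins(left, right):
    """Truth of 'for all x: left(x) or (exists u, independent of x: right(x, u))'.
    u may not depend on x, but the choice of disjunct may, so the Verifier
    has a winning strategy iff some fixed u makes left(x) or right(x, u)
    hold for every x in the domain."""
    return any(all(left(x) or right(x, u) for x in DOMAIN) for u in DOMAIN)

print(verifier_wins(lambda x: x != 2, lambda x, u: x == u))  # True
print(verifier_wins(lambda x: x == 2, lambda x, u: x == u))  # False
print(verifier_wins(lambda x: x != 2, lambda x, u: x != u))  # True
print(verifier_wins(lambda x: x == 2, lambda x, u: x != u))  # True
```

The second sentence fails because a single constant u can only cover the elements 2 and u, and the Falsifier can always pick an x outside that pair.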

Similar examples exist where the middle term which smuggles in the dependence is not a disjunction, but another quantifier.

Naming Problems

Another problem occurs, according to Janssen, when "a variable is bound within the scope of a quantifier that binds the same variable" (p. 375). This occurs for instance in sentences like
  • ∀x∃x: R(x,x).
He claims that such sentences come about by "classically allowed" substitutions from, in this case,
  • ∀x∃y: R(x,y).
After such a substitution, indirect dependencies referring to the value of y might be lost, and an otherwise winning Verifier strategy might be broken. However, I don't know whether there would be any problem with simply banning double-bound quantifiers such as the ∃x above; double-binding doesn't seem to have any necessary or positive effect.

Solutions

To avoid the problems of Hintikka's system, Janssen defines a new game with explicit extra conditions such as "The strategy does not have variables in W as arguments" and "If the values of variables in W are changed, and there is a winning choice, then the same choice is a step towards winning" (p. 382).

This solves the problem, but doesn't bring about much transparency, it seems to me. A better solution would probably be to describe the instantiation of the quantifiers and the selection of branches at the connectives as a probability distribution on a suitable power of the domain and of the branch options {L,R}. Then independence could be clearly described as statistical independence.

Such a system would require the domains to be finite, which is not good. However, within finite domains, results about logical strength of solution concepts would be easy to extract, because they would simply correspond to different constraints on the dependencies between the choices, i.e., marginal distributions. It would, in fact, allow us to quantify the amount of information that was transmitted from one choice to another by computing the mutual information between two marginal distributions.
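For finite domains, the mutual-information quantity suggested above is easy to compute from the joint distribution of two choices. The following sketch is my own illustration, not anything from Janssen's paper; the encoding of choices as dictionary keys is an assumption:

```python
from math import log2
from collections import defaultdict

def mutual_information(joint):
    """Mutual information (in bits) between two choices, given their joint
    distribution as a dict mapping (a, b) pairs to probabilities."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():  # marginal distributions
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Perfectly correlated choices transmit one full bit...
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))  # 1.0
# ...while statistically independent choices transmit nothing.
print(mutual_information({(a, b): 0.25 for a in (0, 1) for b in (0, 1)}))  # 0.0
```

Quantifier independence in the sense proposed above would then just be the constraint that this quantity is zero.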

Tuesday, May 29, 2012

Aloni and van Rooij: "Free Choice Items and Alternatives" (2007)

This paper argues for a purely pragmatic treatment of some facts about the free-choice items any, irgend-, and qualsiasi (English, German, and Italian, respectively). The manuscript on Maria Aloni's website is dated 2005, but the paper appears to have been officially published for the first time in 2007.

The positive part of the paper is built around a logical implementation of Gricean reasoning. As far as I can see, it is equivalent to assuming that speakers only utter sentences φ that (1) they know to be true, and (2) that are maximally informative among the alternatives Alt(φ). This can be formalized in standard epistemic logic. A stronger assumption is later introduced in order to handle some more cases (p. 14).

These assumptions constrain the set of knowledge states that the speaker may be in, and this gives rise to implicatures. For instance, if both p and q are alternatives to the sentence p v q, then the speaker is blocked from uttering the disjunction if she in fact knows that one of the alternatives is the case.

The paper refers centrally to Gerald Gazdar's book Pragmatics from 1979.

Staudacher: Use Theories of Meaning (2010)

By and large, Marc Staudacher endorses the use of evolutionary perspectives on signaling games in his dissertation.

However, in section 7.2.3, he reiterates his concern about a fact that he observed on his blog two years ago: Although there are models of signaling games with infinitely many signals, no model plausibly explains how syntactically structured signals can be aligned with semantically compositional meanings.

"But it seems to be more of a technical problem that will eventually be solved," he adds (pp. 214-15). That seems to be a sound enough intuition. The semantics of a language with finitely many atoms and finitely many relations is learnable in finite time, even if the syntactic span of the language is infinite.

A temporal difference approach could for instance do the trick. Given a signal f(x,y) = b, where b is 0 or 1, an agent could record all of the sentences f(0,0) = b, f(0,1) = b, f(1,0) = b, and f(1,1) = b with fractional observation counts determined by the subjective probability of x = 0, x = 1, y = 0, and y = 1.
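The fractional-count bookkeeping could be sketched as follows. The function name and the representation of signals are my own assumptions, and this is only the counting step, not a full temporal-difference learner:

```python
from collections import defaultdict

def update(counts, b, p_x, p_y):
    """Record one observation f(x, y) = b where the identity of x and y is
    uncertain: each cell (x, y, b) receives a fractional count equal to the
    subjective probability of that combination of arguments."""
    for x, px in p_x.items():
        for y, py in p_y.items():
            counts[(x, y, b)] += px * py

counts = defaultdict(float)
# An observation of f(x, y) = 1 where x is known to be 0 but y is uncertain:
update(counts, 1, {0: 1.0}, {0: 0.7, 1: 0.3})
print(counts[(0, 0, 1)])  # 0.7
print(counts[(0, 1, 1)])  # 0.3
```

After enough observations, the normalized counts estimate the semantics of f even though no single argument pair was ever observed with certainty.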

Wednesday, May 16, 2012

Sevenster: "A Strategic Perspective on IF Games" (2009)

This is a commentary on Hintikka and Sandu's game-theoretical approach to independence-friendly logic. Sevenster considers how the notion of truth is changed when one changes the information flow or the solution concept for the falsification game over a sentence.

Information Access

When playing an extensive game, one can be "forgetful" to different degrees. In the general setting, forgetfulness has various effects, such as blocking the possibility of threat behavior. In terms of the semantics of quantifiers, the various degrees of forgetfulness also allow for different types of independence:

Memory capacity                 Solution concept              Independence relations
Global strategy and past moves  Nash equilibrium              None
Global strategy                 Other                         Existentials of universals
Neither                         Subgame perfect equilibrium   Anything of anything

To see the difference between the two degrees of independence, consider the following sentence:
  • There is an x and there is a y such that x = y.
Assume that we are in a world with two objects, and that the two existential quantifiers in the sentence are independent of each other. Then the verifier can at most achieve a 50% chance of verifying the sentence, since there is no information flow from the first choice to the other.

If, on the other hand, the second choice is dependent on the first, the verifier can achieve a 100% success rate. The difference is that between a sequential and a simultaneous coordination game.
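In the two-object model this difference is easy to compute. A sketch, assuming that "independent" choices means choices that cannot even be correlated by a shared plan, so that the best the Verifier can do is randomize each choice uniformly:

```python
from itertools import product

DOMAIN = ['a', 'b']
uniform = {d: 0.5 for d in DOMAIN}

# Simultaneous choices: x and y are drawn independently, so the chance
# that they agree is the diagonal mass of the product distribution.
p_simultaneous = sum(uniform[x] * uniform[y]
                     for x, y in product(DOMAIN, DOMAIN) if x == y)

# Sequential choices: the second chooser sees x and simply copies it.
respond = lambda x: x
p_sequential = sum(uniform[x] for x in DOMAIN if respond(x) == x)

print(p_simultaneous, p_sequential)  # 0.5 1.0
```

The 50% figure is thus the ceiling for uncorrelated play, while any information flow from the first choice to the second lifts it to certainty.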

This example is exactly the one that I have felt missing in Hintikka's discussions, so it's nice to see that I'm not alone. Apparently, Theo Janssen has discussed the problem in a paper from 2002 (cf. Sevenster's article, p. 106).

Solution Concepts

Sevenster uses three different solution concepts in his article:
  1. Nash equilibrium
  2. WDS + P
  3. WDS
WDS strategy profiles are profiles in which all players play a weakly dominant strategy, i.e., one that is (weakly) optimal whatever everyone else does. This is a very strong condition.

WDS + P strategy profiles are, as far as I can see from Sevenster's Definition 12, the ones that remain after the removal of the weakly dominated strategies for player n, then for player n – 1, and so on. This is weaker than WDS, since a WDS + P strategy for player i does not need to be (weakly) optimal with respect to every other strategy, but only optimal with respect to the WDS + P strategies for players j with j > i.

Neither of these last two solution concepts is standard. The middle one is slightly problematic because it may give different results when players are enumerated differently. But that's a drawback it shares with all solution concepts based on elimination of weakly dominated strategies.
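The order-dependence can be illustrated with a standard textbook-style 3×2 game (my own example, not one of Sevenster's). Here M weakly dominates both T and B for the row player; removing T first makes R dominate L for the column player, while removing B first makes L dominate R, so the two elimination orders predict different outcomes:

```python
# Row and column payoffs, indexed by (row strategy, column strategy).
ROW = {('T', 'L'): 1, ('T', 'R'): 0, ('M', 'L'): 1,
       ('M', 'R'): 2, ('B', 'L'): 0, ('B', 'R'): 2}
COL = {('T', 'L'): 1, ('T', 'R'): 0, ('M', 'L'): 1,
       ('M', 'R'): 1, ('B', 'L'): 0, ('B', 'R'): 1}

def dominated(strats, opp, pay):
    """Strategies in `strats` that are weakly dominated against the
    surviving opponent strategies `opp`; pay is indexed with the
    owner's strategy first."""
    return {s for s in strats for d in strats
            if d != s
            and all(pay[d, t] >= pay[s, t] for t in opp)
            and any(pay[d, t] > pay[s, t] for t in opp)}

rows, cols = {'T', 'M', 'B'}, {'L', 'R'}
col_pay = {(c, r): v for (r, c), v in COL.items()}  # column player's view

print(dominated(rows, cols, ROW))              # both T and B are dominated by M
print(dominated(cols, rows - {'T'}, col_pay))  # remove T first: L is dominated
print(dominated(cols, rows - {'B'}, col_pay))  # remove B first: R is dominated
```

Depending on which row is deleted first, the surviving profiles give the row player 2 or 1, which is exactly the kind of enumeration-sensitivity mentioned above.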

Some Mixed Equilibria

Just for the sake of it, let me briefly review two example sentences:
  • There is an x and there is a y such that x = y.
  • There is an x such that for all y, x = y.
Assume further that we are in a model in which there are exactly two objects, a and b. These two sentences then correspond to a simple coordination game and to Matching Pennies, respectively. The coordination game has the three equilibria (0, 0), (1/2, 1/2), and (1, 1), while Matching Pennies only has the equilibrium (1/2, 1/2).

The interpretation of this in classical logic is that the double existential sentence is supported by two pieces of evidence (a = a and b = b), while the sentence with the universal quantifier is supported by no evidence.

However, in the mixed strategy equilibrium, both strategies pay the same for both players. This means that the players achieve no gain by knowing that the other player is going to play (1/2, 1/2). Accordingly, they correspond to the case without any (useful) information flow between the players.

The existence of such equilibria thus witnesses the existence of a true, independence-friendly reading of the sentence in the relevant model. Note, however, that this does not mean that the sentences are tautological: the Verifier does not have any strategy that guarantees the payoff 1 in either of the two cases.
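The indifference property of the mixed equilibrium can be verified directly. A sketch, with strategies encoded as 0 and 1 and (p, q) meaning that row plays its first strategy with probability p and column with probability q:

```python
def expected(pay, p, q):
    """Row's expected payoff when row mixes (p, 1-p) and column (q, 1-q)."""
    return sum(pr * qc * pay[i][j]
               for i, pr in enumerate((p, 1 - p))
               for j, qc in enumerate((q, 1 - q)))

# Coordination game: both players get 1 on the diagonal, 0 off it.
# (In Matching Pennies the Verifier has the same payoff matrix; only the
# Falsifier's payoffs differ, which is why (1/2, 1/2) is its sole equilibrium.)
coord = [[1, 0], [0, 1]]

# Against a 1/2-1/2 opponent, every own mixture pays the same 1/2, so
# knowing that the other player mixes evenly is of no use:
print([expected(coord, p, 0.5) for p in (0.0, 0.25, 1.0)])   # [0.5, 0.5, 0.5]
# The two pure coordination equilibria do strictly better:
print(expected(coord, 0.0, 0.0), expected(coord, 1.0, 1.0))  # 1.0 1.0
```

The first printout is the formal counterpart of the claim that the mixed equilibrium involves no (useful) information flow between the players.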

Tuesday, May 15, 2012

Chemla and Spector: "Experimental Evidence for Embedded Scalar Implicatures" (2010)

I remember seeing Benjamin Spector speak about this experiment at ESSLLI 2010. The paper argues that the sentence
  • Every student solved some problems 
has a so-called "localist" reading. A counterexample to such a localist reading is a student who solved no problems, or a student who solved all problems. The second option is the crucial one, since a strict Gricean model does not predict any implicatures that warrant this reading. Chemla and Spector claim that the localist reading is available, although not the dominant one.

Experimental Set-Up and Subject Responses

The empirical method that they employ in order to prove this involves some sentences and some drawings. The drawings show six letters, each surrounded by six circles. Each letter is connected to some number of the circles surrounding it, as in the following figure:

[Figure: six letters, each connected to some of its surrounding circles; not reproduced here]

The relevant sentences were then of the following kind:
  • Every letter is connected with some of its circles
In particular, the question was how subjects evaluate such sentences when no letters are completely disconnected, and at least one letter is connected to every circle. Will anyone judge the sentence to be false in that scenario?

However, instead of just asking their subjects this question (a methodology that has previously falsified the theory) they asked subjects to click somewhere on a bar connecting the word "Yes" and the word "No." With this methodology, subjects did indeed pick points closer to "No" when the drawing included some letters that were connected to all of their circles.

So it seems that when pushed, subjects do start to doubt whether the word "some" should be taken to mean "at least one" or "at least one and not all." Once they notice this ambiguity, they might become slightly more scared of giving an unequivocal "Yes" and consequently click somewhere lower on the scale.

Or to put it differently, as soon as the subjects start fearing that they are in some kind of polemic language game, they switch to a safer strategy by committing to less. So a localist reading is indeed available, and in some circumstances it is a possible claim inherent in the sentence.

Relevance Considerations

However, what really caught my eye on this second rereading of the paper was the comments that Chemla and Spector make about relevance while discussing possible experimental set-ups:
the local reading (‘every square is connected with some of the circles and not with all of them’) is relevant typically in a context in which we are interested in knowing, for each square, whether it is connected with some, all, or no circle. Such a context would for instance result from raising the following question: ‘Which squares are connected to which circles?’. (Section 2.2.2, page 365)
The globalist reading, one might add, would be the most relevant answer to the question Which squares are connected to some circles? The only counterexamples to the globalist reading consist of squares that are not connected to anything, as stated above.

The reason I find this comment interesting is that it connects the meaning of the sentence with the expectation of the subjects. Experimental materials place the subject in some particular role, and this implicitly suggests certain answers to the big question: What does he expect me to do with this question?

Chemla and Spector obviously embrace some kind of objectivist perspective on semantics, with sentences having grammatically determined meanings, and a clear division between grammar and pragmatics. But their sensitivity to the perspective of the subject is very commendable and opens up the possibility of founding the notion of meaning on the notion of social context.

Hintikka: "What is elementary logic?" (1995)

The claim of the paper is that independence-friendly logic is more "natural" than ordinary first-order logic. That is, the restriction to quantifiers with nested scopes is unnecessary and unfounded.
In this article, as in everything he has written, there are some serious linguistic issues with the examples he uses, and it is by no means clear that his own semantic intuitions are generalizable.

The paper is reprinted in a 1998 anthology of Hintikka's work, but Hintikka referred to the paper as "forthcoming" in 1991, and it was published for the first time in 1995.

The Old Example: Villagers and Townsmen

His old natural-language argument for the usefulness of independence-friendly logic comes from his introspective intuitions about the following sentence:
  • Some relative of each villager hates some relative of each townsman
This sentence has two readings, a classical and an independence-friendly. These readings can be distinguished by the following model: Suppose that there is one villager and one townsman, and that they are related to themselves and to each other; suppose further that they hate each other, but do not hate themselves.

The Verifier then instantiates some relative of each (= the only) villager by picking either the villager or the townsman, since everyone is related. The same goes for some relative of each (= the only) townsman. The sentence is true exactly when the two choices are different, and not true when they are the same (since no one, by assumption, hates themselves).

When the two choices are independent, the Verifier has no winning strategy, and the sentence is thus not true. The Falsifier, on the other hand, also doesn't have a winning strategy, since some combinations of Verifier choices do in fact make the sentence true, and others don't. In the independence-friendly reading, the sentence is thus neither true nor false. In the classical reading, it's true.

Now Hintikka's claim is that the independence-friendly reading of this English sentence is a plausible reading (or the most plausible?). He does not give any empirical arguments for the claim.

Note that the same logical structure can be replicated with a slightly less far-fetched example:
  • A north-going driver and a south-going driver can choose to drive on a side of the road so that they avoid a collision.
If you think that this sentence is true, you have read it in the classical way. If you think it is false, you have read it in the independence-friendly way.

The New Example: The Boy that Loved the Girl that Loved Him

In support of his claim, he provides the following "perfectly understandable English sentence" as evidence (p. 10):
  • The boy who was fooling her kissed the girl who loved him.
He then claims that this sentence cannot be expressed in first-order logic "no matter how you analyze the definite descriptions" (p. 10).

So how can we analyze the definite descriptions? I guess we have at least the following options:
  • The boy1 who was fooling her2 kissed the girl2 who loved him1.
  • The boy1 who was fooling her2 kissed the girl3 who loved him1.
  • The boy1 who was fooling her2 kissed the girl2 who loved him4.
  • The boy1 who was fooling her2 kissed the girl3 who loved him4.
I suppose that the problematic case that he is thinking about is the first one. That's the one where the sentence implicitly states that the identity of the girl uniquely identifies a beloved boy, while the identity of the boy also uniquely identifies a fooled girl.

This is obviously a circular dependence, but it can still meaningfully apply (or not apply) to various cases. For instance, if x fools y and y loves x, then it applies. If x fools y, and y loves z, or loves both x and z, then it doesn't.

But unlike the villagers-sentence, I can't see how this is not expressible in terms of first-order logic, given the usual legitimate moves in sentence formalization. But perhaps Hintikka has some strange and far-fetched "natural" reading of the sentence in mind?

Friday, May 11, 2012

Marc Staudacher: Use Theories of Meaning (2010)

Martin recommended this recent PhD thesis as an up-to-date survey of contemporary philosophies of language. I had a printed version standing around in my office, but it's also available via the ILLC repository of dissertations.

Conventions and Social Norms

The subtitle of the dissertation is "between conventions and social norms," and this is also the central theme of the text: Is meaning a social norm, or is it just a regularity in behavior? In other words, do we conform to the dictionary because we are morally obliged to, or for purely practical reasons?

In chapter 2 of the dissertation, Staudacher reiterates a number of arguments for each of these options. He evaluates two of them as particularly strong, so I will briefly run through those.

Contra: Section 2.1.2

The strong argument against the obligatory and normative nature of meaning comes from a paper by Akeel Bilgrami (from an anthology with discussions of Donald Davidson's philosophy of language).

Bilgrami's claim is that philosophers' urge to give meaning a normative character comes from the fact that they want meaning to depend on something more than mere behavior. In particular, they want to know "what concept" a person's use of a word is supposed to reflect. He argues that this is empirically unnecessary and thus a piece of metaphysical fat.

His main example can roughly be restated like this: Imagine an otherwise competent English-speaker that one day says "I have such a horrible headache in my shoulder." The concept-hungry philosopher would then try to uncover some (new) underlying rule or concept behind this (new) application of the word; but Bilgrami emphasizes that we actually don't need such a rule to describe or explain the speaker's behavior. Meaning is thus not essentially normative.

Pro: Section 2.2.3

The argument in favor of a normative concept of meaning is "the argument from mistakes."

This is the observation that we feel inclined to correct people's incorrect use of a word, even if we understand perfectly well what they mean. Think for instance about confusions of the effective/efficient distinction.

It is of course an open question whether such a correction should be seen as benevolent, practical advice or as an expression of moral standards. Perhaps they can be compared to the pragmatic ambiguity of utterances like "You are not allowed to smoke in here."

A Dissociation Device: Section 2.5

In his discussion of the two perspectives on meaning, Staudacher plays around with the idea of a society of completely pragmatic speakers. These mostly use words in their usual meaning, but for reasons that are purely practical rather than ethical. This is a quite useful way of searching out the empirical differences between the two hypotheses.

The most interesting difference between this norm-free universe and the normative one is that hearers will not have any option of condemning speakers for giving false information, and speakers cannot blame hearers for interpreting words in a wrong way. Hearers will thus treat speakers as imperfectly reliable sources of information, like reasonably good thermometers, and speakers will treat hearers as imperfectly reliable reaction machines.

In the terminology of Brown and Levinson, this means that all positive face demands fall out of the game of talking and interpreting. As long as the negative face interests of the two players coincide perfectly (say, they have to communicate in order to row a boat in sync) this will not differ from the normative case. But when one of them has no negative face interests in the situation, or even opposing face interests, then the two hypotheses will be empirically different.

Can Regularities Be Non-Normative?

One thing that I have been speculating about a lot while reading Staudacher's chapter 2 is whether group regularities automatically produce norm enforcement. For instance, if all members of a community drink white wine, will they then necessarily develop a hostile behavior towards people who drink red wine?

The claim of both structuralism and poststructuralism is that they will, since social groups have a tendency to assign meaning to any parameters that distinguish them from others, thus introducing a taboo against borderline cases. The question is (1) whether this is true, and (2) whether this should be explained in terms of individual self-protection or group protection.

This is a quite intricate question and touches on some deeper problems with separating a norm from a regularity. But consider some of these cases of non-conform behavior:
  • One child in a school class is smarter than the others; the rest of the class bullies that child.
  • You have had dinner at a restaurant with two friends. They have just ordered a dessert of ice cream with chocolate sauce, and you then order a dessert of low-fat sorbet with a piece of fresh fruit.
  • You have just had dinner with your two friends, and they have both ordered vanilla ice cream. You then order chocolate ice cream.
  • You take a walk on the street, naked.
The question is what the source of the social pressure in these situations is, if there is any. In particular, it is interesting to what degree normalization is good for the group or for any individual.

Right now, my thinking is that social behavior should be understood as stemming from four different sources:
  1. Individual impulse (do whatever you feel like)
  2. Deliberation (do what is expedient and serves your own interests)
  3. Conformity (do whatever everyone else does)
  4. Deliberation for the group (do what serves the common good)
These four dimensions correspond roughly to the pleasure principle and the reality principle, and to the negative and the positive face of a person, respectively. Note that any two of the dimensions can be in conflict with each other. In particular, giving absolute priority to one of them will yield four characteristic types of behavior:
  1. Whimsical child (Tourette's syndrome)
  2. Clever egotist (psychopathy)
  3. Anxious teenager (sheep behavior)
  4. Paternalistic saint (otherworldly goodness)
With respect to language use, they might be instantiated as follows:
  1. Say whatever comes into your mind; use Humpty Dumpty meanings.
  2. Use words in ways that maximize the desired effects; lie if it helps you.
  3. Use words in standard ways; say standard things; avoid conflict
  4. Speak truthfully; be relevant; be precise.
I don't know if this model of behavior will be adequate, but it does have the advantage of mapping quite easily onto both some theories from cognitive psychology (frontal control vs. no frontal control) and some theories in sociology (self-interest vs. internalized norms). It would also generalize the conflicting aims of cooperative conversation in a way that would allow us to introduce non-cooperative aspects into pragmatics.

Friday, May 4, 2012

Topsøe: "Game Theoretical optimization inspired by information theory" (2009)

I finally got around to reading Flemming's recent paper on information theory. It's a short introduction to his and Peter Harremoës' approach to the topic, which involves a game-theoretical interpretation of coding.

The Nature-Observer Game

The basic theoretical framework is the following: Nature and Observer play a zero-sum game. Nature picks a world from the set X, and Observer picks an action from the set Y. For each strategy profile (x,y), three quantities are then defined:
  • H(x), the entropy of x
  • D(x,y), the (Kullback-Leibler) divergence of x from y
  • Φ(x,y), the complexity of the pair (x,y)
The three quantities are related through the equality
Complexity = Entropy + Divergence
which also implies that the smallest complexity that Observer can achieve in a world x is the entropy H(x).
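For discrete distributions, the identity is easy to verify once an Observer action is read as a coding distribution q and Φ as an expected code length. The concrete representation below is my own assumption; Topsøe's setup is more abstract:

```python
from math import log2

def H(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def D(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def complexity(p, q):
    """Expected code length when the world is p but Observer codes for q:
    Complexity = Entropy + Divergence."""
    return H(p) + D(p, q)

world = [0.5, 0.25, 0.25]
action = [0.25, 0.5, 0.25]  # a mismatched coding distribution

print(complexity(world, world))   # 1.5: the entropy, the best Observer can do
print(complexity(world, action))  # 1.75: mismatch adds the divergence 0.25
```

Since D(p, q) is zero exactly when q = p, the minimal complexity in world p is indeed the entropy H(p).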

The point of the game is for nature to produce maximal complexity, and for Observer to produce minimal complexity. By Von Neumann and Morgenstern's minimax theorem, this means that a strategy profile (x*,y*) is an equilibrium for the game if the following three quantities coincide (p. 559):
sup_x inf_y Φ(x,y) = Φ(x*,y*) = inf_y sup_x Φ(x,y)
The leftmost term here designates the optimal outcome for Nature (highest complexity given adverse responses), while the rightmost term designates the optimal outcome for Observer (lowest complexity given adverse responses).

Notice that inf_y Φ(x,y) = H(x), since D(x,y) is nonnegative and vanishes for the best response. This means that Nature can in effect be seen as an entropy-maximizer when playing optimally. Further, Topsøe defines R(y) = sup_x Φ(x,y) to be the risk associated with a given action y, so Observer can then be described as a risk-minimizer.

Information Transmission

The paper also defines a notion of information transmission rate (p. 556), but I am not quite sure about its applications. But this is the idea behind the concept, in bullet-point form:
  • Assume that α is a (prior) probability distribution on the set of worlds.
  • Construct an expected case scenario by summing all worlds with the probabilities of α as weights.
  • Let y be a best response to this expected case.
  • For each world x, let the surprisal of x be the divergence between x and y. This can be seen as a motivated quantitative measure of how different x is from the expected case.
  • Find the weighted average of the surprisals, using the probabilities of α as weights.
This average surprise level is the information transmission rate.
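The bullet points above can be sketched directly, if we again represent worlds as discrete distributions and read "best response to the mixture" as coding for the mixture itself (my own simplification of Topsøe's more general setup):

```python
from math import log2

def D(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def transmission_rate(alpha, worlds):
    """Average surprisal: mix the worlds with weights alpha, respond best
    to the mixture, then average each world's divergence from it."""
    mix = [sum(a * w[i] for a, w in zip(alpha, worlds))
           for i in range(len(worlds[0]))]
    return sum(a * D(w, mix) for a, w in zip(alpha, worlds))

# Two deterministic, opposite worlds: each observation reveals one full bit.
print(transmission_rate([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))  # 1.0
# Identical worlds: the state of the world tells Observer nothing.
print(transmission_rate([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]))  # 0.0
```

The two extreme cases correspond to the observations in the next paragraphs: a rate of 0 when one response fits all worlds, and a high rate when an on-average response is costly.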

Notice that if there is a single action which is an optimal response to all worlds, then the information transmission rate is 0. This reflects the fact that the state of the world would not inform the actions of Observer at all in such a situation. He would simply use the same strategy whatever happened.

Conversely, if the information transmission rate is very high, this means that an insensitive, on-average response pattern will be costly for Observer (in terms of complexity). If we reinterpret the action of choosing a best response to a certain world as an act of inferring the state of the world (describing it correctly), then the surprisal can further be seen as Observer's ability to distinguish states of the world. On such an interpretation, the best-response function can be seen as a signaling function in a Bayesian game.