Showing posts with label quantifiers. Show all posts

Wednesday, November 7, 2012

Vendler: "Each and Every, Any and All" (1962)

This paper was first published in Mind in 1962, but, just like everybody else, I read the version reprinted in Linguistics in Philosophy (1967). It discusses the meaning of the words in the title and is famous for having described the meaning of any in terms of a certain "freedom of choice" (p. 80).

Each, Every, and All

Vendler describes the differences between each, every, and all in terms of collective reference vs. individual reference. His theory is that all is collective, while each and every are distributive.

We thus have differences like
  • You can buy each of these items for $5 (distributive)
  • You can buy all of these items for $5 (collective)
Every, on the other hand, can be seen as a quantification over all the distributive attributions so that "every is between each and all" in meaning (p. 77). We thus get — according to my intuitions — slightly more ambiguous examples with every:
  • You can buy every one of these items for $5
This could lean towards both a collective reading ($5 in total) and a distributive reading ($5 per item).

The Blank Check

Vendler describes his ideas about any nicely in this quote:
To say
Any doctor will tell you …
is to issue a blank warranty for conditional predictions: you fill in the names. You choose Dr. Jones; well, then he will tell you if you ask him. You pick twenty-five others; then, I say, they will tell you if you consult them. (p. 85)
 This means that
… the any-proposition is an unrestricted warranty for conditional statements or forecasts and, we may add, for contrary-to-fact conditionals. In other words, to draw an obvious conclusion, it is an open hypothetical, a lawlike assertion. (p. 89)
I like the phrase "open hypothetical." It highlights both why any can be used in counterfactuals and other modal contexts, and why it does not have existential import.

Vendler also notes that every single time any is used, it issues this blank warranty anew:
… I can certainly not say
*He took any one
even if you acted on my words: Take any one. […] Any calls for a choice, but after it has been made any loses its point. (p. 81)
In other words, once all the facts are settled, you cannot use any to make a report, since "facts are not free" (p. 84).

Any and The Pragmatics of Preferences

One more quote from his explanation:
With Take any one, it is up to you to do the determining; here it does not make sense to ask back, Which one? Thus while in the former case [Take one] I merely fail to determine, in the latter case [Take any one] I call upon you to determine, in other words, I grant you unrestricted liberty of individual choice. (p. 79–80)
He notes that this also explains why a command like You must take any seems odd. Interestingly, though, the British National Corpus does contain examples like the following:
  • You must report any losses immediately.
It is probably fair to paraphrase this sentence as
  • If you have any losses, you must report them immediately.
So it appears that you can in fact order people to take any apple, but only if they are placed in an environment in which they are exposed to apples and might be tempted not to take all of them (so to speak).

A Probabilistic Interpretation of Any

The last thing Vendler does in the article is to informally sketch a way that the difference between any, every, and all could be implemented in a compositional probabilistic semantics:
A bag contains a hundred marbles. We inspect ten at random and all ten are red. Then the probability that any one marble we care to pick out of the hundred will be red is quite high. Yet the probability of every one's being red is much lower. (p. 94)
 I interpret this the following way: When you evaluate the formula
  • All the marbles are red.
you are really asking for the posterior probability that the relevant parameter is 1. When you ask
  • Some of the marbles are red.
you are asking for the posterior probability that the parameter is larger than 0. However, when you evaluate the formula
  • Any marble we draw will be red.
you are looking for the posterior probability that one randomly drawn marble will be red, given your evidence. This amounts to summing up the probabilities of the statements
  • The bag contains 100 red marbles, and if I draw one at random, it will be red.
  • The bag contains 99 red marbles, and if I draw one at random, it will be red.
  • The bag contains 98 red marbles, and if I draw one at random, it will be red.
To take an example with slightly lower numbers, suppose I have drawn a marble twice from a bag of 10 marbles, and that in both cases, I drew a red marble. The posterior probabilities of the different parameter settings are then shown in the graph below:


With these numbers, we get the probabilities
  • P("All marbles are red") = P(p = 1 | k = 2, n = 2) = 26%
  • P("Some marble is red") = 1 – P(p = 0 | k = 2, n = 2) = 1 – 0% = 100%
  • P("Any marble is red") = Σ_i P(p = p_i | k = 2, n = 2) * P(k = 1 | p = p_i, n = 1) = 79%
The sum in the last line ranges over the parameter values p_i = 0, 0.1, 0.2, 0.3, …, 0.9, 1. As stated by Vendler, the probability of the any-sentence is substantially higher than the probability of the all-sentence.
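These numbers can be checked with a short computation. The sketch below assumes, as in the text, a uniform prior over the eleven parameter values and a binomial likelihood; the function names are my own:

```python
from math import comb

# Posterior over the proportion p of red marbles after drawing
# k red marbles in n draws (uniform prior, binomial likelihood).
def posterior(p_values, k, n):
    likelihoods = [comb(n, k) * p**k * (1 - p)**(n - k) for p in p_values]
    total = sum(likelihoods)
    return [l / total for l in likelihoods]

p_values = [i / 10 for i in range(11)]
post = posterior(p_values, k=2, n=2)

p_all = post[-1]                                    # P(p = 1 | evidence)
p_some = 1 - post[0]                                # 1 - P(p = 0 | evidence)
p_any = sum(w * p for w, p in zip(post, p_values))  # predictive prob. of a red draw

print(round(p_all * 100), round(p_some * 100), round(p_any * 100))  # 26 100 79
```

This reproduces the 26%, 100%, and 79% figures above.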

Thursday, October 11, 2012

Bob van Tiel: "Embedded scalars and typicality" (2012)

Bob van Tiel has, as far as I understand, been arguing for a while that the various empirical problems surrounding scalar implicatures can be explained in terms of typicality. So the strangeness of saying that I ate some of the apples if I in fact ate all of them should be compared to the strangeness of saying there's a bird in the garden if there is in fact an ostrich in the garden.

This argument is nicely and succinctly presented in a manuscript archived at the online repository The Semantics Archive. It contains a fair amount of nice empirical data.

A Bibliography

First of all, the paper contains pointers to most of the interesting recent literature on the subject. Let me just liberally snip out a handful of good references that I either have read or should read:
This list should probably also include the following, which I still have to read:

Quantification According to van Tiel

In sections 6 and 8 of the paper, van Tiel suggests a very particular semantics for the use of some and all, both extracted from "goodness" ratings by 30 American subjects regarding the sentences All the circles are black and Some of the circles are black.

Semantics for All

His suggestion for the semantics of all is, loosely speaking, that the truth value V("all x are F") should be computed as the harmonic mean of the truth values of V("x1 is F"), V("x2 is F"), etc.

This obviously only makes sense for finite sets, but more strangely, it does not make sense if the truth value 0 occurs anywhere (since the harmonic mean involves a division). Consequently, he has to assume that V("x is black") = .1 when x is white, and = .9 when x is black.
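As a quick illustration of this aggregation scheme (the helper functions and the ten-circle example are my own, not van Tiel's):

```python
from statistics import harmonic_mean

# van Tiel's assumed atomic truth values: .9 for a black circle, .1 for a white one.
def v_black(x):
    return 0.9 if x == "black" else 0.1

# Truth value of "all the circles are black" as the harmonic mean
# of the atomic truth values.
def v_all(circles):
    return harmonic_mean([v_black(x) for x in circles])

print(round(v_all(["black"] * 10), 3))             # 0.9
print(round(v_all(["black"] * 9 + ["white"]), 3))  # 0.5
```

A single white circle already drags the value from .9 down to .5, which shows why the harmonic mean, unlike the arithmetic mean, behaves somewhat like a soft universal quantifier.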

While this is not completely unreasonable, it does introduce yet another degree of freedom into his statistical fit (remember, he already chose the aggregation function himself), and it should be a cause for some caution when interpreting his significance levels.

Semantics for Some

With respect to some, his suggestion is that the paradigmatic case of some circles are black is half of the circles are black. He thus sets the truth value V("some x are F") to be 1 minus the squared difference between the actual case and the half-of-the-individuals case. Ideally, this should give rise to truth value computation of the form
T(k) = 1 – (n/2 – k)².
However, on the graph on page 17 of the paper, we can see that T(5) < 7 (7 being the maximal "goodness" level), so even when exactly half of the circles are black, we do not get maximal truth. This must be due to some additional assumption like the .9 parameter introduced above, but as far as I can see, he doesn't explain this anywhere in the paper.

One assumption he does make explicit is that
this definition is supplemented with penalties for the situations where the target sentence is unequivocally false (i.e., the 0 and 1 situations) (p. 18)
While this seems relatively innocuous as a general move, we should note that the situation in which exactly one circle is black counts as a counterexample to Some of the circles are black. It also seems to postulate two different mechanisms for evaluating a sentence: first comparing it to a prototype example, and then additionally checking whether it is "really" true. This extra postulation makes his typicality model lose much of its attraction, since it discreetly smuggles conventional truth-conditional semantics back into the system rather than superseding it.
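Taken together, the prototype distance and the penalty clause might be sketched as follows; the normalization of the distance and the exact form of the penalty are my assumptions, since the paper does not spell them out:

```python
# Sketch of a prototype-based truth value for "some of the n circles
# are black", with k black circles. The prototype is k = n/2.
def t_some(k, n):
    if k == 0:
        return 0.0  # penalty clause: unequivocally false situation
    d = (n / 2 - k) / (n / 2)  # assumed normalization of the distance
    return 1 - d ** 2

print(t_some(5, 10))   # prototype case: 1.0
print(t_some(0, 10))   # counterexample: 0.0
print(t_some(10, 10))  # all black: 0.0
```

Note that under this normalization the all-black situation already receives the value 0, so the explicit penalty would only be needed for k = 0.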

Van Tiel's Comments on Chemla and Spector

While the rest of the paper is reasonably clear, there is one part that I do not understand. This is the part where van Tiel recreates the results from Chemla and Spector's letter-and-circle judgment task.

Here's what I do get: He says that the sentence used by Chemla and Spector,
Every letter is connected to some of its circles
suggests most strongly a some-but-not-all reading (labeled "Mixed"), less strongly an all reading, and least strongly a none reading. So however a subject rates the seven different pictures given by Chemla and Spector (0 to 6 connections), they should respect this constraint on appropriateness orderings.

But then van Tiel says the following:

Using Excel, I randomly generated 5,000 values for each of the three cases such that every triplet obeyed the constraint [that some suggests Mixed more than All, and All more than None]. For every triplet, I calculated the typicality value for the seven situations. Ultimately, I derived the mean from these values for comparison with the results of Chemla & Spector. The product-moment correlation between the mean typicality values from the Monte Carlo simulation and the mean suitability values found by Chemla & Spector was nearly perfect (r = 0.99, p < .001). This demonstrates that Chemla & Spector’s results can almost entirely be explained as typicality effects. (p. 19)
I don't get what it is that he is simulating here. Since he randomly generates triplets (not 7-tuples), the stochastic part must be the proposed "goodness" intuition of a random subject. But how does he go from those three numbers to assigning ratings to all seven cases? I suppose you could compute backwards from the three values to the parameter settings for the model discussed above, but that doesn't seem to be what he's doing. So what is he doing?

I think it would have made more sense to compute the theoretically expected truth value of Chemla and Spector's sentence directly, given that he has just gone to such pains to construct a compositional semantics for some and every.

We have the number of connections for each letter, so we can compute the truth value of, say, The letter A is connected to some of its circles; and we also have, in each condition, the set of letters in the picture, so we could compute the harmonic mean of these truth values. Why not do that instead if we really want to test the model?
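Such a direct computation might look like the following sketch, which composes a prototype semantics for some with the harmonic-mean semantics for every; the floor value .1 and the normalization are my own assumptions, carried over from the discussion above:

```python
from statistics import harmonic_mean

# Truth value of "the letter is connected to some of its circles",
# for k connections out of n circles, with a floor of .1 so that
# the harmonic mean stays defined.
def t_some(k, n=6):
    if k == 0:
        return 0.1
    d = (n / 2 - k) / (n / 2)
    return max(1 - d ** 2, 0.1)

# Truth value of "every letter is connected to some of its circles",
# given the list of connection counts for the letters in a picture.
def t_every_some(counts, n=6):
    return harmonic_mean([t_some(k, n) for k in counts])

print(round(t_every_some([3, 3, 3, 3, 3, 3]), 3))  # every letter at the prototype: 1.0
print(round(t_every_some([3, 3, 3, 3, 3, 6]), 3))  # one letter connected to all: 0.4
```

This would give a theoretically expected value for each of the seven situations, which could then be correlated with Chemla and Spector's suitability ratings directly.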

Wednesday, May 16, 2012

Sevenster: "A Strategic Perspective on IF Games" (2009)

This is a commentary on Hintikka and Sandu's game-theoretical approach to independence-friendly logic. Sevenster considers how the notion of truth is changed when one changes the information flow or the solution concept for the falsification game over a sentence.

Information Access

When playing an extensive game, one can be "forgetful" to different degrees. In the general setting, forgetfulness has various effects, such as blocking the possibility of threat behavior. In terms of the semantics of quantifiers, the various degrees of forgetfulness also allow for different types of independence:

Memory capacity                 Solution concept              Independence relations
Global strategy and past moves  Nash equilibrium              None
Global strategy                 Other                         Existentials of universals
Neither                         Subgame perfect equilibrium   Anything of anything

To see the difference between the two degrees of independence, consider the following sentence:
  • There is an x and there is a y such that x = y.
Assume that we are in a world with two objects, and that the two existential quantifiers in the sentence are independent of each other. Then the verifier can at most achieve a 50% chance of verifying the sentence, since there is no information flow from the first choice to the other.

If, on the other hand, the second choice is dependent on the first, the verifier can achieve a 100% success rate. The difference is that between a sequential and a simultaneous coordination game.
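The two cases can be sketched as a tiny computation; the uniform randomization over a two-object domain is my reading of the 50% claim:

```python
from itertools import product
from fractions import Fraction

domain = ["a", "b"]

# Independent choices: x and y are picked uniformly and separately,
# so success (x == y) is a matter of luck.
wins = sum(x == y for x, y in product(domain, repeat=2))
p_independent = Fraction(wins, len(domain) ** 2)

# Dependent choices: y can simply be copied from x, so success is guaranteed.
p_dependent = Fraction(sum(x == x for x in domain), len(domain))

print(p_independent, p_dependent)  # 1/2 1
```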

This example is exactly the one that I have felt was missing in Hintikka's discussions, so it's nice to see that I'm not alone. Apparently, Theo Janssen has discussed the problem in a paper from 2002 (cf. Sevenster's article, p. 106).

Solution Concepts

Sevenster uses three different solution concepts in his article:
  1. Nash equilibrium
  2. WDS + P
  3. WDS
WDS strategy profiles are profiles in which all players play a weakly dominant strategy, i.e., one that is (weakly) optimal whatever everyone else does. This is a very strong condition.
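To see how strong the condition is, consider the coordination game from above: neither choice is weakly dominant, so even that trivially verifiable sentence has no WDS profile. The encoding below is my own illustration, not Sevenster's formalization:

```python
# A strategy s is weakly dominant if it does at least as well as any
# alternative s2 against every strategy t of the other player.
def weakly_dominant(payoff, s, own_strategies, other_strategies):
    return all(payoff(s, t) >= payoff(s2, t)
               for t in other_strategies
               for s2 in own_strategies)

# Coordination game: payoff 1 iff the two choices coincide.
payoff = lambda s, t: 1 if s == t else 0
domain = ["a", "b"]

print([s for s in domain if weakly_dominant(payoff, s, domain, domain)])  # []
```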

WDS + P strategy profiles are, as far as I can see from Sevenster's Definition 12, the ones that remain after the removal of the weakly dominated strategies for player n, then for player n – 1, and so on. This is weaker than WDS, since a WDS + P strategy for player i does not need to be (weakly) optimal with respect to every other strategy, but only optimal with respect to the WDS + P strategies for players j with j > i.

Neither of these last two solution concepts is standard. The middle one is slightly problematic because it may give different results when players are enumerated differently. But that's a drawback it shares with all solution concepts based on iterated elimination of weakly dominated strategies.

Some Mixed Equilibria

Just for the sake of it, let me briefly review two example sentences:
  • There is an x and there is a y such that x = y.
  • There is an x such that for all y, x = y.
Assume further that we are in a model in which there are exactly two objects, a and b. These two sentences then correspond to a simple coordination game and to Matching Pennies, respectively. The coordination game has the three equilibria (0, 0), (1/2, 1/2), and (1, 1), while Matching Pennies only has the equilibrium (1/2, 1/2).

The interpretation of this in classical logic is that the double existential sentence is supported by two pieces of evidence (a = a and b = b), while the sentence with the universal quantifier is supported by no evidence.

However, in the mixed strategy equilibrium, both strategies pay the same for both players. This means that the players achieve no gain by knowing that the other player is going to play (1/2, 1/2). Accordingly, they correspond to the case without any (useful) information flow between the players.
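This indifference is easy to verify; the payoff function below encodes Matching Pennies from the Verifier's perspective, under my own conventions:

```python
# Verifier's expected payoff in Matching Pennies when the Verifier plays
# "a" with probability p and the Falsifier plays "a" with probability q;
# the Verifier wins iff the two choices coincide.
def verifier_payoff(p, q):
    return p * q + (1 - p) * (1 - q)

# At the mixed equilibrium the Verifier randomizes with probability 1/2,
# and the payoff is 1/2 whatever the Falsifier does:
print(verifier_payoff(0.5, 0.0), verifier_payoff(0.5, 1.0), verifier_payoff(0.5, 0.5))
```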

The existence of such equilibria thus witnesses the existence of a true, independence-friendly reading of the sentence in the relevant model. Note, however, that this does not mean that the sentences are tautological: the Verifier does not have any strategy that guarantees the payoff 1 in either of the two cases.

Tuesday, May 15, 2012

Chemla and Spector: "Experimental Evidence for Embedded Scalar Implicatures" (2010)

I remember seeing Benjamin Spector speak about this experiment at ESSLLI 2010. The paper argues that the sentence
  • Every student solved some problems 
has a so-called "localist" reading. A counterexample to such a localist reading is a student who solved no problems, or a student who solved all the problems. The second option is the crucial one, since a strict Gricean model does not predict any implicature that warrants this reading. Chemla and Spector claim that the localist reading is available, although not the dominant one.

Experimental Set-Up and Subject Responses

The empirical method that they employ in order to prove this involves some sentences and some drawings. The drawings show six letters, each surrounded by six circles. Each letter is connected to some number of the circles surrounding it, as in the following figure:

 

The relevant sentences were then of the following kind:
  • Every letter is connected with some of its circles
In particular, the question was how subjects evaluate such sentences when no letters are completely disconnected, and at least one letter is connected to every circle. Will anyone judge the sentence to be false in that scenario?

However, instead of just asking their subjects this question (a methodology that has previously falsified the theory) they asked subjects to click somewhere on a bar connecting the word "Yes" and the word "No." With this methodology, subjects did indeed pick points closer to "No" when the drawing included some letters that were connected to all of their circles.

So it seems that when pushed, subjects do start to doubt whether the word "some" should be taken to mean "at least one" or "at least one and not all." Once they notice this ambiguity, they might become slightly more scared of giving an unequivocal "Yes" and consequently click somewhere lower on the scale.

Or to put it differently: as soon as the subjects start fearing that they are in some kind of polemic language game, they switch to a safer strategy by committing to less. So a localist reading is indeed available, and in some circumstances it is a possible claim inherent in the sentence.

Relevance Considerations

However, what really caught my eye on this second rereading of the paper was the comments that Chemla and Spector make about relevance while discussing possible experimental set-ups:
the local reading (‘every square is connected with some of the circles and not with all of them’) is relevant typically in a context in which we are interested in knowing, for each square, whether it is connected with some, all, or no circle. Such a context would for instance result from raising the following question: ‘Which squares are connected to which circles?’. (Section 2.2.2, page 365)
The globalist reading, one might add, would be the most relevant answer to the question Which squares are connected to some circles? The only counterexamples to the globalist reading consist of squares that are not connected to anything, as stated above.

The reason I find this comment interesting is that it connects the meaning of the sentence with the expectation of the subjects. Experimental materials place the subject in some particular role, and this implicitly suggests certain answers to the big question: What does he expect me to do with this question?

Chemla and Spector obviously embrace some kind of objectivist perspective on semantics, with sentences having grammatically determined meanings, and a clear division between grammar and pragmatics. But their sensitivity to the perspective of the subject is very commendable and opens up the possibility of founding the notion of meaning on the notion of social context.

Hintikka: "What is elementary logic?" (1995)

The claim of the paper is that independence-friendly logic is more "natural" than ordinary first-order logic. That is, the restriction to quantifiers with nested scopes is unnecessary and unfounded.
In this article, as in everything he has written, there are some serious linguistic issues with the examples he uses, and it is by no means clear that his own semantic intuitions are generalizable.

The paper is reprinted in a 1998 anthology of Hintikka's work, but Hintikka referred to the paper as "forthcoming" in 1991, and it was first published in 1995.

The Old Example: Villagers and Townsmen

His old natural-language argument for the usefulness of independence-friendly logic comes from his introspective intuitions about the following sentence:
  • Some relative of each villager hates some relative of each townsman
This sentence has two readings, a classical and an independence-friendly. These readings can be distinguished by the following model: Suppose that there is one villager and one townsman, and that they are related to themselves and to each other; suppose further that they hate each other, but do not hate themselves.

The Verifier then instantiates some relative of each (= the only) villager by picking either the villager or the townsman, since everyone is related. The same goes for some relative of each (= the only) townsman. The sentence is true exactly when the two choices are different, and not true when they are the same (since no one, per assumption, hates themselves).

When the two choices are independent, the Verifier has no winning strategy, and the sentence is thus not true. The Falsifier, on the other hand, also doesn't have a winning strategy, since some combinations of Verifier choices do in fact make the sentence true, and others don't. On the independence-friendly reading, the sentence is thus neither true nor false. On the classical reading, it's true.

Now Hintikka's claim is that the independence-friendly reading of this English sentence is a plausible reading (or the most plausible?). He does not give any empirical arguments for the claim.

Note that the same logical structure can be replicated with a slightly less far-fetched example:
  • A north-going driver and a south-going driver can choose to drive on a side of the road such that they avoid a collision.
If you think that this sentence is true, you have read it in the classical way. If you think it is false, you have read it in the independence-friendly way.

The New Example: The Boy that Loved the Girl that Loved Him

In support of his claim, he provides the following "perfectly understandable English sentence" as evidence (p. 10):
  • The boy who was fooling her kissed the girl who loved him.
He then claims that this sentence cannot be expressed in first-order logic "no matter how you analyze the definite descriptions" (p. 10).

So how can we analyze the definite descriptions? I guess we have at least the following options:
  • The boy1 who was fooling her2 kissed the girl2 who loved him1.
  • The boy1 who was fooling her2 kissed the girl3 who loved him1.
  • The boy1 who was fooling her2 kissed the girl2 who loved him4.
  • The boy1 who was fooling her2 kissed the girl3 who loved him4.
I suppose that the problematic case that he is thinking about is the first one. That's the one where the sentence implicitly states that the identity of the girl uniquely identifies a beloved boy, while the identity of the boy also uniquely identifies a fooled girl.

This is obviously a circular dependence, but it can still meaningfully apply (or not apply) to various cases. For instance, if x fools y and y loves x, then it applies. If x fools y, and y loves z, or loves both x and z, then it doesn't.

But unlike the villagers-sentence, I can't see how this is not expressible in terms of first-order logic, given the usual legitimate moves in sentence formalization. But perhaps Hintikka has some strange and far-fetched "natural" reading of the sentence in mind?