Thursday, October 11, 2012

Bob van Tiel: "Embedded scalars and typicality" (2012)

Bob van Tiel has, as far as I understand, been arguing for a while that the various empirical problems surrounding scalar implicatures can be explained in terms of typicality. So the strangeness of saying that I ate some of the apples if I in fact ate all of them should be compared to the strangeness of saying there's a bird in the garden if there is in fact an ostrich in my garden.

This argument is nicely and succinctly presented in a manuscript archived at the online repository The Semantics Archive. It contains a fair amount of nice empirical data.

A Bibliography

First of all, the paper contains pointers to most of the interesting recent literature on the subject. Let me just liberally snip out a handful of good references that I either have read or should read:
This list should probably also include the following, which I still have to read:

Quantification According to van Tiel

In sections 6 and 8 of the paper, van Tiel suggests a very particular semantics for the use of some and all, both extracted from "goodness" ratings by 30 American subjects regarding the sentences All the circles are black and Some of the circles are black.

Semantics for All

His suggestion for the semantics of all is, loosely speaking, that the truth value V("all x are F") should be computed as the harmonic mean of the truth values of V("x1 is F"), V("x2 is F"), etc.

This obviously only makes sense for finite sets, but more strangely, it does not make sense if the truth value 0 occurs anywhere (since the harmonic mean involves a division). Consequently, he has to assume that V("x is black") = .1 when x is white, and = .9 when x is black.

While this is not completely unreasonable, it does introduce yet another degree of freedom in his statistical fit (remember, he already chose the aggregation function himself), and it should be a cause for some caution when interpreting his significance levels.
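To make the division problem concrete, here is a minimal sketch of the harmonic-mean rating for all. The .1 and .9 values are the ones quoted above; the function names and the choice of ten circles per picture are assumptions made for illustration, not something taken from the paper.

```python
def harmonic_mean(values):
    """Plain harmonic mean: n divided by the sum of the reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


BLACK, WHITE = 0.9, 0.1  # per-circle values quoted in the post


def all_rating(n_black, n_total=10):
    """Rating of 'all the circles are black' for a picture with
    n_black black circles out of n_total (n_total=10 is an assumption)."""
    values = [BLACK] * n_black + [WHITE] * (n_total - n_black)
    return harmonic_mean(values)


for k in (0, 5, 8, 10):
    print(k, round(all_rating(k), 3))

# With a literal truth value of 0 the harmonic mean is undefined,
# because one of the reciprocals is a division by zero.
try:
    harmonic_mean([1.0, 1.0, 0.0])
except ZeroDivisionError:
    print("undefined when a truth value of 0 occurs")
```

The rating rises with the number of black circles but never leaves the interval between .1 and .9, which is exactly where the two extra parameters show up in the fit.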

Semantics for Some

With respect to some, his suggestion is that the paradigmatic case of some circles are black is half of the circles are black. He thus sets the truth value V("some x are F") to be 1 minus the squared difference between the actual case and the half-of-the-individuals case. Ideally, with n circles of which k are black, this should give rise to a truth value computation of the form
T(k) = 1 - (n/2 - k)^2.
However, on the graph on page 17 of the paper, we can see that T(5) < 7 (7 being the maximal "goodness" level), so even when exactly half of the circles are black, we do not get maximal truth. This must be due to some additional assumption like the .9 parameter introduced above, but as far as I can see, he doesn't explain this anywhere in the paper.
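A quick numerical check shows why some rescaling must be involved. The sketch below evaluates the formula literally on raw counts, and then a hypothetical variant that measures the distance in proportions instead. The second function is only a guess at one possible normalization, not something stated in the paper, and neither includes the penalties discussed next.

```python
def some_rating_raw(k, n=10):
    """The formula as written above, applied to raw counts."""
    return 1 - (n / 2 - k) ** 2


def some_rating_prop(k, n=10):
    """Hypothetical variant: squared distance between proportions."""
    return 1 - (0.5 - k / n) ** 2


for k in range(11):
    print(k, some_rating_raw(k), round(some_rating_prop(k), 3))
```

Both versions peak when exactly half of the circles are black, but the raw version drops far below zero a few steps away from the midpoint, so it cannot be mapped directly onto a 1-to-7 rating scale without further assumptions.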

One assumption he does make explicit is that
this definition is supplemented with penalties for the situations where the target sentence is unequivocally false (i.e., the 0 and 1 situations) (p. 18)
While this seems relatively innocuous as a general move, we should note that the situation in which exactly one circle is black thereby counts as a counterexample to Some of the circles are black. It also seems to postulate two different mechanisms for evaluating a sentence: first comparing it to a prototype example, and then in addition checking whether it is "really" true. This extra postulate makes his typicality model lose a lot of its attraction, since it discreetly smuggles conventional truth-conditional semantics back into the system rather than superseding it.

Van Tiel's Comments on Chemla and Spector

While the rest of the paper is reasonably clear, there is one part that I do not understand. This is the part where van Tiel recreates the results from Chemla and Spector's letter-and-circle judgment task.

Here's what I do get: He says that the sentence used by Chemla and Spector,
Every letter is connected to some of its circles
suggests most strongly a some-but-not-all reading (labeled "Mixed"), less strongly an all reading, and least strongly a none reading. So however a subject rates the seven different pictures given by Chemla and Spector (0 to 6 connections), they should respect this constraint on appropriateness orderings.

But then van Tiel says the following:

Using Excel, I randomly generated 5,000 values for each of the three cases such that every triplet obeyed the constraint [that some suggests Mixed more than All, and All more than None]. For every triplet, I calculated the typicality value for the seven situations. Ultimately, I derived the mean from these values for comparison with the results of Chemla & Spector. The product-moment correlation between the mean typicality values from the Monte Carlo simulation and the mean suitability values found by Chemla & Spector was nearly perfect (r = 0.99, p < .001). This demonstrates that Chemla & Spector’s results can almost entirely be explained as typicality effects. (p. 19)
I don't get what it is that he is simulating here. Since he randomly generates triplets (not 7-tuples), the stochastic part must be the proposed "goodness" intuition of a random subject. But how does he go from those three numbers to assigning ratings to all seven cases? I suppose you could compute backwards from the three values to the parameter settings for the model discussed above, but that doesn't seem to be what he's doing. So what is he doing?

I think it would have made more sense to compute the theoretically expected truth value of Chemla and Spector's sentence directly now that he has just gone through such pains to construct a compositional semantics for some and every.

We have the number of connections for each picture, so we can compute the truth value of, say, The letter A is connected to some of its circles; and we also have, in each condition, the set of pictures, so we could compute the harmonic mean of these values for the six truth values that are presented to the subject. Why not do that instead if we really want to test the model?
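A rough sketch of that direct computation might look as follows. Everything specific here is an assumption: the proportion-based rating for some is a hypothetical stand-in for whatever the fitted function is, the penalties for unequivocally false cases are left out, and the two pictures are made up rather than taken from Chemla and Spector's stimuli.

```python
def harmonic_mean(values):
    """Plain harmonic mean: n divided by the sum of the reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


def some_rating(connected, circles):
    """Hypothetical rating for 'X is connected to some of its circles':
    1 minus the squared distance, in proportions, from the half case."""
    return 1 - (0.5 - connected / circles) ** 2


def every_some_rating(picture):
    """picture is a list of (connected, circles) pairs, one per letter.
    The per-letter ratings are aggregated with the harmonic mean."""
    return harmonic_mean([some_rating(c, n) for c, n in picture])


# Made-up pictures: six letters with four circles each.
all_mixed = [(2, 4)] * 6
four_all_two_mixed = [(4, 4)] * 4 + [(2, 4)] * 2

print(every_some_rating(all_mixed))
print(every_some_rating(four_all_two_mixed))
```

Computed this way, the predicted rating depends on how many circles each letter is actually connected to, not just on which of three categories the letter falls into.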


  1. Hi Mathias, I came across your site when I was autogoogling. It's cool to see that you read my paper so thoroughly; I hope you don't mind if I address some of your criticisms.

    First of all, just to be clear, I'm not proposing a novel semantics for the existential and universal quantifier. The definitions I gave serve as a model for participants' typicality judgements for quantified statements, and these are quite distinct from their truth conditions. To illustrate this distinction with a perhaps more intuitive example, participants consider a sentence like "This is a bird" a better description of a robin than, for instance, of a duck. Nonetheless, this difference isn't part of the truth conditions of the sentence, as I'm sure you agree. Similarly, while participants consider "Every circle is black" more suitable as a description of a situation with eight (out of ten) black circles than of a situation with three black circles, I'd say the sentence is false simpliciter in both situations.

    Apparently, the details of the simulation were a bit obscure (incidentally, I've since uploaded a new version which should clarify things and solve the technical glitches you noticed). The typicality of a situation with respect to "Every letter is connected to some of its circles" equals the harmonic mean of the typicality values of the letters with respect to the predicate "is connected to some of its circles." We've got three types of letters, namely those connected to some but not all of their circles (Mixed cases), those connected to all of them (All cases) and those connected to none of them (None cases). We don't exactly know how typical these are of the predicate, but we do know, based on the typicality structure of "some", that Mixed cases are more typical than All cases, which are more typical than None cases. I therefore generated 5,000 triplets of possible values based on this constraint, and computed for each triplet the typicality of each of the seven situations. For instance, if the simulation yielded typicality values of .1, .4, and .7 for None, All, and Mixed cases, the typicality of a situation with four All cases and two Mixed cases would be the harmonic mean of {.4, .4, .4, .4, .7, .7} = .47. Lastly, I took the means of the typicality values for each situation, and compared those with Chemla & Spector's results.

    Hope this clarifies some of the issues you put forward.

  2. Hi Bob!

    Thanks for the comment -- that does clear things up a bit.

    It also confirms the thing that I was a little puzzled about, namely, that the parameters of your model were the scores assigned to the "Mixed," "All," and "None" cases.

    Fitting the model this way makes it statistically indistinguishable from Chemla and Spector's model. Neither model can distinguish scenes _within_ the "Mixed" category, and both models have three parameters that can be fitted. The uncertainty will only regard an interpretation: are the parameters "accessibility" values or "typicality" values? But this question is not answered by the data.

    On the other hand, if you used a typicality function T with a single parameter \theta, the models _would_ be distinguishable. You could then fit the parameter and use the maximum likelihood function T* to compute the typicality of the situation with 0, 1, 2, 3, 4, 5, or 6 black circles.

    This way, the two models could potentially have unequal degrees of fit, because one model could assign different values to different "Mixed" cases, while the other one would have to assign them all the same value. So if you compared the likelihood of those two models given the data, you could potentially come up with some amount of support for your model over Chemla and Spector's.

    Isn't that true? Or am I misunderstanding the purpose of the experiment?
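
The simulation described in the first comment can be sketched in a few lines. This is only an illustration under stated assumptions: the comment does not say which distribution the values were drawn from, so uniform draws sorted into the order None < All < Mixed are a guess, and the situations listed at the bottom are hypothetical, not the seven pictures from Chemla and Spector's experiment.

```python
import random


def harmonic_mean(values):
    """Plain harmonic mean: n divided by the sum of the reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


def situation_typicality(counts, values):
    """counts and values are both (None, All, Mixed) triples: how many
    letters of each type the picture has, and the value of each type."""
    letters = []
    for count, value in zip(counts, values):
        letters.extend([value] * count)
    return harmonic_mean(letters)


def simulate(situations, n_draws=5000, seed=0):
    """Mean typicality of each situation over random ordered triplets."""
    rng = random.Random(seed)
    totals = [0.0] * len(situations)
    for _ in range(n_draws):
        # Assumption: uniform draws, sorted so that None < All < Mixed.
        # The lower bound 0.01 keeps the reciprocals finite.
        triplet = sorted(rng.uniform(0.01, 1.0) for _ in range(3))
        for i, counts in enumerate(situations):
            totals[i] += situation_typicality(counts, triplet)
    return [total / n_draws for total in totals]


# Worked example from the comment: four All letters and two Mixed letters.
print(round(situation_typicality((0, 4, 2), (0.1, 0.4, 0.7)), 2))  # 0.47

# Hypothetical situations, given as (None, All, Mixed) letter counts.
situations = [(6, 0, 0), (0, 6, 0), (0, 4, 2), (0, 0, 6)]
print(simulate(situations))
```

Note that a situation enters this computation only through its three letter counts, so two pictures with the same counts always receive the same value.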