Wednesday, May 28, 2014

Herdan: The Advanced Theory of Language as Choice and Chance (1966)

This really odd book is based to some extent on the earlier Language as Choice and Chance (1956), but contains, according to the author, a lot of new material. It discusses a variety of topics in language statistics, often in a rather unsystematic and bizarrely directionless way.

The part about "linguistic duality" (Part IV) is particularly confusing and strange, and I'm not sure quite what to make of it. Herdan seems to want to make some great cosmic connection between quantum physics, natural language semantics, and propositional logic.

But leaving that aside, I really wanted to quote it because he so clearly expresses the deterministic philosophy of language — that speech isn't a random phenomenon, but rather has deep roots in free will.

"Human Willfulness"

He thus explains that one region of language lies outside the control of the speaker, but that once we are familiar with this set of restrictions, we can express ourselves within its bounds:
It leaves the individual free to exercise his choice in the remaining features of language, and insofar language is free. The determination of the extent to which the speaker is bound by the linguistic code he uses, and conversely, the extent to which he is free, and can be original, this is the essence of what I call quantitative linguistics. (pp. 5–6)
Similarly, he comments on a study comparing the letter distribution in two texts as follows:
There can be no doubt about the possibility of two distributions of this kind, in any language, being significantly different, if only for the simple reason that the laws of language are always subject, to some extent at least, to human willfulness or choice. By deliberately using words in one text, which happen to be of rather singular morphological structure, it may well be possible to achieve a significant difference of letter frequencies in the two texts. (p. 59)
And finally, in the chapter on the statistical analysis of style, he goes on record completely:
The deterministic view of language regards language as the deliberate choice of such linguistic units as are required for expressing the idea one has in mind. This may be said to be a definition in accordance with current views. Introspective analysis of linguistic expression would seem to show it is a deterministic process, no part of which is left to chance. A possible exception seems to be that we often use a word or expression because it 'happened' to come into our mind. But the fact that memory may have features of accidental happenings does not mean that our use of linguistic forms is of the same character. Supposing a word just happened to come into our mind while we are in the process of writing, we are still free to use it or not, and we shall do one or the other according to what is needed for expressing what we have in mind. It would seem that the cause and effect principle of physical nature has its parallel in the 'reason and consequence' or 'motive and action' principle of psychological nature, part of which is the linguistic process of giving expression to thought. Our motive for using a particular expression is that it is suited better than any other we could think of for expressing our thought. (p. 70)
He then goes on to say that "style" is a matter of self-imposed constraints, so that we are in fact a bit less free to choose our words than it appears, although we ourselves come up with the restrictions.

"Grammar Load"

In Chapter 3.3, he expands these ideas about determinism and free will in language by suggesting that we can quantify the weight of the grammatical restrictions of a language in terms of a "grammar load" statistic. He suggests that this grammar load can be assessed by counting the number of word forms per token in a sample of text from the language (p. 49). He does discuss entropy later in the book (Part III(C)), but doesn't make the connection to redundancy here.

Inspecting an English corpus of 78,633 tokens, he finds 53,102 different word forms and concludes that English has a "grammar load" of
53,102 / 78,633 = 67.53%.
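Since the statistic is just the type/token ratio, it is easy to compute yourself. Here is a minimal Python sketch; the tokenization (lowercased alphabetic strings) is my own assumption, not anything Herdan specifies:

```python
# Minimal sketch of Herdan's "grammar load" statistic, here taken to
# be the type/token ratio: distinct word forms divided by tokens.
# The tokenization (lowercased alphabetic strings) is an assumption;
# Herdan does not spell out his segmentation in the passage cited.
import re

def grammar_load(text: str) -> float:
    """Distinct word forms per token in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return len(set(tokens)) / len(tokens)

sample = "the cat sat on the mat and the dog sat on the cat"
print(f"grammar load = {grammar_load(sample):.2%}")
# Herdan's own figures give 53,102 / 78,633 = 67.53%.
```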
Implicit in Herdan's computation is the idea that the number of new word forms grows linearly with the number of tokens you inspect. This rules out sublinear spawn rates such as V(N) ∝ √N or V(N) ∝ log N.
In fact, vocabulary size does seem to grow sublinearly as a function of corpus size. The spawn rate for new word forms is thus not constant in English text.

Word forms in the first N tokens of the Brown corpus, for N = 0 … 100,000.

However, neither of the growth functions listed above fits the data very well (as far as I can see).
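For readers who want to check, here is a rough Python sketch of how one might redo that comparison. It assumes NLTK's distribution of the Brown corpus and a simple one-parameter least-squares fit; both choices are mine, and the plot above may well have been produced differently:

```python
# Sketch reproducing the vocabulary-growth curve for the first
# 100,000 tokens of the Brown corpus (requires NLTK and a prior
# nltk.download("brown")). The two candidate curves and the
# one-parameter least-squares fit are assumptions, chosen to match
# the sublinear examples mentioned above.
import math
from nltk.corpus import brown

tokens = [w.lower() for w in brown.words()[:100_000]]

# Empirical vocabulary size V(N) after each prefix of length N.
seen, growth = set(), []
for tok in tokens:
    seen.add(tok)
    growth.append(len(seen))

def fit_scale(f):
    """Least-squares a for the one-parameter model V(N) = a * f(N)."""
    xs = [f(n) for n in range(1, len(growth) + 1)]
    return sum(x * v for x, v in zip(xs, growth)) / sum(x * x for x in xs)

for name, f in [("sqrt(N)", math.sqrt), ("log(N)", math.log)]:
    a = fit_scale(f)
    sse = sum((v - a * f(n)) ** 2 for n, v in enumerate(growth, start=1))
    print(f"V(N) ~= {a:.1f} * {name}:  sum of squared errors = {sse:.3g}")
```

A single free scale parameter keeps the comparison simple; a two-parameter model such as Heaps' law, V(N) = K·N^β, typically tracks vocabulary growth much more closely, but that is a different claim than the two curves above.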
