Tuesday, May 12, 2015

Miller: "The Magical Number Seven, Plus or Minus Two" (1956)

This classic paper was based on a lecture George Miller gave in 1955. It explains his thoughts on the limits of human information processing and the mnemonic techniques we can use to overcome them.

There are several .html and .doc transcriptions of the text available online, as are scans of the original article as it appeared in the Psychological Review.

The Rubbery Transmission Rate

Miller opens the paper by considering a number of experiments suggesting that people can distinguish about four to ten objects when the objects vary along only a single dimension. He reports, for instance, on experiments with sounds that differ in volume, chips that differ in color, and glasses of water that differ in salt content.

Miller's Figures 1, 2, 3, and 4 (pp. 83, 83, 85, and 85)

These results suggest a deeply rooted human trait, he submits:
There seems to be some limitation built into us either by learning or by the design of our nervous systems, a limit that keeps our channel capacities in this general range. On the basis of the present evidence it seems safe to say that we possess a finite and rather small capacity for making such unidimensional judgments and that this capacity does not vary a great deal from one simple sensory attribute to another. (p. 86)
The problem with this claim is that the actual information content differs hugely depending on what kinds of items you remember: five English words add up to about 50 bits, seven decimal digits to about 23 bits, and, naturally, eight binary digits to 8 bits. So something is clearly going wrong with this hypothesis:
For example, decimal digits are worth 3.3 bits apiece. We can recall about seven of them, for a total of 23 bits of information. Isolated English words are worth about 10 bits apiece. If the total amount of information is to remain constant at 23 bits, then we should be able to remember only two or three words chosen at random. (p. 91)
But this uniformity is of course not what we observe.
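The arithmetic behind these figures is easy to verify: an item drawn uniformly from a set of N alternatives carries log2 N bits. A minimal sketch (the 1,000-word vocabulary behind the roughly 10 bits per word is an assumption chosen to match Miller's figure):

```python
import math

def bits_per_item(vocabulary_size):
    """Information carried by one item drawn uniformly from the vocabulary."""
    return math.log2(vocabulary_size)

# Decimal digits: 10 alternatives, so about 3.32 bits each.
print(round(bits_per_item(10), 2))        # 3.32
print(round(7 * bits_per_item(10), 1))    # seven digits: 23.3 bits

# Isolated English words: an assumed 1,000-word vocabulary gives ~10 bits each.
print(round(bits_per_item(1000), 2))      # 9.97
print(round(5 * bits_per_item(1000), 1))  # five words: 49.8 bits
```

If memory span were a fixed number of bits, these totals would have to agree; they plainly do not.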

Dits and Dahs

To work around the aporia, Miller introduces an ad hoc concept:
In order to capture this distinction in somewhat picturesque terms, I have fallen into the custom of distinguishing between bits of information and chunks of information. Then I can say that the number of bits of information is constant for absolute judgment and the number of chunks of information is constant for immediate memory. The span of immediate memory seems to be almost independent of the number of bits per chunk, at least over the range that has been examined to date. (pp. 92–93)
For example:
A man just beginning to learn radio-telegraphic code hears each dit and dah as a separate chunk. Soon he is able to organize these sounds into letters and then he can deal with the letters as chunks. Then the letters organize themselves as words, which are still larger chunks, and he begins to hear whole phrases. … In the terms I am proposing to use, the operator learns to increase the bits per chunk. (p. 93)

Something for Nothing

Miller goes on to report on an experiment carried out by one Sidney Smith (not included in the bibliography). Smith taught himself to translate binary sequences into integers and thereby pushed his memorization ability up to about 40 binary digits. Miller comments:
It is a little dramatic to watch a person get 40 binary digits in a row and then repeat them back without error. However, if you think of this merely as a mnemonic trick for extending the memory span, you will miss the more important point that is implicit in nearly all such mnemonic devices. The point is that recoding is an extremely powerful weapon for increasing the amount of information that we can deal with. (pp. 94–95)
That's clearly true, but Miller gives no hint as to what the relationship is between this chunking technique and the information-theoretical concepts with which he started.
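The recoding itself is easy to sketch. Assuming the variant in which three binary digits are grouped into a single octal digit (one possible grouping; Smith's exact scheme is not spelled out here), a minimal illustration:

```python
def recode_binary(bits, group_size=3):
    """Recode a binary string into one integer chunk per group of digits."""
    groups = [bits[i:i + group_size] for i in range(0, len(bits), group_size)]
    return [int(g, 2) for g in groups]

# Eighteen binary digits become just six chunks to hold in memory.
print(recode_binary("101001110010110001"))  # [5, 1, 6, 2, 6, 1]
```

The memorizer rehearses the six octal chunks and expands them back into binary at recall time.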

Mathematically speaking, an encoding is a probability distribution, and a recoding is just a different probability distribution. Learning something does not magically increase your probability budget; the only way you can make some codewords shorter is to make others longer. Simply mapping random binary expansions to equally random decimal expansions should not make a difference. So it's hard to see what the connection between bits and chunks should be.
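The point can be made concrete with a small calculation: under a uniform source, recoding changes the symbols but not the total information.

```python
import math

n_bits = 18
# As binary digits: 18 symbols carrying log2(2) = 1 bit each.
info_binary = n_bits * math.log2(2)
# Recoded as octal digits: 6 symbols carrying log2(8) = 3 bits each.
info_octal = (n_bits // 3) * math.log2(8)
print(info_binary, info_octal)  # 18.0 18.0
```

Information-theoretically the two representations are identical, which is exactly why the gain from chunking cannot be explained in terms of bits alone.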
