
Monday, November 5, 2012

Augustine: De Dialectica

Around the year 387, Saint Augustine wrote this little text on logic, spanning only about 20 pages. According to his own account in Retractationes, the book was never finished, and he lost his only copy of the manuscript. However, the text we have genuinely seems to be written by him.

In spite of its opening statement, "Dialectic is the science of disputing well" (p. 5/82), De Dialectica does not contain much that we would now recognize as logic. It's a discussion of a number of topics related to language, most notably ambiguity and etymology.


Truth Values and Dispute

One notable feature of Augustine's discussion of 'dialectics' is that he seems to take dispute to be more fundamental than truth values. A meaningful statement has a truth value in virtue of being up for discussion – not the other way around.

In his words:
For either a statement is made in such a way that it is held to be subject to truth or falsity, such as 'every man is walking' or 'every man is not walking' and others of this kind. Or a statement is made in such a way that, although it fully expresses what one has in mind, it cannot be affirmed or denied, as when we command, wish, curse, or the like. For whoever says 'go into the house' or 'oh that he would go into the house' [utinam pergat ad villam] or 'may the gods destroy that man' cannot be thought to lie or to tell the truth, since he did not affirm or deny anything. Such statements do not, therefore, come into question so as to require anyone to dispute them. (p. 6/85)
He consequently adopts the term "statements that require disputation" as a name for what we would call truth-apt statements (p. 6/85).

Eloquence and Proloquence

He later introduces the distinction "expressing" / "asserting" (eloquendo / proloquendo) to indicate the difference between the statements that "require questioning and disputing" and those that do not (p. 7/87).

This leads him, in Chapter XII on "the force of words," to make the following wonderful comment on the relation between logic and rhetoric:
For although disputation need not be inelegant [ineptam] and eloquence need not be deceptive [mendacem], still in the former the passion of learning often – indeed, nearly always – scorns the pleasures of hearing, while in the latter the more ignorant multitude [imperitior multitudo] think that which is said elegant is said truly. Therefore, when it becomes apparent what is proper to each, it is clear that a disputer who has any concern to make his points appealing will sprinkle them with rhetorical color, and an orator who wishes to convince people of the truth will be strengthened by the sinews and bones, as it were, of dialectic, which are indispensable to the strength of the body but are not allowed to become visible to the eye. (p. 13–14/103)
So logic and rhetoric are inner and outer values – but logic is not inner as in the soul, but inner as in internal organs.

An Observation on Implication

Another interesting feature is that he takes implication to be inherently connected to argumentation:
Whoever says 'if he is walking, he is moving' wishes to prove something, so that when I concede that this combined statement is true he only needs to assert that he is walking and the conclusion that he is moving follows and cannot be denied, or he need only assert that he is not moving and the conclusion that he is not walking must be agreed to. (p. 6/85)
It seems fair to say that Augustine thus sees the meaning of the implication as given by its use in argumentation.

Signification and Writing

In Chapter V, Augustine gives a definition of a sign followed by a slightly strange qualification:
A sign is something which is itself sensed and which indicates to the mind something beyond the sign itself. To speak is to give a sign by means of an articulate utterance. By an articulate utterance I mean one which can be expressed in letters. [Signum is quod et se ipsum sensui et praeter se aliquid animo ostendit. Loqui est articulata voce signum dare. Articulatum autem dico quae comprehendi litteris potest.] (p. 7/87)
The intuition behind this comment seems to be the following: If something is said clearly and intelligibly, it can be broken up into its component parts (letters, or phonemes). However, this does seem on the face of it to make verbal understanding dependent on literary understanding.

But maybe this is only because we read too much into the word "letter":
For we misuse the term 'letter' when we call what we see written down a letter, for it is completely silent and is no part of an utterance but appears as the sign of an articulate utterance. In the same way [we misuse the term 'word'] when we call what we see written down a word, for it appears as the sign of a word, that is, not as a word but as the sign of a significant utterance. Therefore, as I said above, every word is a sound [omne verbum sonat]. (p. 7/89)
The theory thus seems to be this: The written word or letter is a sign because it evokes the spoken word or letter to the mind; and the spoken word or letter is a sign because it evokes its referent.

Ambiguity and Obscurity

In Chapter VIII, Augustine introduces a distinction between ambiguity and obscurity. This is not terribly important, but I find his explanation so nice that I wanted to quote it:
When little appears, obscurity is similar to ambiguity, as when someone who is walking on a road comes upon a junction with two, three, or even more forks of the road, but can see none of them on account of the thickness of a fog. Thus he is kept from proceeding by obscurity. […] When the sky clears enough for good visibility, the direction of all the roads is apparent, but which is to be taken is still in doubt, not because of any obscurity but solely because of ambiguity. (p. 14/105)
He goes on to complicate this distinction by distinguishing further between obscurity based on inaccessibility to the mind and to the senses, as in not recognizing a picture of an apple either because one has never seen an apple before, or because it is too dark (p. 14/105).

Problems with Category Membership

In his discussion of ambiguity, Augustine distinguishes between the vagueness of a word like man and more straightforward cases of homonymy. He calls these two phenomena univocal and equivocal meaning, respectively.

This would not in itself be particularly interesting if he didn't get himself into problems by suggesting that a univocal concept is characterized by having "a single definition" (p. 16/111). This of course raises some problems once we start looking for such a definition:
When we speak of a man we speak equally of a boy and of a young man and of an old man, equally of a fool and of a wise man [and a number of further examples]. Among all those expressions there is not one which does not accept the name 'man' in such a way as to be included by the definition of man. For the definition of 'man' is 'a rational, mortal animal' [animal rationale mortale]. Can anyone say that only a youth is a rational, mortal animal and not also a boy or an old man, or that only a wise man is and not also a fool? (p. 16–17/111)
So in order to save his definition, Augustine has to assert that a fool is rational, something he seems to sense the problem with:
One may wonder how a boy who is small and stupid [parvo aut stulto], or at least silly [fatuo], or a man who is sleeping or drunk or in a rage, can be rational animals. This can certainly be defended, but it would take too long to do this because we must hasten on to other subjects. (p. 17/111)
This is approximately the same rhetorical strategy he used when defining a sign back in Ch. V:
Whether all these things that have been defined have been correctly defined and whether the words used in definition so far will have to be followed by other definitions, will be shown in the passage in which the discipline of defining is discussed. [This part was never written.] For the present, pay strict attention to the material at hand. (p. 7/87)

Criticism of the Stoic Theory of Etymology

In addition to being an interesting text in its own right, Augustine's tiny book is also one of our prime sources for the Stoic theory of where meaning comes from.

The upshot of this theory is apparently the following: Every word has a meaning which is derived metonymically from another word, and ultimately, these chains of metonymies all point back towards an original sound iconicity. Thus, Augustine reports that in order to avoid infinite regress,
… they assert that you must search until you arrive at some similarity of the sound of the word to the thing, as when we say 'the clang of bronze' [aeris tinnitum], 'the whinnying of horses' [equorum hinnitum], 'the bleating of sheep' [ovium balatum], 'the blare of trumpets' [tubarum clangorem], 'the rattle of chains' [stridorem catenarum]. For you clearly see that these words sound like the things themselves which are signified by these words. But since there are things which do not make sounds, in these touch is the basis for similarity. If the things touch the sense smoothly or roughly, the smoothness or roughness of letters in like manner touches the hearing and thus has produced the names for them. For example, 'lene' [smoothly] itself has a smooth sound. Likewise, who does not by the name itself judge 'asperitas' [roughness] to be rough? It is gentle to the ears when we say 'voluptas' (pleasure); it is harsh when we say 'crux' (cross). Thus the words are perceived in the way the things themselves affect us. Just as honey itself affects the taste pleasantly, so its name, 'mel,' affects the hearing smoothly. 'Acre' (bitter) is harsh in both ways. Just as the words 'lana' (wool) and 'vepres' (brambles) are heard, so the things themselves are felt. The Stoics believed that these cases, where the impression is made on the senses by the sounds, are, as it were, the cradle of words. From this point they believed that the license for naming had proceeded to the similarity of things themselves to each other. (p. 10/95)
Augustine's main beef with this theory seems to be that it is too speculative:
Even though it is a great help to explicate the origin of a word, it is useless to start on a task whose prosecution could go on indefinitely. For who is able to discover why anything is called what it is called? (p. 9/93)
As an example, he gives a couple of hypotheses about the origin of the word verbum, asking "But what difference does this make to us?" (p. 9/93).

Varieties of Metonymic Shifts

The avenues by which words can jump from meaning to meaning are quite diverse. Twice in the text, Augustine gives a list of relationships that can warrant metonymic slides, once in the chapter on "the origin of words" (Ch. VI) and once in the chapter on "equivocation" (Ch. X).

Here's the list from Chapter VI, page 11/97:
Proximity [vicinitas] is a broad notion which can be divided into many aspects:
  1. from influence, as in the present instance in which an alliance [foedus] is caused by the filthiness of the pig [foeditate porci];
  2. from effects, as puteus [a well] is named, it is believed, from its effect, potatio [drinking];
  3. from that which contains, as urbs [city] is named from the orbis [circle] which was by ancient custom plowed around the area […];
  4. from that which is contained as it is affirmed that by changing a letter horreum [granary] is named after hordeum [barley];
  5. or by transference [abusionem], as when we say horreum, and yet it is wheat that is preserved here;
  6. or the whole from the part, as when we call a sword by the name 'mucro' [point], which is the terminating part of the sword;
  7. or the part from the whole as when capillus [hair] is named from capitis pilus [hair of the head].
Here's the list from Chapter X, page 19/117–119:
I call it transference [translatione]
  1. when by similarity [similitudine] one name is used of many things, as both the man, renowned for his great eloquence, and his statue can be called 'Tullius.'
  2. Or when the part is named from the whole, as when his corpus can be said to be Tullius;
  3. or the whole from the part, as when we call whole houses 'tecta' [roofs].
  4. Or the species from the genus, for 'verba' is used chiefly of all the words by which we speak, although the words which we decline by mood and tense are named 'verba' in a special sense.
  5. Or the genus from the species as 'scholastici' [scholars] were originally and properly those who were still in school, though now all who pursue a literary career [litteris vivunt] use this name.
  6. Or the effect from the cause, as 'Cicero' is a book of Cicero's.
  7. Or the cause from the effect, as something is a terror [terror] which causes terror.
  8. Or what is contained from the container, as those who are in a house are called a household.
  9. Or vice versa, as a tree is called a 'chestnut.'
  10. Or if any other manner is discovered in which something is named by a transfer, as it were, from the same source.
You see, I believe, what makes for ambiguity in a word.
The itemization is not in the original. It is interesting that many of these examples are slightly strange or would be analyzed differently (but equally speculatively) today; the relationship between a chestnut tree and a chestnut would, e.g., probably be seen as a producer–product relation rather than a container–contained one.

Word, Thing, Concept, and Word-Thing

One last thing that I want to mention is the rather complicated four-part distinction that Augustine introduces in Chapter VI between verbum, dicibile, dictio, and res.

The last three can roughly be glossed as concept, word, and thing:
Now that which the mind not the ears perceives from the word and which is held within the mind itself is called a dicibile. When a word is spoken not for its own sake but for the sake of signifying something else, it is called a dictio. The thing itself which is neither a word nor the conception of a word in the mind [verbi in mente conceptio], whether or not it has a word by which it can be signified, is called nothing but a res in the proper sense of the name. (p. 8/89)
The verbum, however, is a word considered as a thing one can refer to:
Words are signs of things whenever they refer to them, even though those [words] by which we dispute about [things] are [signs] of words. […] When, therefore, a word is uttered for its own sake, that is, so that something is being asked or argued about the word itself, clearly it is the thing which is the subject of disputation and inquiry; but the thing in this case is called a verbum. (p. 8/89)
We thus have here a kind of use/mention distinction, although put in a slightly different vocabulary.

Monday, June 18, 2012

Barsalou: "The instability of graded structure" (1987)

Lawrence Barsalou summarizes the findings from a large number of categorization experiments performed by him and his colleagues between 1980 and 1986. The upshot is that prototypicality effects are highly unreliable and context-sensitive.

The Instability of Prototype Effects

Barsalou describes the following types of variability between judgments of prototypicality:
  1. Pages 108-109: Contrary to what Rosch claims, the between-subject reliability of similarity judgments is generally low, with an average correlation of around 50%. Rosch (1975) and Armstrong, Gleitman, and Gleitman (1983) only achieve their 90% correlation coefficients by comparing group averages rather than individuals (see the simulation sketch after this list).
  2. Pages 109-111: People differ in their similarity judgments -- students, for instance, differ from professors. Yet, people are surprisingly good at simulating the similarity estimates of other people when asked to make similarity judgments from their point of view.
  3. Pages 111-112: People are not very stable in their judgments. Their self-correlation is about 92% after one hour, 87% after one day, and around 80% after one week or more.
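Regarding the first point: the gap between individual-level and group-level agreement is less paradoxical than it may sound, since averaging washes out each subject's private noise. Here is a quick simulation with entirely invented numbers (shared "true" typicality plus independent noise per subject), which produces roughly 50% agreement between two individuals alongside 90%+ agreement between two group averages:

import random
import statistics

def pearson(xs, ys):
    # Pearson correlation coefficient
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (len(xs) * statistics.pstdev(xs) * statistics.pstdev(ys))

random.seed(1)
N_ITEMS, N_SUBJECTS, NOISE = 30, 40, 1.0
true_typicality = [random.gauss(0, 1) for _ in range(N_ITEMS)]

def one_subject():
    # each subject's ratings = shared "true" typicality + private noise
    return [t + random.gauss(0, NOISE) for t in true_typicality]

subjects = [one_subject() for _ in range(N_SUBJECTS)]

# agreement between two individual subjects
print(round(pearson(subjects[0], subjects[1]), 2))

# agreement between the mean ratings of two halves of the subject pool
half = N_SUBJECTS // 2
mean_a = [statistics.mean(s[i] for s in subjects[:half]) for i in range(N_ITEMS)]
mean_b = [statistics.mean(s[i] for s in subjects[half:]) for i in range(N_ITEMS)]
print(round(pearson(mean_a, mean_b), 2))

Of course, whether the low between-subject correlations really are just noise around a shared structure, rather than genuine disagreement, is exactly what is at issue; the simulation only shows that the two sets of numbers are compatible.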
I would imagine that the task about "taking the point of view" of someone else is highly sensitive to tiny details in the test materials. Barsalou does not discuss the methodology in detail, but I can see that the 1984 research report that describes the experiment is available at his website.

The Alternative Theory

In order to fix the theoretical problems that result from this instability of prototype effects, Barsalou suggests that concepts should be seen as "temporary constructs in working memory that are tailored to current situations" (p. 120). This does indeed allow for a more anarchistic type of conceptual system, but it is also a very weak theory (i.e., it is consistent with almost all types of behavior).

My suspicion is here -- as often when "representations" start to disintegrate into a disorderly pile of highly unstructured improvisations -- that the whole set-up somehow captures the wrong phenomenon. This can be true both of wild cognition (how do you decide, in conversation, whether or not to call something, say, a tool?) and of the experimental situation (how do subjects construe the questions they get, and what do they think the experimenter wants them to do?).

The context-sensitivity suggests, I think, that categorization is a fairly "high-level" type of cognition, in spite of the claims of Rosch and others. People might just use quite intelligent, deliberate, and context-sensitive strategies for picking words, finding or ignoring similarities, and the like. But I realize this is pretty weak, too.

Thursday, June 14, 2012

Lakoff: "Cognitive models and prototype theory" (1987)

Lakoff argues that prototype effects exist because concepts are defined by several partly overlapping "cognitive models." A given example of a mother might then fit some of the models (the "genetic mother") but not others (the "wife of the father"). This may lead to graded membership judgments.

Structure of the Paper

The paper is a little difficult to navigate, as it contains a large number of sections which are all on the same level of organization.

The sections differ in content and length. Their headings are:
  1. Untitled introduction
  2. Interactional properties
  3. Cognitive models
  4. Graded models
  5. The idealized character of cognitive models
  6. Cognitive models versus feature bundles
  7. Mother
  8. Metonymic models
  9. Metonymic sources of prototype effects
  10. The housewife stereotype
  11. Working mothers
  12. Radial structures
  13. Some kinds of metonymic models
  14. Social stereotypes
  15. Typical examples
  16. Ideals
  17. Paragons
  18. Generators
  19. Submodels
  20. Salient Examples
  21. Radial categories
  22. Japanese hon
  23. Categories of mind, or mere words
  24. What is prototype theory
  25. The core + identification proposal
  26. Osherson and Smith
  27. Armstrong, Gleitman, and Gleitman
  28. Conclusion

Cognitive Clusters

His own theory of cognitive models is most clearly explained in sections 6 and 7 (pp. 66-70). His idea is that concepts are defined in terms of a cluster of competing models -- for instance, a salient example, an idealized picture, and a positive paradigm.

The effect seems oddly close to the weighted lists of attributes used by, e.g., Linda Coleman and Paul Kay (1981). But he insists that bundles of cognitive models are empirically distinguishable from bundles of features. He refers to Eve Sweetser (1987) for support of this claim, but does not discuss the evidence.

Sections 14 through 20 (and perhaps section 21?) are intended to give examples of how "cognitive models" can look. They can look like a lot of different things, it appears.

An Alternative Theory

Sections 25, 26, and 27 criticize a "reactionary" counterproposal.

According to this theory, category judgments can be made in two ways, by deliberation or by quick-and-dirty heuristics. Prototype effects are then, as far as I understand, only present when the heuristic method is used.

Lakoff identifies a comment from his own 1972 paper on hedges (journal version, 1973) as the inspiration for this new theory. He finds this "ironic."

The arguments he proposes against this two-method theory oddly resemble arguments in favor of it (e.g., p. 92). His main concern, if I understand it correctly, is the metaphysical assumptions of the theory rather than any specific empirical problem.

Rosch: "Principles of Categorization" (1978)

Eleanor Rosch's contribution to Cognition and Categorization really consists of two independent parts: an overview of her experiments investigating basic-level categories (pp. 30-35), and an overview of her experiments with prototype effects (pp. 35-41). I will only deal with the first part here.

Cue Validity

The most central concept in Rosch's discussion of basic-level categories is the notion of cue validity. This is defined for a category such as "bird," which is more or less reliably identified by cues such as "wings." She explains:
The cue validity of an entire category may be defined as the summation of the cue validities for that category of each of the attributes of the category. (pp. 30-31)
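Spelled out, I take the definition to amount to something like this (my notation; the cue validity of a single cue is read as a conditional probability, which is how Rosch glosses it on the same pages):

\[ \mathrm{cv}(x, c) = P(c \mid x), \qquad \mathrm{CV}(c) = \sum_{x \in A_c} P(c \mid x) \]

where \(A_c\) is the set of attributes considered for the category \(c\).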
This immediately raises two questions:
  1. Do all cues count in the summation with equal weight? There are infinitely many possible cues and only a few highly valid ones. This suggests that more explicit assumptions about "salience" are needed.
  2. With what weight do the various members of a category contribute to the average? Equally? Weighted by the frequency of the linguistic label? Weighted by the frequency of the thing?
While these questions may seem like technical remarks, they do in fact relate to some deeper issues that I will mention below.

The Ambiguity of "Basic"

There are two competing characterizations of "basic" in Rosch's work, one ostensive and one perceptual. It's not always clear which one she is taking as definitive, and this sometimes introduces problems.

Both definitions apply to concept trees and are meant to pick out a particular depth in such a tree. They do so by locating the level of abstraction at which either
  1. the categories "car," "chair," "tomato," and "hammer" are found; or
  2. average cue validity is maximized.
My worry is that her cross-cultural, developmental, and evolutionary claims may turn out to be tautologies when we look closer at the ups and downs of her theory.

For instance, if "basic" means "maximal cue validity," then of course children learn names from this level first. On the other hand, if Rosch gets to pick what counts as "basic" in each branch of the English category system ("chair," "car," "tomato," ...), then she can obviously just pick the level that fulfills the second definition.

Learned Perception

The fact that she might unknowingly be making the tautological point that "normal things are normal" is hinted at when she comments that English-speakers tend to be less able to distinguish between plants than the ostensive definition suggests.

This observation was echoed more recently by Jerome Feldman:
For many city dwellers, tree is a basic category—we interact the same way with all trees. But for the professional gardener, tree is definitely a superordinate category (Feldman 2006: 186)
With those kinds of qualifications, basic-level categories are certainly guaranteed to have all of the properties that Rosch claims. But any claim about their universal centrality then also becomes an empty verbalism.

Note how this also ties in with the sticky issue of trained perception:
One influence on how attributes will be defined by humans is clearly the category system already existent in the culture at a given time. Thus our segmentation of a bird's body such that there is an attribute called "wings" may be influenced not only by perceptual factors [...] but also by the fact that at present we already have a cultural and linguistic category called "birds." (p. 29)
She is apparently aware of this problem, but not willing to face the implication that complex cues like plumage are themselves categories that are open-ended and ambiguous.

Mutual Dependence and Iterated Learning

She does note, however, that attributes might be extracted from categories just as well as categories might be based on attributes. However:
Unfortunately, to state the matter in such a way is to provide no clear place at which we can enter the system as analytical scientists. What is the unit with which to start our analysis? (p. 42)
To me, this suggests a game-theoretical analysis. A category system is invented by people, but also has to be transmitted; fixed points in such an iterated learning process will be the systems that trade off difficulty of acquisition for pragmatic necessity, I guess.

This process could probably be modeled relatively easily in a multi-agent system with a set of Bayesian learners. However, such a model will probably be highly sensitive to the assumptions made about the environment of learning (e.g., the frequency of birds and the frequency of winged-ness).
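For what it's worth, here is a toy version of the kind of transmission chain I have in mind. Everything in it is invented for illustration (the one-dimensional feature space, the grid of candidate boundaries, the noise rate, the "easy to acquire" prior); the point is only to show how small such a model can be and how visibly its fixed point depends on those assumptions:

import random

# Toy iterated-learning chain: each generation is a Bayesian learner that
# infers a one-dimensional category boundary from examples labelled by the
# previous generation, then labels fresh examples for the next generation.

THRESHOLDS = [i / 20 for i in range(1, 20)]  # hypothesis space: boundary positions
NOISE = 0.05                                 # probability that a label gets flipped
N_EXAMPLES = 10                              # data passed between generations

def prior(t):
    # mild bias towards boundaries near the middle ("easy to acquire")
    return 1.0 - abs(t - 0.5)

def likelihood(data, t):
    p = 1.0
    for x, label in data:
        correct = (x > t) == label
        p *= (1 - NOISE) if correct else NOISE
    return p

def learn(data):
    # posterior over boundary positions; return the MAP hypothesis
    scored = [(prior(t) * likelihood(data, t), t) for t in THRESHOLDS]
    return max(scored)[1]

def teach(t, environment):
    # label fresh objects drawn from the environment distribution
    xs = [environment() for _ in range(N_EXAMPLES)]
    return [(x, x > t) for x in xs]

def run_chain(generations=30, environment=random.random, start=0.15):
    t = start
    for _ in range(generations):
        t = learn(teach(t, environment))
    return t

if __name__ == "__main__":
    random.seed(0)
    print(run_chain())  # where the boundary ends up after 30 generations

Swapping the environment distribution (say, lambda: random.betavariate(2, 5) instead of random.random) or flattening the prior moves the point where the chain settles, which is exactly the sensitivity I am worried about.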

Tuesday, June 12, 2012

Rosch and Lloyd: Cognition and Categorization (1978)

Read this book, and you will understand everything about where cognitive semantics comes from, and how it sees itself. I feel like quoting the whole thing word for word.

All the seeds of future greatness and future crises are visible here – as well as a firm rooting in the AI and cognitive psychology of the golden age of frog neurons and cat retinas in the 1950s.

Take a look, for instance, at the names of some of the contributors: Brent Berlin, Eleanor Rosch, Amos Tversky, George Miller – quite a cast. Naomi Quinn and Dan Slobin, too, are involved in the background, as members of the Social Science Research Council's Committee on Cognitive Research – the Council being an institution that "the book reflects the aims of" (p. vii).

Revolution!

The trumpets are already out in the blurb on the flap, with the promise of "a conceptual revolution overtaking the study of language and cognition." A more detailed narrative is unfolded in the preface:
In the spring of 1976, a small group of psychologists, linguists, and anthropologists met at Lake Arrowhead, California, in a conference sponsored by the Social Science Research Council to discuss the nature and principles of category formation. Participants coming from the East Coast talked about Roger Brown's memorial lecture for Eric Lenneberg given a few days earlier. (p. vii)
Note the literary voice – "Four score and seven years ago ..." And then an indirect reference to Eric Lenneberg, just to put some distance to the "recalcitrant cultural relativists" that Berlin grumbles about (p. 12).

The preface continues:
Roger Brown had chosen to speak about the new paradigm of reference using research in the domain of color. But research in fields such as ethnoscience, perception, and developmental psychology was beginning to appear and might also have been cited to support the claim that categorization, rather than being arbitrary, may be predicted and explained. (p. vii)
The steady stride of scientific progress, in other words. No corner of the life of "man" will evade the searchlight of scientific attention.

"Scientists now realize…"

The boogieman is also largely the same as in 1960 – behaviorism, empiricism, relativism. The Introduction thus states:
In the stimulus–response learning paradigm that dominated American psychology in the first half of the twentieth century, both the stimulus and the response were dealth [sic] with as arbitrary systems; the focus was primarily on the connection between them. In developmental psychology, children were considered beings born into a culture in which categories and stimuli were already determined by the adult world. Anthropology, which might have sought universal principles of human experience, under the influence of Boazian [sic] cultural relativism, concentrated on cultural diversity and the arbitrary nature of the definition of categories. (p. 2)
However, somewhat confusingly, the over-rationalizing "Aristotelian" picture of knowledge is also wrong:
If other thought processes such as imagery, ostensive definition, reasoning by analogy to particular instances, or the use of metaphors were considered at all, they were usually relegated to lesser beings such as women, children, primitive people, or even to nonhumans. (p. 2)
While all of these are true observations, to me they look like a motivation for something other than a research program hailing "universal principles" and biological reductionism.

Rosch's Afterthought

But maybe this is just one of the germs of contradiction in "second-generation" cognitive science. Rosch's abrupt change of attention in the very last part of her paper certainly seems to say so.

There, in the section "The Role of Objects in Events," she falls into an almost Heideggerian mode of thought, contemplating the "events of daily human life" and the "flow of experience" (p. 43).

Not for long, though. Soon, she gets the idea of treating everyday life events according to the same principles as she had applied to chairs and cars and vegetables (p. 44). So there we are, back in familiar territory.

Tversky: "Features of Similarity" (1977)

Tversky argues that objects should be represented as feature bundles, and that the similarity of the feature bundles equals the measure of their overlap minus the measures of the two disjoint parts.
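In symbols, this is the "contrast model"; as I read the paper, the matching function has the form

\[ S(a, b) = \theta\, f(A \cap B) - \alpha\, f(A \setminus B) - \beta\, f(B \setminus A) \]

where A and B are the feature sets of a and b, f is a (typically additive) measure over feature sets, and \(\theta, \alpha, \beta\) are nonnegative weights. The asymmetries discussed below come out when \(\alpha \neq \beta\), that is, when the distinctive features of the subject and of the referent are weighted differently.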

He provides large amounts of evidence that this more versatile (and vague) scheme is a necessary corrective to the metric conception of similarity.

Finding the Relevant Dimensions

The general idea is "feature matching," a pragmatic process relying on a background notion of relevance:
When faced with a particular task (e.g., identification or similarity assessment) we extract and compile from our data base a limited list of relevant features on the basis of which we perform the required task. (p. 329)
This process can violate metric properties because the basis for the similarity may be different in different cases:
Jamaica is similar to Cuba (because of geographical proximity); Cuba is similar to Russia (because of their political affinity); but Jamaica and Russia are not similar at all. (p. 329)
This sounds like Wittgenstein, and in the last section of the paper, he does in fact get a citation, during a discussion of Eleanor Rosch's work (p. 348).

Symmetry and Reversibility

Symmetry, too, is problematic. Some prototypical examples of certain categories seem to make the central features of the category shine brightly and thus attract attention; this produces higher similarity judgements:
We tend to select the more salient stimulus, or the prototype, as a referent, and the less salient stimulus, or the variant, as a subject. We say "the portrait resembles the person" rather than "the person resembles the portrait." We say "the son resembles the father" rather than "the father resembles the son." (p. 328)
However, in certain cases, both objects may have the stereotypical character of a paradigm case:
Sometimes both directions are used but they carry different meanings. "A man is like a tree" implies that man has roots; "a tree is like a man" implies that the tree has a life history. "Life is like a play" says that people play roles. "A play is like life" says that a play can capture the essential elements of human life. (p. 328)
Whether Tversky selects the right features here is doubtful. But his general point about feature selection is true, I suppose.

Context-Dependence

A large part of Tversky's paper is dedicated to compiling evidence against the symmetry of similarity judgments, and to showing prototype effects. This part of the paper is slightly dated, especially since he does not reprint any of his data, only the test statistics.

However, his examples of context-dependent similarity (pp. 340-344) are more interesting from a contemporary perspective. These include for instance the experiments in which he asked subjects to split a set of four objects into two pairs. This indirectly pointed to the context-sensitivity of feature selection.

One way he did this was by asking people to pick a cartoon drawing of a face according to similarity. So his subjects would get a neutral face and a set of three frowning or smiling faces with an instruction to pick the face most similar to the neutral one:
[Figure not reproduced: the neutral target face and the two reference sets of frowning and smiling faces, with the percentage of subjects choosing each face.]
As the numbers indicate, the members of the reference set mattered hugely for the judgment of the leftmost and rightmost face, even though these were held constant across the two conditions.

This seems to suggest that merely having two frowny or smiley faces in the reference set implicitly tells the subjects that frowns or smiles are essential, stable features, rather than facial expressions drawn from a random distribution. A neutral face will then have a much smaller likelihood of coming from that "category."

The other "category," however, only contains a single example and thus yields higher likelihood levels, as it suggests that more variance within the category might have been possible.
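As a sanity check on this reading (which is my gloss, not anything Tversky computes), here is a back-of-the-envelope calculation on an invented "mouth curvature" scale, with a crude variance prior folded in as a single pseudo-observation. Both category means sit at the same distance from the neutral face, so any difference comes purely from how much the two-example category has tightened its variance:

import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def predictive(x, observations, prior_sigma=1.0, prior_strength=1.0):
    # crude posterior predictive: pool the observed spread with one
    # pseudo-observation of spread prior_sigma, then evaluate a normal density
    mu = sum(observations) / len(observations)
    pooled_var = (prior_strength * prior_sigma ** 2
                  + sum((o - mu) ** 2 for o in observations)) / (prior_strength + len(observations))
    return normal_pdf(x, mu, math.sqrt(pooled_var))

# invented mouth-curvature scale: -1 = strong frown, 0 = neutral, +1 = broad smile
neutral = 0.0
frowns = [-1.1, -0.9]  # two frowning reference faces (mean -1.0)
smiles = [1.0]         # a single smiling reference face (mean +1.0)

print(predictive(neutral, frowns))  # about 0.16: tight, well-attested category
print(predictive(neutral, smiles))  # about 0.21: broad, weakly attested category

The numbers are not meant to fit Tversky's data; they only show that the intuition "two similar examples imply a tight category" can be given a straightforward likelihood reading.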

Reformulations

If we set <a,b,c> = <neutral, frown, smile> and <d,e> = <dot-eye, circle-eye>, then the data set can be rephrased as follows:
Which of the following three pairs is most similar to <a,d>?
Condition 1: <b,d>, <c,e>, or <c,d>?
Condition 2: <b,d>, <b,e>, or <c,d>?
Notice that we get a symmetry here: If we swap the names b and c and rearrange the items, the two sets turn out in fact to be the same. Yet, we don't see symmetric choices.

I wonder how abstractly this prompt could be presented to a subject and produce results like those Tversky got. Imagine for instance the following formulation:
Which of the following is most similar to a white mouse?
Condition 1: a black mouse, a brown rat, or a brown mouse?
Condition 2:  a black mouse, a black rat, or a brown mouse?
Whatever the answer is, there is a problem with this design, as it puts too much weight on forced partitioning of the four faces. A better method is used in the experiment reported on page 344. This can essentially be thought of as the following three conditions:
Condition 1:
How similar is Chile to Venezuela?
How similar is Guatemala to Uruguay?
(etc.)
Condition 2:
How similar is Sweden to Norway?
How similar is Finland to Denmark?
(etc.)
Condition 3:
How similar is Chile to Venezuela?
How similar is Sweden to Norway?
(etc.)
With this set-up, Tversky reports having found a higher average similarity in (what corresponds to) condition 3. This is explained by the fact that the context foregrounds the geographical region as a cue in that case, but not in the two others.

Tenenbaum and Xu: "Word Learning as Bayesian Inference" (2007)

Joshua Tenenbaum and Fei Xu report some experimental findings on concept learning and simulate them in a computational model based on Bayesian inference. The paper refers to a 1999 paper by Tenenbaum for the mathematical background.

Elements of the Model

The idea behind the model is that the idealized learner picks a hypothesis (a concept extension, a set of objects) based on a finite set of examples. In their experiments, the training sets always consist of either one or three examples. There are 45 objects in the "world" in which the learner lives: some vegetables, some cars, and some dogs.

As far as I understand, the prior probabilities fed into the computational model were based on human similarity judgments. This is quite problematic, as similarity can reasonably be seen as a dual of categories (with being-similar corresponding to being-in-the-same-category). So if I've gotten this right, then the answer is to some extent already built into the question.

Variations

A number of tweaks are further applied to the model:
  • The priors of the "basic-level" concepts (dog, car, and vegetable) can be manually increased to introduce a bias towards this level. This increases the fit immensely.
  • The priors of groups with high internal similarity (relative to the nearest neighbor) can be increased to introduce a bias towards coherent and separated categories. Tenenbaum and Xu call this the "size principle."
  • Applying the learned posteriors, the learner can either use a weighted average of probabilities, using the model posteriors as weights, or simply pick the most likely model and forget about the rest. The latter corresponds to crisp rule-learning, and it gives suboptimal results in the one-example cases.
I still have some methodological problems with the idea of a "basic level" in our conceptual system. Here as elsewhere, I find it question-begging to assume a bias towards this level of categorization.
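To make the moving parts above concrete, here is a minimal sketch of the kind of computation I take the paper to be describing. The toy world, the hypothesis space, and the flat prior are all my inventions; what the sketch tries to preserve is the size-principle likelihood (as I understand the underlying 1999 math: hypotheses consistent with the examples get likelihood (1/|h|)^n, so smaller extensions win as examples accumulate) and the contrast between averaging over hypotheses and committing to the single best one:

# Toy Bayesian word learner in the spirit of Tenenbaum and Xu (2007).
# The "world", the hypothesis space, and the prior weights are invented
# for illustration; only the general Bayesian machinery follows the paper.

WORLD = ([f"dalmatian{i}" for i in range(4)] + [f"terrier{i}" for i in range(4)]
         + [f"tomato{i}" for i in range(4)] + [f"carrot{i}" for i in range(4)])

HYPOTHESES = {
    "dalmatians": {o for o in WORLD if o.startswith("dalmatian")},
    "dogs":       {o for o in WORLD if o.startswith(("dalmatian", "terrier"))},
    "vegetables": {o for o in WORLD if o.startswith(("tomato", "carrot"))},
    "everything": set(WORLD),
}

# prior: the knob where a basic-level bias could be added (here: none)
PRIOR = {name: 1.0 for name in HYPOTHESES}

def posterior(examples):
    # p(h | X) is proportional to p(h) * (1/|h|)^n for hypotheses containing all examples
    scores = {}
    for name, extension in HYPOTHESES.items():
        if all(x in extension for x in examples):
            scores[name] = PRIOR[name] * (1.0 / len(extension)) ** len(examples)
        else:
            scores[name] = 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

def p_in_extension(obj, examples, map_only=False):
    # probability that obj falls under the word, by hypothesis averaging or by MAP
    post = posterior(examples)
    if map_only:
        best = max(post, key=post.get)
        return 1.0 if obj in HYPOTHESES[best] else 0.0
    return sum(p for name, p in post.items() if obj in HYPOTHESES[name])

if __name__ == "__main__":
    one = ["dalmatian0"]
    three = ["dalmatian0", "dalmatian1", "dalmatian2"]
    print(p_in_extension("terrier0", one))                 # graded generalization to other dogs
    print(p_in_extension("terrier0", three))               # much sharper after three subordinate examples
    print(p_in_extension("terrier0", one, map_only=True))  # the crisp rule-learner does not generalize

With one example, the averaging learner generalizes in a graded way to other dogs, while the MAP learner does not generalize at all, which is the contrast the last bullet point above describes.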

Questions

I wonder how the model could be changed so as to
  • not have concept learning rely on preexisting similarity judgments;
  • take into account that similarity judgments vary with context.
Imagine a model that picked the dimensions of difference that were most likely to matter given a finite set of examples. Dimensions of difference are hierarchically ordered (e.g., European > Western European > Scandinavian), so it seems likely that something like the size principle could govern this learning method.

Thursday, November 24, 2011

Ortony, Vondruska, Foss, and Jones: "Salience, Similes, and the Asymmetry of Similarity" (1985)

A paper from the Journal of Memory and Language 24(5), most notable for its long list of good examples of similes (reproduced in the appendix).

Medin and Ortony: "Psychological essentialism" (1989)

This is a contribution by Douglas Medin and Andrew Ortony to the volume Similarity and Analogical Reasoning (1989) edited by Stella Vosniadou and Ortony. Medin and Ortony argue that entities have two distinct sets of features, a shallow and a deep set, that influence similarity judgments in different contexts.

Superficial Attributes and "Essence Slots"
The paper is called "Psychological essentialism" because Medin and Ortony think that the context-dependence of similarity judgments can be explained by supposing that ordinary lay folk are metaphysical essentialists. What this means is that the average test subject believes that entities have deep, perhaps unknown, properties in addition to their superficial attributes.

This is supposed to explain why we categorize dolphins with bats and not with sharks. In other cases, like an airplane, a subject's representation of this "essence" may take the form of a theory like "I don't know, but an expert could tell me."

There is something interesting and original about taking a bad philosophical theory and trying to explain it as a psychological phenomenon. However, I think their theory hides as much as it shows, as it takes similarity judgment to be divorced from action. I think most of the paradoxical features of similarity judgments (as described by Lawrence Barsalou) would evaporate if we took "being similar" to be explained by "being treated similarly" rather than the other way around.

A Note on Gender and Style
When I was reading the paper, I noticed that Medin and Ortony tend to refer to Linda B. Smith by both her first and last name, while they refer to male authors by their last name only. In order to check whether this was actually true, I counted how many times people were mentioned in the text, and whether their first names were mentioned:

Person | with first name | without first name
Linda B. Smith | 4 | 2
Lawrence W. Barsalou | 1 | 9
Lance J. Rips | 1 | 12
Ryszard Michalski | 2 | 3
Edward E. Smith | 2 | 8
Daniel N. Osherson | 2 | 8
Ludwig Wittgenstein | 0 | 2
John Locke | 0 | 4

The numbers in the table show how often each author is mentioned with and without their first name spelled out. Last names mentioned in references such as "Smith and Medin (1981)" are not counted, since their form is dictated by more rigid style guides. I have not counted names that only occur once (which excludes Eleanor Rosch, who is mentioned with her first name).

Note that Linda B. Smith is the only woman referred to in the paper. The fact that her first name is mentioned more frequently than, say, Michalski's might be attributed to the presence of another frequently referenced "Smith" in the paper; but note that he is still more often referred to by his last name alone. That may in part be because his name is disambiguated by occurring next to Osherson's, and because it occurs more often: a recent mention of a figure makes repeating the first name somewhat redundant.

Nevertheless, the numbers are quite striking. It would be interesting to do a more thorough investigation of this phenomenon. It would perhaps also be more interesting to investigate whether there is a significant difference in the distance from last mention that warrants reiterating a first name for men and women, respectively.