Notebooks on Language: Tversky: "Features of Similarity" (1977)

Tuesday, June 12, 2012

Tversky: "Features of Similarity" (1977)

Tversky argues that objects should be represented as bundles of features, and that the similarity of two objects is a weighted measure of the features they share minus weighted measures of the features that are distinctive to each.

He provides large amounts of evidence that this more versatile (and vague) scheme is a necessary corrective to the metric conception of similarity.
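The overlap-minus-differences rule is what Tversky calls the contrast model, S(a, b) = θ·f(A ∩ B) − α·f(A − B) − β·f(B − A). Below is a minimal sketch of it. The function name, the default weights, the choice of plain set size as the measure f, and the toy feature sets are all my own assumptions for illustration, not anything taken from the paper.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5, measure=len):
    """Contrast-model similarity of two feature sets.

    theta weights the shared features, alpha the features only `a` has,
    beta the features only `b` has. `measure` stands in for Tversky's f;
    counting features (len) is an assumption made for simplicity.
    """
    a, b = set(a), set(b)
    return (theta * measure(a & b)
            - alpha * measure(a - b)
            - beta * measure(b - a))


# Hypothetical feature bundles, chosen only for illustration.
robin = {"flies", "sings", "small", "has_feathers"}
penguin = {"swims", "large", "has_feathers"}

# One shared feature, three distinctive to robin, two distinctive to penguin.
robin_vs_penguin = contrast_similarity(robin, penguin)
robin_vs_robin = contrast_similarity(robin, robin)
```

With equal weights on the two difference terms the function is symmetric; the interesting behavior in the paper comes from letting the weights and the measure vary with the task.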

Finding the Relevant Dimensions

The general idea is "feature matching," a pragmatic process relying on a background notion of relevance:

When faced with a particular task (e.g., identification or similarity assessment) we extract and compile from our data base a limited list of relevant features on the basis of which we perform the required task. (p. 329)

This process can violate metric properties such as the triangle inequality, because the basis for the similarity may be different in different cases:

Jamaica is similar to Cuba (because of geographical proximity); Cuba is similar to Russia (because of their political affinity); but Jamaica and Russia are not similar at all. (p. 329)

This sounds like Wittgenstein, and in the last section of the paper, Wittgenstein does in fact get a citation, during a discussion of Eleanor Rosch's work (p. 348).
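To see how pair-dependent feature selection breaks the triangle inequality, here is a toy sketch of the Jamaica, Cuba, Russia example. Everything in it is an assumption of mine: the two dimensions, the attribute values, and the rule that a pair counts as close whenever it matches on at least one dimension. It is not Tversky's model, only one simple way of letting the relevant dimension change from pair to pair.

```python
# Toy attribute values, assumed for illustration only.
countries = {
    "Jamaica": {"geography": "caribbean", "politics": "parliamentary"},
    "Cuba": {"geography": "caribbean", "politics": "communist"},
    "Russia": {"geography": "eurasia", "politics": "communist"},
}


def dissimilarity(x, y):
    """0 if the pair matches on some dimension, 1 if it matches on none.

    Taking the minimum over dimensions models picking whichever
    dimension is most favorable (most "relevant") for this pair.
    """
    fx, fy = countries[x], countries[y]
    return min(0 if fx[dim] == fy[dim] else 1 for dim in fx)


d_jc = dissimilarity("Jamaica", "Cuba")    # match on geography
d_cr = dissimilarity("Cuba", "Russia")     # match on politics
d_jr = dissimilarity("Jamaica", "Russia")  # no matching dimension

# A metric would require d_jr <= d_jc + d_cr.
violates_triangle_inequality = d_jr > d_jc + d_cr
```

Note that with one fixed feature set for all comparisons, counting mismatched features would satisfy the triangle inequality; the violation here comes entirely from the per-pair choice of dimension.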

Symmetry and Reversibility

Symmetry, too, is problematic. Some prototypical examples of certain categories seem to make the central features of the category shine brightly and thus attract attention; this produces higher similarity judgements:

We tend to select the more salient stimulus, or the prototype, as a referent, and the less salient stimulus, or the variant, as a subject. We say "the portrait resembles the person" rather than "the person resembles the portrait." We say "the son resembles the father" rather than "the father resembles the son." (p. 328)

However, in certain cases, both objects may have the stereotypical character of a paradigm case:

Sometimes both directions are used but they carry different meanings. "A man is like a tree" implies that man has roots; "a tree is like a man" implies that the tree has a life history. "Life is like a play" says that people play roles. "A play is like life" says that a play can capture the essential elements of human life. (p. 328)

It is doubtful whether Tversky selects the right features here. But his point about feature selection is in general true, I suppose.
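The directional effect can be reproduced with the contrast model, S(a, b) = θ·f(A ∩ B) − α·f(A − B) − β·f(B − A), once the subject's distinctive features are weighted more heavily than the referent's (α > β). The sketch below assumes feature counting as the measure f, and the weights and feature sets are made-up values for illustration.

```python
def directional_similarity(subject, referent, theta=1.0, alpha=0.8, beta=0.2):
    """Contrast-model similarity of `subject` to `referent`.

    alpha > beta means the subject's distinctive features count more
    against the match than the referent's do (assumed weights).
    """
    subject, referent = set(subject), set(referent)
    return (theta * len(subject & referent)
            - alpha * len(subject - referent)
            - beta * len(referent - subject))


# Hypothetical bundles: the prototype is richer (more salient) than the variant.
variant = {"shared_1", "shared_2", "variant_only"}
prototype = {"shared_1", "shared_2", "proto_1", "proto_2", "proto_3"}

variant_to_prototype = directional_similarity(variant, prototype)
prototype_to_variant = directional_similarity(prototype, variant)

# "The son resembles the father" scores higher than the reverse.
is_asymmetric = variant_to_prototype > prototype_to_variant
```

Setting alpha equal to beta makes the two directions come out the same, so in this sketch the asymmetry depends on the unequal weights together with the prototype's larger stock of distinctive features.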

Context-Dependence

A large part of Tversky's paper is dedicated to compiling evidence against the symmetry of similarity judgments, and to showing prototype effects. This part of the paper is slightly dated, especially since he does not reprint any of his data, only the test statistics.

However, his examples of context-dependent similarity (pp. 340-344) are more interesting from a contemporary perspective. These include for instance the experiments in which he asked subjects to split a set of four objects into two pairs. This indirectly pointed to the context-sensitivity of feature selection.

One way he did this was by asking people to pick a cartoon drawing of a face according to similarity. So his subjects would get a neutral face and a set of three frowning or smiling faces with an instruction to pick the face most similar to the neutral one:

As the numbers indicate, the members of the reference set mattered hugely for the judgment of the leftmost and rightmost face, even though these were held constant across the two conditions.

This seems to suggest that merely having two frowny or smiley faces in the reference set implicitly tells the subjects that frowns or smiles are essential, stable features, rather than facial expressions drawn from a random distribution. A neutral face will then have a much smaller likelihood of coming from that "category."

The other "category," however, only contains a single example and thus yields higher likelihood levels, as it suggests that more variance within the category might have been possible.

Reformulations

If we set <a,b,c> = <neutral, frown, smile> and <d,e> = <dot-eye, circle-eye>, then the data set can be rephrased as follows:

Which of the following three pairs is most similar to <a,d>?
Condition 1: <b,d>, <c,e>, or <c,d>?
Condition 2: <b,d>, <b,e>, or <c,d>?

Notice that we get a symmetry here: If we swap the names b and c and rearrange the items, the two sets turn out in fact to be the same. Yet, we don't see symmetric choices.
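The relabeling claim is easy to check mechanically. The sketch below encodes each option as a (mouth, eyes) pair using the letters from the reformulation; the variable and function names are my own.

```python
# Options offered in each condition, written as (mouth, eyes) pairs.
condition_1 = {("b", "d"), ("c", "e"), ("c", "d")}
condition_2 = {("b", "d"), ("b", "e"), ("c", "d")}

# Swap the names b (frown) and c (smile); every other name is left alone.
swap = {"b": "c", "c": "b"}


def relabel(options):
    """Apply the b/c swap to the mouth component of every option."""
    return {(swap.get(mouth, mouth), eyes) for mouth, eyes in options}


# Sets ignore order, so "rearranging the items" comes for free.
conditions_are_mirror_images = relabel(condition_1) == condition_2
```

So, formally, the two conditions differ only in which expression is called b and which is called c.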

I wonder how abstractly this prompt could be presented to a subject and produce results like those Tversky got. Imagine for instance the following formulation:

Which of the following is most similar to a white mouse?
Condition 1: a black mouse, a brown rat, or a brown mouse?
Condition 2: a black mouse, a black rat, or a brown mouse?

Whatever the answer is, there is a problem with this design, as it puts too much weight on forced partitioning of the four faces. A better method is used in the experiment reported on page 344. This can essentially be thought of as the following three conditions:

Condition 1:
How similar is Chile to Venezuela?
How similar is Guatemala to Uruguay?
(etc.)

Condition 2:
How similar is Sweden to Norway?
How similar is Finland to Denmark?
(etc.)

Condition 3:
How similar is Chile to Venezuela?
How similar is Sweden to Norway?
(etc.)

With this set-up, Tversky reports finding a higher average similarity in (what corresponds to) condition 3. This is explained by the fact that the context foregrounds the geographical region as a cue in that case, but not in the two others.
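One way to picture this foregrounding is to let a shared feature count only when it actually separates the items in the current context. The sketch below is a toy of my own making: the feature labels, the all-or-nothing weighting rule, and the restriction to shared features are assumptions, and no numbers from the paper are reproduced.

```python
# Hypothetical feature bundles, for illustration only.
features = {
    "Chile": {"country", "south_american"},
    "Venezuela": {"country", "south_american"},
    "Guatemala": {"country", "central_american"},
    "Uruguay": {"country", "south_american"},
    "Sweden": {"country", "nordic"},
    "Norway": {"country", "nordic"},
}


def diagnostic_weight(feature, context):
    """1 if some but not all items in the context have the feature, else 0."""
    count = sum(feature in feats for feats in context.values())
    return 1 if 0 < count < len(context) else 0


def similarity(x, y, context):
    """Sum of the diagnostic weights of the features x and y share."""
    shared = context[x] & context[y]
    return sum(diagnostic_weight(f, context) for f in shared)


# A context where every item is South American, versus a mixed one.
same_region = {k: features[k] for k in ("Chile", "Venezuela", "Uruguay")}
mixed = {k: features[k] for k in ("Chile", "Venezuela", "Sweden", "Norway")}

sim_same_region = similarity("Chile", "Venezuela", same_region)
sim_mixed = similarity("Chile", "Venezuela", mixed)
```

In the single-region context the regional feature is shared by everything and gets no weight; in the mixed context it splits the items into two groups and so raises the similarity of the pair.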
