Tuesday, June 12, 2012

Xu and Tenenbaum: "Word Learning as Bayesian Inference" (2007)

Fei Xu and Joshua Tenenbaum report experimental findings on concept learning and simulate them in a computational model based on Bayesian inference. The paper refers to a 1999 paper by Tenenbaum for its mathematical background.

Elements of the Model

The idea behind the model is that the idealized learner picks a hypothesis (a concept extension, i.e., a set of objects) based on a finite set of examples. In their experiments, the training sets always consist of either one or three examples. There are 45 objects in the "world" in which the learner lives: some vegetables, some cars, and some dogs.
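
To make the setup concrete, here is a minimal sketch of the core inference (the object names, hypothesis names, and prior values are all invented for illustration; the likelihood follows the strong-sampling assumption from Tenenbaum's 1999 paper, under which examples are drawn uniformly from the true extension):

```python
# Toy world: each hypothesis is a candidate extension (a set of objects)
# paired with a prior. All names and numbers are made up for illustration.
hypotheses = {
    "dalmatians": ({"dalmatian1", "dalmatian2", "dalmatian3"}, 0.2),
    "dogs":       ({"dalmatian1", "dalmatian2", "dalmatian3",
                    "terrier1", "poodle1"}, 0.5),
    "animals":    ({"dalmatian1", "dalmatian2", "dalmatian3",
                    "terrier1", "poodle1", "cat1", "pig1"}, 0.3),
}

def posterior(examples):
    """P(h | X) proportional to P(X | h) * P(h), with
    P(X | h) = (1/|h|)^n for hypotheses covering all n examples."""
    n = len(examples)
    scores = {}
    for name, (extension, prior) in hypotheses.items():
        if examples <= extension:   # hypothesis must cover every example
            scores[name] = prior * (1 / len(extension)) ** n
        else:
            scores[name] = 0.0
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# One dalmatian example leaves "dogs" the front-runner; three dalmatian
# examples shift the posterior towards the narrowest consistent extension.
print(posterior({"dalmatian1"}))
print(posterior({"dalmatian1", "dalmatian2", "dalmatian3"}))
```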

As far as I understand, the prior probabilities fed into the computational model were based on human similarity judgments. This is quite problematic, since similarity can reasonably be seen as the dual of categorization (being similar corresponds to being in the same category). So if I've gotten this right, the answer is to some extent already built into the question.


A number of further tweaks are applied to the model:
  • The priors of the "basic-level" concepts (dog, car, and vegetable) can be manually increased to introduce a bias towards this level. This increases the fit immensely.
  • The priors of groups with high internal similarity (relative to the nearest neighbor) can be increased to introduce a bias towards coherent and well-separated categories. This is distinct from what Tenenbaum and Xu call the "size principle," which concerns the likelihood: a smaller extension assigns each consistent example a higher probability, so narrower hypotheses are increasingly favored as examples accumulate.
  • When applying the learned posteriors, the learner can either average the predictions of all hypotheses, weighted by their posterior probabilities, or simply pick the most probable hypothesis and discard the rest. The latter corresponds to crisp rule learning, and it gives suboptimal results in the one-example cases.
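
The last point can be sketched as follows (the posterior values and extensions here are invented toy numbers, standing in for whatever the model assigns after one example):

```python
# Contrast hypothesis averaging with committing to the MAP hypothesis.
# Posterior values and extensions are invented for illustration.
posterior = {"dalmatians": 0.35, "dogs": 0.45, "animals": 0.20}
extensions = {
    "dalmatians": {"dalmatian1", "dalmatian2"},
    "dogs":       {"dalmatian1", "dalmatian2", "terrier1"},
    "animals":    {"dalmatian1", "dalmatian2", "terrier1", "cat1"},
}

def p_in_concept_avg(obj):
    """Weighted average: total posterior mass of hypotheses containing obj."""
    return sum(p for h, p in posterior.items() if obj in extensions[h])

def p_in_concept_map(obj):
    """Winner-take-all: commit to the single most probable hypothesis."""
    best = max(posterior, key=posterior.get)
    return 1.0 if obj in extensions[best] else 0.0

# Averaging yields graded generalization (terrier1 gets 0.45 + 0.20 = 0.65);
# the MAP rule makes an all-or-nothing call (terrier1 gets 1.0).
print(p_in_concept_avg("terrier1"), p_in_concept_map("terrier1"))
print(p_in_concept_avg("cat1"), p_in_concept_map("cat1"))
```

With only one example, the posterior is spread over several hypotheses, so the all-or-nothing rule throws away most of what the learner knows.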
I still have some methodological problems with the idea of a "basic level" in our conceptual system. Here as elsewhere, I find it question-begging to assume a bias towards this level of categorization.


I wonder how the model could be changed so as to
  • not have concept learning rely on preexisting similarity judgments;
  • take into account that similarity judgments vary with context.
Imagine a model that picked the dimensions of difference that were most likely to matter given a finite set of examples. Dimensions of difference are hierarchically ordered (e.g., European > Western European > Scandinavian), so it seems likely that something like the size principle could govern this learning method.
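
A rough sketch of that idea, using the geographic example above (the region memberships and sizes are invented; the scoring reuses the size principle, so the narrowest region consistent with the examples wins):

```python
# Nested "dimensions of difference": narrower regions are subsets of wider
# ones. Memberships are invented for illustration.
regions = {
    "European":         {"Oslo", "Stockholm", "Paris", "Berlin", "Madrid",
                         "Rome", "Lisbon", "Warsaw"},
    "Western European": {"Oslo", "Stockholm", "Paris", "Berlin", "Madrid"},
    "Scandinavian":     {"Oslo", "Stockholm"},
}

def best_region(examples):
    """Size-principle scoring with a flat prior: each region consistent
    with the examples scores (1/|region|)^n, so smaller consistent
    regions are favored, and more so as examples accumulate."""
    n = len(examples)
    consistent = {name: (1 / len(cities)) ** n
                  for name, cities in regions.items()
                  if examples <= cities}
    return max(consistent, key=consistent.get)

print(best_region({"Oslo"}))            # narrowest consistent region wins
print(best_region({"Oslo", "Paris"}))   # forced up to a wider region
```

The same mechanism that pulls the learner towards small extensions could, on this picture, pull it towards the most specific dimension of difference that still covers the data.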
