The errors in my answer to Darwin

How the humble lab rat can teach you to raise your kids smarter and grow old gracefully

The importance of knowing how to look
Today, Charles Darwin is largely known for his theory of natural selection. Yet although his status as a scientific legend is assured, the nature of his fame does him an injustice. Darwin was not a theorist in the modern sense. First and foremost, he was a brilliant observer, and his extraordinary gift for observation is, more often than not, hugely underappreciated.

While the difficulty of observation is easily overlooked, it has become abundantly clear that we do not perceive the world ‘objectively’ (indeed, as Borges’ ingenious tale of the one-to-one map of the world reminds us, objectivity is a tricky idea at the best of times). Instead, our brains invent their perception of the world, inferring the nature of reality by means of a variety of processes based on guessing and learning. For better or for worse we are theorists by nature, and the unreliability of perception is one of the reasons why we do science in the first place. It is also why observation is such a vital – and very difficult – scientific skill. Our facility to observe in an ‘objective-like’ fashion – consciously resisting the temptation to interpret every observation in accord with our minds’ prior expectations – is one of the great intellectual achievements of our species.

A prime example of Darwin’s genius for observation is found in his little-known contributions to the study of child psychology. Aside from his other well-documented achievements – and despite suffering from chronic illness throughout his adult life – Darwin not only fathered ten children, but even more remarkably, he found time to be a wonderful, engaged dad in an age when fathers were typically distant.

Darwin kept detailed diaries of the observations he made as his kids grew, and as a result of these we know that he may be the first person to ever notice a simple, yet deeply puzzling phenomenon that has tended to pass just about every other parent by: the bafflingly long time it takes kids to learn the meanings of color words.

The insight came to Darwin while using a playroom tapestry to play a naming game with a couple of his children. He had started out having them tell him the colors of the objects on the tapestry, and then, for some reason, he hit upon the idea of getting them to name the colors of individual threads as he pulled them from the cloth. To his amazement, Darwin discovered that once the scaffolding of familiar objects was removed, his kids were completely stumped:

I … was astonished to learn that soon after they reached the age where they knew the names of most common objects, they seemed to be entirely incapable of giving the right names to colors … I remember quite clearly saying they were color-blind

(Darwin, Biographische Skizze eines kleinen Kindes, 1877)

To put this observation in perspective, a few years ago I ran a study in which two-and-a-half-year-olds were asked to point out objects based on their colors. The kids were shown an array like this one, and one of my assistants asked them, “Can you show me the one that’s blue?” (or “the blue one?”):


Prior to this, we asked their parents — highly skilled workers from Silicon Valley — what they thought their kids knew about the meaning of red, yellow, and blue. Their responses were consistent: these parents were supremely confident that their toddlers were firmly in command of the meanings of seemingly trivial words like ‘red,’ ‘blue,’ and ‘yellow.’

So you can probably imagine their shock when, almost as often as not, little Johnny and Julie responded to our questions by simply pointing to a colored object at random.

The stunned surprise of these parents stands as a testament to the remarkable subtlety of Darwin’s observation. His children clearly knew the words ‘red,’ ‘blue’ and ‘yellow,’ and, like the kids in our experiment, they clearly knew something about their meanings. For example, if we asked our children what color a banana is, they would reply, confidently, “yellow.”

The problem, when it comes to figuring out whether a child knows what ‘yellow’ means, is that even blind children know that bananas are yellow, or that grey geese are grey. They can figure this out from language, and they can do so without ever grasping what ‘yellow’ and ‘grey’ also tend to mean to the rest of us.

Darwin had noticed something that passes most parents by: It takes a surprisingly long time for sighted children to actually grasp the meanings of yellow and grey. He had no problem working out that his kids weren’t really color-blind (and indeed, that their color discrimination abilities were fine), but he never did discover why they were so slow to learn the meanings of color words.

What the rat can tell us about how children learn the meaning of color words
So what is the problem? Why do kids find it so hard to master the apparently simple task of associating a color word such as ‘red’ with a particular part of the color spectrum: red? One clue can be found in my last post, where I described how dogs learn to associate bells and dog-food, and rats learn to associate tones and shocks. As I explained there, this kind of learning is not the simple process of tracking “associations” people imagine it to be. In fact, it turns out that the way animals learn “associations” is an evolutionary twist on Sherlock Holmes’ dictum: “When you have eliminated the impossible, whatever remains, however improbable, must be the truth.” A rat does not simply learn to associate tones with shocks, but rather the association between the tone and the shock is what is left over after the rat’s brain has weighed up every other potential source of information around and found it wanting.

There are, of course, many differences between children and rats. However, there is every reason to suppose that evolution has equipped the human infant with discrimination learning capabilities that are at the very least the equal of those of a rat (and, of course, every child comes equipped with a brain that has an information processing capacity that dwarfs that of a lab rat several times over). And it turns out that knowing how rats learn can help us explain why kids find color words so difficult to learn. In fact, understanding how rats learn can even help us help our kids overcome these difficulties.

In order to know that “red” means red and “yellow” means yellow, a child has to discriminate exactly which specific range of hues is informative about – or associated with – each color word. From a discrimination learning perspective, several significant obstacles to the child’s success soon become apparent:

First, a child will never encounter color independently. As the philosopher Wittgenstein noted, while little Julie will encounter green apples, or green grass, she almost never comes across green alone, decoupled from everything else.

The way that colors are distributed among things in the world poses further problems. The chances of a child hearing a color word while only one color is present are going to be pretty much zero:


If a child were to hear color words in isolation, the sheer variety of colors she would be exposed to at any given point in time would make figuring out what in the world was informative about “green” or “blue” a tall order.

Thankfully, however, if words are heard sequentially (as, in fact, they are), then once a child has learned the names of a few objects, she will have gained a foothold in the tricky business of color. (As you may imagine, because objects are far less ubiquitous than colors, their names will be easier to learn: many children will tend to encounter either a dog or a cat, but not both, at any given time, and if the balance of positive and negative evidence favors a match between “doggy” and dogs and “kitty” and cats, our child will be in business.)


Let’s suppose our child has learned doggy, and now she hears “Look! The doggy is brown,” in the presence of a brown dog. The next time she sees a dog, and hears, “the doggy is…” then based on what she learned in the scenario above, she will now expect to hear “brown”. (Important note: the child’s expectation is implicit. She won’t consciously be thinking, “Oh, hello, I guess brown comes next”).


If this dog is also brown, and our child hears “brown”, then while this will strengthen the connection between dog and “brown”, it won’t help her learn what “brown” means. This is because this experience will also strengthen the records that connect every other aspect of the doggy and “brown” in her memory.

To discriminate what, exactly, “brown” means, our child is going to have to dissociate the evidence provided by the hue brown from all of the other doggy features. And, as we saw in the case of the rat, in order to learn about background rate information, her brain is going to have to make some errors.

Suppose the next dog she meets is white. Implicitly, when she hears, “the doggy is…” she will still be expecting to hear “brown.”


But this expectation is erroneous. Suppose (as is not completely unlikely) she hears “white” instead.


In this case, implicit error will begin to play an important role in her brain’s word learning. Remember, our child has learned from experience that all the features of the brown dog should lead her to expect brown. Since she now hears “white,” this will result in an implicit devaluation of the records that connect all the other doggy features—the wet nose, waggy tail, fur, etc.— to “brown,” because these features, that are also part of the white dog, led her to erroneously suppose she would hear “brown.”

This process will cause value to shift from the features that produce lots of error to those that produce little error. Indeed, even at this stage, brown will have been implicitly strengthened as a cue to “brown” simply because the child’s records for all of the other dog features will have been devalued. This means that, despite the fact that she neither heard the word “brown” nor saw any brown, the child’s understanding of the relationship between brown and “brown” will have improved. Despite our intuitions about “associations,” a lot of the important learning takes place when the things that are actually learned about aren’t present: Thanks to her encounter with the white dog – and a helpful bit of language – our child will now be on the way to learning that ‘brown’ means brown.
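The brown-dog and white-dog episodes above can be walked through numerically. What follows is only a minimal sketch: it assumes a Rescorla-Wagner-style error-driven update as a stand-in for the learning mechanism, and the function name `rw_update`, the feature names, and the learning rate of 0.1 are all invented for illustration. It tracks how strongly each feature leads the learner to expect the word “brown”.

```python
def rw_update(weights, cues, outcome_present, rate=0.1):
    """One error-driven trial for a single outcome (hearing "brown").

    Every cue that is present shares the same prediction error, so cues
    compete: a cue only keeps its value if it does not lead to errors.
    """
    prediction = sum(weights.get(cue, 0.0) for cue in cues)
    error = (1.0 if outcome_present else 0.0) - prediction
    for cue in cues:
        weights[cue] = weights.get(cue, 0.0) + rate * error
    return error

# Hypothetical doggy features shared by every dog the child meets.
DOG = ["wet_nose", "waggy_tail", "fur"]

to_brown = {}  # how strongly each feature predicts the word "brown"

# Two brown dogs, each followed by "brown": every feature gains value.
rw_update(to_brown, ["brown_hue"] + DOG, outcome_present=True)
rw_update(to_brown, ["brown_hue"] + DOG, outcome_present=True)
before = dict(to_brown)

# A white dog, followed by "white" (so "brown" is NOT heard): the shared
# doggy features wrongly predicted "brown" and are devalued.
error = rw_update(to_brown, ["white_hue"] + DOG, outcome_present=False)

for cue in ["brown_hue"] + DOG + ["white_hue"]:
    print(f"{cue:>10}: {before.get(cue, 0.0):.3f} -> {to_brown[cue]:.3f}")
```

In this toy run the weight from the brown hue itself is left untouched by the white-dog trial, while the nose, tail and fur all lose value, so the hue ends up as the strongest single cue to “brown” even though neither the hue nor the word was present on that trial.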

This also helps explain why it is that, as Darwin first observed all those years ago, a child will learn the meaning of brown later than she learns the meanings of many other equally common words. In order to ever learn the meaning of her first color words, a child will first have to learn the names of some other things. And then she will have to make some errors.

These errors are my solution to Darwin’s puzzle.

So much for the story: Is it right?
Learning from error in the way I have described provides a method for a child’s brain to discriminate what is informative about color words (and, what is more, we can be fairly certain that the child’s brain contains circuits that learn in pretty much exactly the way I’ve described). And because the specific properties of color, and the distribution of colors in the world, mean that children first need to learn about objects in order to learn from these errors, it also helps explain why children find learning color words so tricky.

Not only do the mechanisms of learning provide a solution to Darwin’s puzzle, they also offer a neat way of testing whether the solution is right. So far, when I’ve used color words in my examples, I’ve done so in a particular way. The color words have followed the names of colored objects (“the doggy is white;” “the doggy is brown”). Most of the time, however, English speakers don’t do this: They say, “the white doggy,” “the blue frog,” or “the green whatever.”

It turns out that the method for color word learning I just described works well when color features predict a discrete label (Feature to Label-learning; FL). Because learning is competitive, where multiple features act as cues to a single event, reliable cues (like brown to “brown”) can effectively act to eliminate the influence of other, less reliable cues.

However, if this sequence is reversed in time (i.e., “Look at the brown doggy!”), the learner is now presented with a discrete label that predicts various complex sets of features (Label to Feature-learning; LF).


The change in the situation here is subtle, but its impact is large. Things now look far more like they did when color words were heard in isolation. At the point the child hears “brown,” she has no information about what it is that is supposed to be brown.

Although the child might eventually learn that the label “brown” is a better cue to “doggy” (and the doggy’s features) than many other things in the world, once she has figured out “brown,” no further competition between cues can occur in this situation. To all intents and purposes, LF-learning will offer the child only one useful cue: the label brown.

A solitary cue – the label brown – can hardly compete with itself, and because of this, LF-learning does not produce any useful competition. The learning system will simply learn to balance the positive and negative evidence for the relationship between the label “brown” and every feature that tends to follow it… all of the benefits that cue competition brings will be lost.
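The contrast between the two orderings can be sketched with the same kind of toy model. This is an illustration under stated assumptions, not the simulation from the published study: it again assumes a Rescorla-Wagner-style update, and `train`, the feature names, the two-dog training set, and the learning rate are all hypothetical.

```python
def train(trials, rate=0.1, epochs=500):
    """Error-driven learning of weights[cue][outcome] over repeated trials.

    Each trial is (cues, outcomes). All cues present on a trial share the
    prediction error for each outcome, which is what lets cues compete.
    """
    all_cues = sorted({c for cues, _ in trials for c in cues})
    all_outcomes = sorted({o for _, outs in trials for o in outs})
    w = {c: {o: 0.0 for o in all_outcomes} for c in all_cues}
    for _ in range(epochs):
        for cues, outs in trials:
            for o in all_outcomes:
                prediction = sum(w[c][o] for c in cues)
                error = (1.0 if o in outs else 0.0) - prediction
                for c in cues:
                    w[c][o] += rate * error
    return w

DOG = ["fur", "wet_nose", "waggy_tail"]
brown_dog = ["brown_hue"] + DOG
white_dog = ["white_hue"] + DOG

# FL ("the doggy is brown"): many features compete to predict one label.
fl = train([(brown_dog, ["brown"]), (white_dog, ["white"])])

# LF ("the brown doggy"): one label is the only cue to many features.
lf = train([(["brown"], brown_dog), (["white"], white_dog)])

print("FL  brown_hue -> 'brown':", round(fl["brown_hue"]["brown"], 3))
print("FL  fur       -> 'brown':", round(fl["fur"]["brown"], 3))
print("LF  'brown' -> brown_hue:", round(lf["brown"]["brown_hue"], 3))
print("LF  'brown' -> fur      :", round(lf["brown"]["fur"], 3))
```

In this toy setup, FL training leaves the brown hue as a much stronger cue to “brown” than fur (roughly 0.57 against 0.14), because the shared doggy features keep producing errors on white-dog trials. Under LF training the label predicts fur exactly as strongly as it predicts the brown hue: with only one cue there is nothing to compete, so the weights simply track how often each feature follows the label.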

Which brings me back to the experiment I described at the outset. My colleagues and I brought two-and-a-half-year-old American-English-speaking children into our lab, and repeated Darwin’s test, asking them to point out novel objects that we labeled only by their colors.

The children were presented with pictures like the one below, and we asked them questions like “can you show me the red one?” and “can you show me the one that’s red?”


Confirming the veracity of Darwin’s observation, and much to their parents’ consternation, they did not cover themselves in glory.

The children were then presented with a “magic bucket,” which contained six sets of objects that we expected the kids would know the names of, such as balls, bears, cups etc. There were three instances of each object, a red one, a yellow one, and a blue one, and one by one, my assistants presented the objects to the kids while labeling them.


Half of the children were presented with the objects and heard them labeled in features-to-labels order: they were shown the ball, for example, and heard, “this ball is blue.” The other half were given labels-to-features training: they heard, “this is a blue ball,” and were then presented with the object.

This was repeated for each of the six sets of objects, after which we tested the children again. The children in the labels to features condition showed no sign of improvement (in fact, they did marginally worse). The children in the features to labels condition also showed no sign of improvement on the trials where we asked them to “show me the red one.” But, on the trials where we asked them to “show me the one that’s red,” this tiny amount of training was enough to bump their success rate up to 70% correct.

This does not, of course, mean that my solution to Darwin’s puzzle is “correct.” But it was very encouraging, and it led to another, interesting discovery. After we published our findings, I was often asked a question: “Does that mean that Spanish kids learn their colors faster?” (In Spanish, color words tend to follow nouns, e.g., “Casa Blanca” rather than “White House.”)

My answer was that I didn’t know, and without being able to do a detailed comparison of the actual frequencies at which English and Spanish kids hear color words in various positions, I wouldn’t want to hazard a guess. It felt pretty lame, so I decided to do something about it. Here are the frequencies of a set of color words in two very large samples of Spanish and English text and speech:


These numbers suggest that English speakers use these words more than twice as often as Spanish speakers (which would mean that my cautious answer might not have been so lame after all). Does this reflect a deep cultural difference? Is it perhaps a reflection of the colorful nature of English speakers, and the grey dourness of their Latin counterparts? Does this mean that Spanish children might have other problems learning color words?

Somehow I doubted it. In another study, where we used label-order to improve children’s “number sense,” I had looked at the frequency with which Spanish and English speakers use numbers to talk about sets of objects (two balls, five apples, etc.). On this measure there was not the slightest hint of any unforeseen cultural divide. The graph below plots the relative frequency with which the words one to seven precede nouns in the same samples of English and Spanish. As you can see, there is no support at all for the idea that Spanish and English speakers have more or less interest than each other when it comes to numbers:


So why the difference when it comes to color words? The answer – which I won’t dwell on here – more likely reflects the fact that Spanish is a gendered language, in which determiners like el and la provide speakers and listeners with information about upcoming nouns. Since English lacks gender, English speakers tend to make more use of adjectives to achieve this goal.

If one is at a friend’s house, and said friend wanders over to the refrigerator and says, “how about a nice cold…” chances are that this gives you some information about the next word, even though in fact, of course, everything in the fridge is cold. Similarly, when mom stops and points out something to junior, saying, “oh look at the cute little…” chances are she’s pointing to a puppy, or a kitten, or a baby. Since every baby, kitten and puppy ever born is, almost by definition, cute and little, mom isn’t really using these adjectives to describe the baby/kitten/puppy. She is using them to reduce Junior’s uncertainty about which word will come next… Which is, of course, what el and la serve to do in Spanish.

Put simply, color words are used differently in English and Spanish. And what’s more, the solution to Darwin’s puzzle suggests that all the extra “reds” and “blues” that English speaking children hear probably won’t be that helpful to them when it comes to learning the meaning of “red” and “blue.”

What we want to know, of course, is how often children will hear color words in positions that are informative about color: How often children can be expected to hear “the frog is / was green” etc., in English, or “la rana verde” in Spanish (in case you are wondering, I left brown out of this analysis because it is a very common U.S. surname, and I also avoided black and white because their uses and meanings are far less black and white than most colors):


It turns out that when it comes to the use of color words in the important way that helps learning, the two languages look much more alike. Which means that it also looks like we should expect that neither a Spanish- nor an English-speaking child is going to be at an advantage when it comes to mastering color words. Despite the huge differences in the frequencies of color words in their two languages, when it comes to encountering talk about color that is informative about the meaning of color words, Spanish-speaking and English-speaking children will get quite strikingly similar support from their languages.
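As a rough illustration of the kind of count involved (this is not the analysis that was run on the corpora, and the sample sentences are invented), one could tally how often English color words appear after a form of “to be” as opposed to directly before another word. The patterns, the color list, and the name `count_positions` are assumptions made for this sketch, and the patterns are deliberately crude.

```python
import re

# A handful of color words (brown, black and white are left out, as in the text).
COLORS = ("red", "blue", "green", "yellow")
COLOR_ALT = "|".join(COLORS)

# Color word after a form of "to be": "the frog is green".
POSTNOMINAL = re.compile(rf"\b(?:is|was|are|were)\s+({COLOR_ALT})\b", re.IGNORECASE)

# Color word directly followed by another word: "the green frog".
# Crude: it would also count "was blue once" as prenominal.
PRENOMINAL = re.compile(rf"\b({COLOR_ALT})\s+[a-z]+", re.IGNORECASE)

def count_positions(text):
    """Count color words in each of the two positions in a piece of text."""
    return {
        "postnominal": len(POSTNOMINAL.findall(text)),
        "prenominal": len(PRENOMINAL.findall(text)),
    }

# Invented toy sample, not corpus data.
sample = (
    "The frog is green. Look at the green frog! "
    "The red ball was blue. A yellow cup."
)
counts = count_positions(sample)
print(counts)
```

A real comparison would need a parsed corpus rather than regular expressions, and a matching pattern for Spanish noun-then-color sequences such as “la rana verde.”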

That is, of course, as long as we interpret “informative” and “support” in the way that everything we know about the way our minds and brains learn, and the solution to Darwin’s puzzle I have offered, suggests we ought to.

What has this got to do with aging gracefully?
The learning mechanisms I have described in my last two posts are very different to what people normally think about when they hear the word ‘learning’. What is striking about these ideas, however, is how similar they are to a description of the way infants first experience the world that was offered by Darwin’s contemporary, William James:

“the undeniable fact being that any number of impressions, from any number of sensory sources, falling simultaneously on a mind which has not yet experienced them separately, will fuse into a single undivided object for that mind. The law is that all things fuse that can fuse, and nothing separates except what must… Although they separate easier if they come in through distinct nerves, yet distinct nerves are not an unconditional ground of their discrimination, as we shall presently see. The baby, assailed by eyes, ears, nose, skin, and entrails at once, feels it all as one great blooming, buzzing confusion; and to the very end of life, our location of all things in one space is due to the fact that the original extents or bignesses of all the sensations which came to our notice at once, coalesced together into one and the same space. There is no other reason than this why ‘the hand I touch and see coincides spatially with the hand I immediately feel.’”

James, The Principles of Psychology (1890, p. 488).

Somewhere along the way, this perspective got lost, and it can be disheartening to compare James’ ideas to most contemporary psychologists’ notions of learning. In an earlier post, I described a study that found that older adults completed mental addition tasks much faster than younger adults, a finding that completely undermines the simple claim that all of our response times get slower as we get older.

My observation drew the following response from ageing researcher Patrick Rabbitt:

This twice misses any target. First, if general information clogging means that older people have difficulty with instructions on all tasks surely they should also become slower on mental arithmetic? Second, why am I supposed to be banjaxed by any finding that performance on any task improves with age ? (no nasty “decline” then!). I am not, because I can still do (some) mental arithmetic faster than my children or grand-children because, when I was little, ferocious nuns drilled me on multiplication tables. My undrilled offspring have the sluggish insouciance of people who have always had calculators to hand.

In other words, Rabbitt’s response to a finding that older adults are 30% faster on a task than younger adults is not puzzlement, or even any acknowledgement that a puzzle exists. The finding is simply explained away: Older adults are faster at mental addition (of course) because they rote learned multiplication tables. Because 60 year olds rote learned that 7 times 7 is 49, then — somehow — this magically makes them faster at adding 19 and 16.


Unlike most humans over many millennia [Ramscar does] not seem to have twigged that learning to do something new makes us better, rather than worse at doing other similar things. Perhaps [Ramscar] can be excused because psychologists only recently began to formally explore this phenomenon (about a hundred years ago). We call it “generalisation”. A brief Google search on this word will find scores of ingenious experiments illustrating that learning a new thing can make you faster, not slower, at doing other things.

Yet, to return to Darwin, the problem young children have with color words is “generalization.” Three-year-olds appear to be perfectly capable of generalizing the word “blue” to green and “green” to blue. What they lack is the ability to constrain these generalizations, so that “blue” is not generalized to green and “green” is not generalized to blue.

Generalization is the problem, not the solution. The answer is discrimination learning.

Moreover, as children learn to discriminate that “blue” goes with blue and “green” goes with green, they learn more. Indeed, the chances are that, in processing terms, things will get complicated: As children begin to discriminate the appropriate relationships between blue and “blue” and green and “green”, they will initially experience an increase in uncertainty as they reply to color questions, simply because they will become more aware of the conflict between “blue” as an appropriate response to questions about blue, and “green” as an inappropriate response in the same context. Indeed, over time, we might expect that this uncertainty will follow a ‘U-shaped’ function, first sharpening and then tapering off, as their learning of the appropriate discriminations improves.

What is also worth noting is that when spelled out in detail, the mechanism that appears to enable children to learn the meaning of color words appears to be the exact opposite of the view of learning that Rabbitt describes. And while Rabbitt’s description of people’s beliefs about learning may well be true, the fact is that scientists who actually study learning firmly rejected this view over 40 years ago. Put simply, it doesn’t work.

What is most disheartening about all this is that, although one of the greatest learning researchers of the past century, Robert Rescorla, has dedicated a great deal of energy to documenting and trying to correct the hopelessly inaccurate teaching of learning theory in universities, most students are still trained in the kind of naïve folk theories that Rabbitt appeals to above. I suspect that this is in part because, in the face of the seemingly overwhelming challenge of “understanding the mind,” many researchers in the cognitive sciences seem to have come to confuse healthy scientific skepticism with intellectual nihilism, and mistakenly believe that unless one has an explanation that accounts for everything, then all theories are equally valid, and equally open to question.

Unlike a lot of work in psychology, the answer to Darwin’s puzzle that I described above is an example of cumulative science. The theory I described wasn’t just plucked out of the air, but rather, it was worked out by carefully considering how the models that best explain learning in other domains might be applied to this particular problem. These models may be simplifications – as indeed, all scientific models are – but by any reasonable standard they represent by far the best way we have for formalizing and understanding learning (indeed, they have even helped us make huge strides in understanding the neural structures that implement the learning mechanisms in our brains that these models seek to characterize). It turns out that asking what these models can and can’t do given the particular constraints associated with learning about color words leads us to a fairly elegant account of the way it is that children’s minds actually come to learn the meanings of these words.

While this account may not be “right”, unlike most theories in the cognitive sciences, it makes falsifiable predictions that lead to surprising discoveries: prior to our doing this work, no-one had actually noticed how important the sequencing of words is to children’s learning in this situation. What is more, the theory even led to second-order predictions that help make sense of the differences we found in the distribution of color words in different languages.

The contrast with theories of cognitive decline — which are simply speculative descriptions of observed data — could hardly be greater. These theories are cumulative only in the sense that they all use the same tests, and they all misinterpret the results of these tests in the same way. And when it comes to the previous science that matters, researchers ignore everything that many years of cumulative science has uncovered about learning, and they fail to consider the way that learning processes must inevitably affect performance on cognitive tests across the lifespan.

Robert Rescorla’s survey of the dismal teaching of learning theory in universities casts an interesting light on why this might be: If researchers aren’t being trained properly, and if they don’t understand learning, or the way that learning to discriminate new information increases the complexity of cognitive processing (or, indeed, that information processing is more than a metaphor), then is it really any wonder that when these same researchers try to explain, say, changes in reaction times over the lifespan, these explanations end up being little more than codifications of age-old superstitions?

Returning for a moment to how children learn color words, we can ask: What impact will the learning process have on the speed at which color information is processed in a child’s mind? What will the impact on speed of processing be if that process involves going from a set of undifferentiated colors and color words, to discriminating which color word goes with which color, to gaining expertise with an increased range of informative responses?

To answer these questions we need a scientific understanding of learning that goes beyond vague appeals to mysterious “generalisation” processes that somehow enable the rote-learning of multiplication tables to boost people’s ability to do addition. Rather, we need scientific models of learning and information processing that allow us to formalise and better understand our answers. Models that actually provide explanations. In describing my solution to Darwin, I have tried to illustrate how when it comes to learning, we now have a great deal of that scientific understanding, and at least a rudimentary form of those models, and how we are able to offer explanations that actually look like explanations.

In my next post, I will use the exact same model that I used to explain color learning to provide a detailed explanation of the way learning impacts performance on a canonical cognitive measure across the lifespan. And once I have controlled for learning, I’ll consider whether the results that we see when this test is taken by adults of all ages actually offer any evidence of “decline” in healthy aging at all.

Ramscar, M., Yarlett, D., Dye, M., Denny, K., & Thorpe, K. (2010). The Effects of Feature-Label-Order and Their Implications for Symbolic Learning. Cognitive Science, 34(6), 909-957. DOI: 10.1111/j.1551-6709.2009.01092.x