(* with apologies to Oscar Wilde)
Replication is never far from the news in psychology these days. Pick up a newspaper or journal, or browse the blogosphere, and chances are you’ll encounter yet another piece on the importance of repeating experiments. Like those people who dress up in period costume and reenact old battles, researchers are battening down and busying themselves repeating each and every tedious behavioral experiment from the past 100 years or so. Just to make sure. Meanwhile, the bright young things are immersing themselves in metascience, writing treatises on statistics and worthy lists of good research practices that will probably, eventually help us figure out how minds and brains work… one day, when someone actually gets around to looking.
Were Onan alive today, he’d be a replicator. Or second author on a methods paper, anyway. Which is not to say that I think replication is a bad idea in itself (as I made clear in an earlier post, I’m all for it). It’s rather that I feel a sense of mounting horror as I realize that most of the things I think of as ‘standard scientific practice’ seem to be news to many of my colleagues. And I feel queasy as I contemplate what seems to be the replication movement’s big idea, which is that somehow a few new statistical techniques and some registered replications will suffice to fix the many parts of psychology and neuroscience that are misguided, doltish, or simply wrong.
In fact, to the extent that I occasionally find myself feeling some support for the replication movement, it is because of this last point. In the same way that every minute that the managing committee of a soviet car factory spent debating their workers’ political education saved many poor, oppressed people from having to actually travel in lethally incompetent cars, so much of the output of the replication movement is, thankfully, irrelevant. Usually, at least, it avoids being actively intellectually harmful.
The problem, as I see it, is the bottom line: There is an expectation that those of us who draw a salary for pursuing research in the brain and cognitive sciences will actually try to contribute to advancing research in the brain and cognitive sciences; that the people who are paying us to pipe would prefer some tunes to more methods papers. They want us to pump out as many cars as we can, and they would like them not to be deathtraps, if we can possibly avoid it.
We are promising this:
yet delivering this:
Which raises a question: how can I convince you that I’m right about this problem without sounding like a soviet car committee pondering a new five year plan?
I’m going to endeavor to do so by sticking to the practical, and using some simple quantitative analyses to home in on the truth about that bête noire of the replication movement, social priming. You know, those splashy findings that so often feature in the headlines and/or Malcolm Gladwell potboilers. The kind that suggest that obliquely showing people the US flag will make them yearn for popcorn two weeks later. (Or something like that, anyway.)
What I’m going to do is to show you why we ought to expect that many of the priming effects that people study in psychology will fail to replicate, regardless of the accuracy (or otherwise) of the original finding. To do so, I’m going to analyze a famous example of this type of research, and then I’m going to shine that same analytic light on what is supposed to be an inspiring example of the work of the replication movement. A replication that did more than any other to light the touch-paper to all the controversy in the first place.
At the risk of giving the game away, the science behind this famous “replication” looks uncomfortably like this:
The (not so) subtle processes of priming
People do seem to be fascinated by priming effects. And why not? If randomly showing me a flag or somesuch is going to change the way that I eat, or walk, or vote, I think I’d like to know about it.
The problem is that the effects of random flag (or whatever) showing have begun to appear elusive. It seems that these studies only do their thing when a few particular researchers are behind the wheel.
Why is that?
Here’s one suggestion, from Daniel Kahneman:
“priming effects are subtle and […] their design requires high-level skills. I am skeptical about replications by investigators new to priming research, who may not be attuned to the subtlety of the conditions under which priming effects are observed, or to the ease with which these effects can be undermined”
Which raises some questions: How subtle are priming effects, really? And how skilled do you have to be to make them work?
To try to answer them, let’s delve into perhaps the most notorious priming study of them all: the famous reading-elderly-words-makes-you-walk-slowly study. In this experiment, a bunch of undergrads were invited into social psychologist John Bargh’s lab, and asked to rearrange some scrambled sentences to try to make them sensible. After this, the researchers surreptitiously measured the speed at which the same undergrads wandered off down a corridor after leaving the lab.
Bargh and his colleagues found that when the scrambled sentences contained a set of words chosen to prime “elderly stereotypes,” the undergrads walked down that corridor significantly more slowly than when the sentences contained other, unrelated words.
Exactly how social priming is supposed to effect this kind of behavioral change is, for the moment, not entirely clear. However, as the name “priming” implies, the general idea is that it works by biasing our mind’s expectations: If we see the word doctor, we will be primed to expect and subsequently recognize the word nurse faster than if we first see commissar. This is not a mysterious process; it simply relies on the fact that experience will have taught us that when we’re talking about doctors, nurses are more likely to later crop up in conversation than when commissars are the topic of discussion; and the fact that our minds make use of this kind of experience.
In the reading-elderly-words-makes-you-walk-slowly study, the assumption is that the words subjects read primed them to think about elderly people (or some abstraction thereof) and that this in turn affected their walking speed (somehow). The precise mechanism by which this is supposed to occur has not been specified in any kind of mechanistic or mathematical detail, which is one of the reasons why these findings are controversial. However, it is abundantly clear that learning can alter our expectations and subsequent behavior in ways that are beyond our immediate ken (no one consciously chooses to recognize nurse faster given doctor rather than commissar). And so given that the focus here is on whether replication can help us establish the scientific facts about this particular matter of priming, I’m not going to worry too much about that aspect of the controversy.
Rather, I’m going to focus our attention on the fact that the primes in this study were words. Technology has, over the past few decades, made many of the statistical properties of words relatively easy to quantify and—as I showed in my last post—these properties can be used to predict priming effects with quite astonishing accuracy. Which means that we’re in luck when it comes to analyzing precisely how subjects’ expectations might have been shaped by the words in the original study, and uncovering whether these words can be expected to have the same effect in subsequent attempts to replicate it. You see, regardless of how we might feel about the original finding, if it turns out that we can’t expect these words to have the same effect in later replication attempts, then failing to replicate the effect now will tell us nothing about whether that finding was plausible or sensible (or not) then.
To jump start this analysis of why old-words-might-or-might-not-make-you-walk-slow, let’s take a look at this particular set of words, and see what their priming properties might be.
First, here are the words:
And here’s a plot of how often each word appears—i.e., its average frequency, sorted from most to least frequent—in the 2000–2010 sample of the Corpus Of Historical American English (COHA), a 450 million-word sample of American English text and speech.
As is clear from the graph, the frequency with which the individual prime words appear in English has a very skewed distribution… Which is not surprising. Ever since the seminal work of one GK Zipf in the 1930s, linguists have been aware that language distributions tend to look like this pretty much any way you slice them. In fact, in honor of Zipf, this characteristic shape has come to be known as a Zipfian distribution.
Given that the prime words in Bargh’s reading-elderly-words-makes-you-walk-slowly study show this more general pattern, it follows that whereas one of Bargh’s prime words gets used (and is hence experienced) a heck of a lot in English, and another is used a moderate amount, most of these words are actually used and experienced far, far less commonly in comparison. In fact, the average English speaker’s experience of the most frequent word – the one on the extreme left of the plot – is equal to that of all of the other prime words put together.
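To get a feel for how lopsided this kind of distribution can be, here is a minimal Python sketch of an idealized Zipf law, in which the frequency of the word at rank r is proportional to 1/r^s. The function name, the exponent values and the 30-word list size are assumptions chosen for illustration; none of them are fitted to the COHA counts behind the plot.

```python
def zipf_shares(n_words, exponent=1.0):
    """Share of total usage at each rank under an idealized Zipf law.

    Assumes frequency(rank) is proportional to 1 / rank**exponent.
    This is a textbook idealization, not a fit to any real corpus.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, n_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative only: a 30-word list at three made-up exponents.
for exponent in (1.0, 1.5, 2.0):
    shares = zipf_shares(30, exponent)
    top, rest = shares[0], sum(shares[1:])
    print(f"exponent {exponent}: top word {top:.2f}, all others {rest:.2f}")
```

The steeper the exponent, the closer the single top-ranked word comes to matching (or exceeding) everything else on the list put together, which is the pattern described above.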
It goes without saying, of course, that even if we accept that prime words do alter people’s expectations (however subtly), they can still only do so if people have actually gained sufficient experience of them to actually form some expectations in the first place. And what’s more, since priming experiments measure average behavior across groups of individuals, priming effects will only be measurable for words that are experienced consistently across individuals, such that groups of people can be expected to form the same (or similar) expectations about them.
So before I tell you what that one super-common word is, I need to describe another property of linguistic distributions in relation to this: Words and sequences of words exhibit a statistical property called “burstiness.” (I know. Stop sniggering.) To understand what this means, consider that if we were to choose 100 books at random, then the rate at which common words like and and the occurred would be fairly consistent across each book. However, if we were to look for rarer words, like corpus or n-gram, it’s likely that we wouldn’t find either of them in any of our 100 books. And when we did find books (or blogs) that contained n-gram or corpus (like this one), then we would encounter these words far more often than would be expected from their average rate of occurrence across all books (and blogs). This is what burstiness is. It reflects the way language is used, and it means that thinking about the “frequencies” of words is far trickier than researchers often tend to think it is.
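For the curious, burstiness is easy to see in a toy example. The sketch below is only an illustration: the three "documents" are made up, the function name is mine, and the three summary numbers are one simple way of describing burstiness rather than a standard measure.

```python
from collections import Counter


def burstiness_profile(docs, word):
    """Summarize how evenly a word is spread across documents.

    Returns the overall rate (occurrences per token), the fraction of
    documents containing the word, and the mean count in the documents
    that do contain it. A bursty word has a low document fraction but a
    high count wherever it turns up.
    """
    counts = [Counter(doc.lower().split())[word] for doc in docs]
    total_tokens = sum(len(doc.split()) for doc in docs)
    present = [c for c in counts if c > 0]
    return {
        "overall_rate": sum(counts) / total_tokens,
        "doc_fraction": len(present) / len(docs),
        "mean_count_where_present": sum(present) / len(present) if present else 0.0,
    }


# Made-up mini "library" of three documents.
docs = [
    "the cat sat on the mat",
    "the dog chased the ball",
    "the corpus is large and the corpus is tagged so the corpus helps",
]

print(burstiness_profile(docs, "the"))     # present in every document
print(burstiness_profile(docs, "corpus"))  # absent from most, dense in one
```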
It also brings us back to the plot above. You see, it turns out that the word whose average frequency is equal to that of all of the other Bargh et al prime words combined is the word… old.
To return to Bargh’s experiment, if we assume that the elderly-words-prime-slow-walking effect relies on subjects’ prior experience of the words that they were primed with—and there are lots of good reasons to assume this—then it follows that the one word that everyone in this experiment would have experienced far more often than any other was… old. And it also follows from the burstiness of language that the one word in the prime set that everyone would have had the most consistent experience of was also… old.
On the other hand, when it comes to the other prime words, such as Florida, or Knits or Bingo, the facts of language tell us that on average, each person will have far less experience of using, reading and hearing these words, and that individual experiences will be far more varied. Which means that, given that the effect reported by Bargh was an average effect—the primed undergrads walked slower on average—it seems that the word that was doing most of the priming work in this study was… old.
Which means we can now draw a first conclusion from this analysis. To describe the elderly-words-make-you-walk-slow experiment like this:
“participants were primed (in the course of an ostensible language test) either with words related to the stereotype of the elderly (e.g., Florida, sentimental, wrinkle) or with words unrelated to the stereotype. As predicted, participants primed with the elderly-related material subsequently behaved in line with the stereotype—specifically, they walked more slowly down the hallway after leaving the experiment.”
is to paint an inaccurate picture of what really went on.
The not-so-subtle truth of the matter is more like this:
So, why might priming undergrads with the word old make them walk slower? How might this serve to evoke ‘the elderly stereotype’ in them?
Well, one reason might be that historically, certain old phrases (or word n-grams)—like old man, old woman, and old folks—have been used far more than any others in English. In particular, old is a word that we love to use to describe people. In every decade since 1810, the noun most likely to follow old in American English is man, with woman next in line (lady is also, consistently, a likely occurrence given old). And, of course, the nouns that follow old are Zipf distributed. Which means that, historically, man and woman were really likely to follow old. (In total, around 60% of the nouns that follow old in American English are people-related.)
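If you want to try this kind of count yourself, the basic idea can be sketched in a few lines of Python. This is a simplification under stated assumptions: it counts every word that follows old, whereas the figures above concern nouns (which would need a part-of-speech tagged corpus), and both the sample sentence and the people-word list are hypothetical.

```python
from collections import Counter

# Hypothetical list of people-related words, for illustration only.
PEOPLE_WORDS = {"man", "woman", "lady", "men", "women", "folks", "people"}


def followers(tokens, target="old"):
    """Count the words that immediately follow `target` in a token list."""
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == target)


def people_share(follower_counts, people_words=PEOPLE_WORDS):
    """Proportion of followers that are in the people-related word list."""
    total = sum(follower_counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for word, n in follower_counts.items() if word in people_words)
    return hits / total


# Made-up sample text, not corpus data.
tokens = "the old man met an old woman near the old house and the old man waved".split()
counts = followers(tokens)
print(counts.most_common())
print(people_share(counts))
```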
Here’s a thing though: The way people use languages often changes as the societies they live in change. And, when it comes to older people, it is clear that our society has changed and is changing. A lot.
This next plot shows the proportion of elderly adults in the US population over time, and as you can see, it’s growing.
The next plot shows the total population (in millions) of adults aged 65–84 and 85+ in the US population from 1970–2010. As you can see, not only has the proportion of old folks in the population increased, but many more Americans are living to a ripe old age than ever before. And their numbers are increasing:
What does the number of old people in the population have to do with a replication of Bargh’s experiment?
Well, the original elderly-words-make-you-walk-slow study was run in 1991. Which means that if we were to replicate this experiment using a class of today’s undergrads, most of them wouldn’t have even been born when those original students walked down that corridor.
Whereas the original undergrads Bargh studied learned their American English in the period between 1970 and 1991, today’s undergrads learned it in the period roughly between 1994 and 2015, such that if we replicate the original study on students today, we will be comparing two groups of students who learned English at entirely different times. And since those demographic plots we just saw give us some reasons to suspect that we might just have changed the way we talk about old people over this time, it’s possible that this might just matter.
So have we really changed the way we talk about old people? Here’s a first stab at answering this question. It’s a plot of the logarithmically transformed corpus frequencies** with which the word old co-occurred with a bunch of words used to label people in the periods between 1970–1989, and 1990–2009 (in COHA):
** Log transforms have the happy property of squishing frequency differences, taking some of the skew out of Zipfian distributions for the purposes of visualization and analysis. Yet even though this graph makes all the real differences much smaller, you can still see that when it comes to using old to describe men and women, the trend is down.
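To make the squishing concrete, here is a tiny sketch with made-up counts. The base of the logarithm is an assumption on my part (base 10 is used here for readability); the shape of the argument is the same for any base.

```python
import math

# Hypothetical raw counts spanning four orders of magnitude.
raw_counts = {"word_a": 100_000, "word_b": 1_000, "word_c": 10}

# Base-10 log transform: each tenfold difference becomes a step of 1.
logged = {word: math.log10(count) for word, count in raw_counts.items()}

for word in raw_counts:
    print(f"{word}: raw {raw_counts[word]:>7}, log10 {logged[word]:.1f}")
```

A 10,000-to-1 gap in the raw counts shrinks to a 5-to-1 gap after the transform, which is why a visible downward trend on a log scale corresponds to a much larger drop in actual usage.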
Next up, here’s a plot of the total frequency with which people-related words followed old decade by decade in the corpus in this period:
Again, while Americans used old to describe other people a lot in the 1970s, they have done so less and less as time has gone by. In fact, as compared to the 70s, this rate of usage is down by nearly a third today.
And, of course, because the changes in English usage we have just seen make it clear that encounters with the word old will now lead to people developing different expectations than they would have done in 1991, this means that when it comes to priming people with the word old, we shouldn’t be surprised if a priming effect found in 1991 ‘fails’ to replicate today. Old now primes different expectations than it once did.
Wait, wait. What about the other words?
Now, at this point, you might object that by focusing on old, I’m ignoring all the other prime words. All that distributional stuff and burstiness aside, you might still argue that somehow those other primes may still have had an impact on Bargh’s findings, albeit a more subtle one—through indirect priming, or spreading activation, or some other similarly opaque process.
Maybe. Unfortunately, figuring out the priming potential of the full set of Bargh words is far less straightforward than it is for old. Most of the other prime words—such as ancient and grey—do not occur alongside words relating to people in quite the same way that old does (indeed, one of the Bargh prime words—helpless—is far more likely to appear alongside child and children). Still, to get an estimate of the degree to which higher-order relationships between the other prime words might be expected to prime the “elderly concept” over time, I analyzed the relationship between the full set of primes found in the original study and the word elderly. For this, I used Latent Semantic Analysis (LSA), a measure of contextual similarity derived from large stores of text, which has been found to be a good predictor of priming effects and semantic similarity judgments.
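The comparison at the heart of this kind of analysis is a cosine similarity between word vectors, averaged over the prime words. The sketch below shows only that final step, and it rests on loud assumptions: in real LSA the vectors come from a dimensionality-reduced word-by-context matrix built from a large corpus, whereas the two-dimensional vectors here are invented purely to show the arithmetic.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarity(prime_vectors, target_vector):
    """Average cosine similarity between each prime and the target word."""
    sims = [cosine(vec, target_vector) for vec in prime_vectors.values()]
    return sum(sims) / len(sims)


# Invented vectors standing in for rows of an LSA space (illustration only).
target = [1.0, 0.0]  # stand-in for "elderly"
primes = {
    "old": [1.0, 0.0],      # same direction as the target
    "bingo": [0.0, 1.0],    # unrelated direction
    "wrinkle": [1.0, 1.0],  # somewhere in between
}

print(round(mean_similarity(primes, target), 3))
```

Running the same average over vectors built from an earlier and a later slice of a corpus is what lets the two periods be compared.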
Here’s the average similarity across each of the original prime words and the word elderly in the earlier and later part of this period:
Clearly, it too has declined.
To see if there was another way of wringing something out of the full set of prime words, I then examined the sets of the most frequent words primed by each of the prime words in each of the four decades from 1970 to 2009, comparing the LSA similarities between these ‘second-order’ primes and a set of elderly words (senior, retired, elder, aged, old and elderly).
Oh. And, given that for Bargh’s effect to replicate, these words had better be priming elderly rather than youthful stereotypes, I also looked at the relationship between the second-order primes and some words that are more closely associated with youthful stereotypes (namely, junior, student, juvenile, young, youngster and youth).
Here’s what I found:
Not only has the extent to which these words might be expected to prime old decreased, but they have—by this measure at least—become increasingly likely to prime young.
Cultural change and “conceptual replication”
These analyses make clear that the expectations that Bargh’s prime-words evoke will have changed over time. Indeed, they offer solid evidence to suggest that these words will no longer prime the way they once did. However, you might reasonably object that this is because people now use other words to elicit “the elderly stereotype,” and I’m guilty of ignoring these new words. Perhaps my analyses simply reflect the fact that people have changed the words they use to talk about old people. If only we could identify that new set of words, then we could come up with a design that would recreate the priming capacities of the original.
To explore this possibility, I analyzed the relative frequencies of the following set of “old …” n-grams:
old man, old woman, old men, old women, old person, old people, old folks, old age, old lady, old ladies
in the four decades from 1970 to 2009, and compared them to those of a larger set of semantic near-equivalents:
retired, retiree, seniors, senior citizens, senior citizen, elderly, aged, pensioner, pensioners, older ladies, older lady, older adults, older people, older person, older man, older men, older women, older woman
Here’s what I found:
As you can see, it turns out that it is the case that these alternative ways of talking about old people have increased in frequency as the traditional old… n-grams have decreased.
However, they haven’t increased by that much. And in fact, this next plot—in which I combine the corpus counts for all of the words above—shows quite clearly that their increased frequency falls a long way short of making up for the decline of old…:
Here’s a fun oddity: The decline in the plot above is almost exactly equivalent to what would have happened if we had eradicated the word elderly from the English language across this period, while keeping everything else the same. And, of course, it was the stereotype of elderly that the original elderly-words-make-you-walk-slow study aimed to prime.
All of which is to say, that despite—or perhaps even because of—the fact that there are more old people in society than ever before, American English has changed: We talk about old people less than we did before. Which, if you think about it, needn’t necessarily come as a surprise: the more older men there are in a room, the less informative the phrase “old man over there” will be about any individual older guy…
An hour in the library is worth a month in the lab
Taken together, these analyses provide objective reasons why, whatever the facts of the original study, it should come as no surprise if it fails to replicate today. Indeed, if the point of replication is to examine the likelihood that an effect is real, then given that changes to the language have made any failure to replicate effectively uninterpretable, trying to replicate the reading-elderly-words-makes-you-walk-slowly experiment in 2015 is about as sensible a use of time as setting sail for Madagascar in order to confirm the historical reality of the dodo by capturing a live specimen.
Yet, as I mentioned earlier, instead of provoking a yawn and a “so what?”, failed replications of priming studies continue to generate headlines and feverish debate. One might be forgiven for thinking that all that heat must surely generate a little light. Yet sadly, this is far from the case.
To illustrate why the replication debate contributes more heat than light, let’s consider a particularly notorious failed replication of the elderly-words-make-you-walk-slow study. Its publication generated a considerable amount of controversy: first, because it coincided with a number of data-faking scandals in the priming literature; and second, because the authors suggested that Bargh’s original finding was the result of a Clever Hans illusion. This prompted a series of vigorous replies, attracting the attention of a wider scientific audience. (As of July 2015, the paper reporting this replication failure has been cited around 200 times.)
However, this much-discussed and analyzed “failed replication” made at least one significant change to Bargh et al’s methods that—perhaps revealingly—has gone unnoticed, in spite of all the ruckus: the study was run on French-speaking subjects and the “replication” made use of a set of French-language primes. And while this might seem like an irrelevant detail, it is not.
To understand why, it is important to bear two things in mind. First, despite what many people (and apparently some researchers) seem to think, when languages like French and English differ, they do so in many more ways than simply using different sounds for the same words, such as saying the instead of le (or is that la?). And second, historically, most influential social priming studies have been conducted in English. And most of these studies have used lots and lots of English adjectives in their sets of prime words.
This second point is important because the frequency and the distribution of adjectives vary considerably across languages, and adjectives can play very different functional roles in discourse depending on the language in question. Indeed, in recent years, some colleagues and I have been examining how, as English has evolved, it has come to make significantly more use of adjectives relative to similar, neighboring languages, which rely on gender class to help make nouns more predictable in speech. These findings indicate that the expectations that speakers will develop from their experience of English adjectives will differ from those speakers will develop from their experience of French adjectives. And this predicts that English and French adjectives will have very different priming properties.
To examine what this means for the idea of cross-linguistic replications of priming experiments, I first looked at the patterns of adjective use in two big corpus samples (over a billion words each) of English and French. The plot below shows the relative frequencies of adjectives for the two languages.
As can be seen, English adjectives are much more frequent than French adjectives, especially at the upper end of the distribution (revealed in the gap between the blue and the red line at the left of the plot).
This difference is also evident in the sets of prime words used in Bargh et al’s original study and Doyen et al’s French “replication”: The words in the Bargh set had an average frequency of 30 instances per million words (pmw) in English, which reduces to only 8 pmw in their French translations and equivalents. This means that it is highly unlikely that the French items evoke expectations with the same strength as their English “equivalents.”
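A quick note on the unit: "per million words" simply rescales a raw count by the size of the corpus, so that samples of different sizes can be compared. Here is a minimal sketch, in which the raw counts and the corpus size are hypothetical numbers chosen to land on the 30 and 8 pmw averages quoted above.

```python
def per_million(count, corpus_size):
    """Convert a raw occurrence count into a per-million-words (pmw) rate."""
    return count * 1_000_000 / corpus_size


# Hypothetical raw counts in hypothetical one-billion-word corpora.
english_pmw = per_million(30_000, 1_000_000_000)
french_pmw = per_million(8_000, 1_000_000_000)

print(english_pmw, french_pmw, english_pmw / french_pmw)
```

On those averages, the English items turn up a little under four times as often as their French counterparts.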
Next up, to look for possible differences in how these primes shape expectations of upcoming words, I looked at where they occurred in relation to nouns across the two languages. To do this, I took contextual co-occurrence counts from a moving window of 5 words (i.e., one in which the target noun occupies the center of the window, and counts are taken from the 2 words preceding and following it).
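The windowed count described above can be sketched as follows. This is an illustration under assumptions, not the pipeline used for the plots: the noun and prime sets are passed in by hand (a real analysis would take nouns from a part-of-speech tagged corpus), and the sample sentence and function name are made up.

```python
from collections import Counter


def window_counts(tokens, nouns, primes, half_width=2):
    """Count prime words near each noun in a window centered on the noun.

    With half_width=2 the window spans 5 tokens: the noun plus the 2 words
    before it and the 2 words after it. Returns two Counters: primes seen
    before a noun, and primes seen after a noun.
    """
    before, after = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token not in nouns:
            continue
        for word in tokens[max(0, i - half_width):i]:
            if word in primes:
                before[word] += 1
        for word in tokens[i + 1:i + 1 + half_width]:
            if word in primes:
                after[word] += 1
    return before, after


# Made-up example sentence and hand-picked word sets.
tokens = "an old man and a wise old woman sat alone".split()
before, after = window_counts(
    tokens, nouns={"man", "woman"}, primes={"old", "wise", "alone"}
)
print(before)  # primes that occur just before a noun
print(after)   # primes that occur just after a noun
```

Keeping the "before" and "after" counts separate matters here, because it is the cases where a prime word precedes a noun that bear on what the prime leads a reader to expect next.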
The plot below shows how often nouns occur after all of the 21 mutually translatable items from the original English study and its French replication:
Apart from one adjective (ancient), English speakers get far more experience of these prime words in situations that will shape their expectations about nouns than French speakers do. And the extent to which such experiences will shape English speakers’ expectations is massively different—and massively more—than is the case for their French counterparts. (In the two samples, English items occur prior to a noun on average 18 times per million words (pmw), whereas for French items this drops to just 3 pmw).
These comparisons provide us with good reasons to suspect that, whatever the facts about the original Bargh study, we ought not to expect it to replicate in French. What’s more, these analyses offer a template for what to do when other English priming studies fail to replicate in translation. While number crunching might not be as much fun as the fire, brimstone, and controversy of the current debate, comparing the quantifiable properties of the social and linguistic primes used in an original study and its translation can help us understand why these failures happen. And why failures to replicate effects in translation (like the present one) will often tell us nothing about the merits of the original finding.
(At this point, it’s worth recalling that many of the fraud scandals that have plagued social priming research involve researchers who conduct their work in gendered languages. Has the different priming potential of English as compared to languages such as French, and the fact that social priming researchers appear oblivious to these massive differences, played a part in leading researchers astray?)
What can we learn from social priming studies?
In an earlier post, I looked at the way words prime one another in Paired Associate Learning, and described how these effects change across the lifespan, as they are subject to the constant and predictable effects of learning. Those analyses provide strong evidence to support the idea that priming effects are not even likely to directly replicate across the lifespan of an individual.
When taken together with the analyses described above, this suggests that:
- specific social priming effects are unlikely to replicate across time
- specific social priming effects are unlikely to replicate across languages
- specific social priming effects are unlikely to replicate between age groups
Or, to put it another way, when we look at the properties of the primes, and the way that people learn about them, it becomes clear that these findings are unlikely to generalize much beyond the here and now of the group of people being studied at any given time. Which raises a question. If all we can conclude from a study in which priming people at a given time A has an effect B is that priming exactly those people at exactly that time had exactly that effect (and no more), then what can we say we have actually learned?
Plato, in the Cratylus, pondered Heraclitus’ claim that everything is always in a state of flux: If everything is constantly changing, he asked, how can one ever know anything?
“how can that be a real thing which is never in the same state?… for at the moment that the observer approaches, then they become other … so that you cannot get any further in knowing their nature… Nor can we reasonably say … that there is knowledge at all … for knowledge too cannot continue to be knowledge unless continuing always to abide and exist. … but if that which knows and that which is known exist ever … then I do not think that they can resemble a process or flux… no man of sense will like to put himself or the education of his mind in the power of names: neither will he so far trust names or the givers of names as to be confident in any knowledge which condemns himself and other existences to an unhealthy state of unreality”
Philosophers have addressed this question by focusing on the verb to know. They spend their time pondering the nature of epistemic certainty, and what it might mean to actually know something. (Whatever “actually know” means.)
The natural sciences have taken a different tack, one that has tended to give epistemology a wide berth. Science has focused instead on determining the reliability of theoretical conjectures about a universe that really does appear to be in flux. Rather than seeking to establish criteria for certainty, the natural sciences have sought to manage and minimize uncertainty.
Scientists (and, importantly, statisticians) tend to accept that all models and theories are wrong, to a degree. Being wrong is not, in itself, a bad thing. Rather, scientific understanding has flourished – and flourishes – through an iterative process of establishing the circumstances in which models and theories succeed in predicting and explaining, and those in which they fail. Replication – the independent confirmation of results from one study in another – offers a scientific gold standard in this regard, because it helps to empirically delineate the phenomena that models and theories must account for.
Replicability is, however, ultimately a matter of chance. To put it bluntly, however smart a researcher may or may not be, when it comes to modeling and explaining nature, in the end, all that matters is whether one has the good fortune to have alighted on a model that works. Or not.
Which means that experimental effects can only be expected to replicate if the natural phenomena that are being studied by scientists are sufficiently stable (in terms of time and context) to allow them to replicate. And it turns out that, empirically, Heraclitus’ flux is to some degree a fact of scientific life: Studies have shown that even materials as seemingly solid as metals have properties that are more akin to gases than our common intuitions are wont to suppose. The saving grace for a lot of science is that the structural aspects of most metals – along with much of the rest of the physical universe – are sufficiently invariant for most intents and purposes to ensure that exact replications of studies produce highly similar sets of results in most circumstances.
Our understanding of the biological world actively embraces the Heraclitean flux, and it isn’t just because biological changes happen at a rate that allows for replication. Our understanding of biology has grown beyond a mere catalog of events, and because of this, even when studies “fail to replicate” exactly on a trial-by-trial basis (because the pesky little organisms being studied have the audacity to evolve), the changing nature of results can actually serve to illuminate theory.
The scientific process that emerges from the replicability of biological results can be seen in the development of our understanding of learning. A reliable body of replicable results has established a set of principled empirical phenomena, and these in turn serve as a target for the development of models that predict and explain the empirical data. And while none of these models accounts for all the relevant empirical phenomena (all models are wrong in the limit), their commonalities – especially in the way that they use the discrepancies between expected and experienced outcomes to drive learning – have led to the identification of the brain structures that respond to these discrepancies.
Priming effects clearly have a biological component. They happen in people. And it is highly likely that they are a product of many of the same predictive brain processes involved in learning. However, as I have sought to show above, the efficacy of many, if not all, social priming effects relies on cultural and experiential factors that are clearly not invariant over time. And what this means is that the dynamics of learning and the dynamics of cultural change guarantee that perfectly valid social priming results cannot and should not be expected to replicate in the same way that valid results from physical and biological studies tend to do.
In other words, specific priming results cannot be expected to replicate because their social and cultural components are not invariant aspects of the natural world, and it is the relative invariance of the natural world that enables replication to be the cornerstone of cumulative natural science. Which means that the problem that priming research has to face is not that the field doesn’t replicate its findings often enough, but rather that, as they are currently employed, the approaches that some researchers in psychology have borrowed from the natural sciences may not be appropriate to the phenomena that they study at the level that they actually study them.
This has a number of implications:
First, it means that whatever the motivations behind the replication movement, going out and conducting direct replications of psychological findings en masse may not be such a wonderful idea, simply because the dynamics of change in social and cultural phenomena will inevitably cause many perfectly valid social priming findings to fail to replicate.
Second, and echoing this point, it seems clear that the experimental approaches that researchers in psychology apply to these phenomena may not be appropriate to what it is that they actually study. And that, if you want to understand why ‘elderly’-words may or may not make people walk along corridors more slowly or more quickly, the kind of methods illustrated here might be better suited to the task.
Finally, when it comes to understanding priming, specific demonstrations of priming phenomena are kind of beside the point. We now have a decent understanding of how and why it is that this or that kind of stimulus causes our minds to respond in various tasks based on our prior experience. And we are getting far better at modeling the environmental properties of systems of stimuli. The challenge is to actually integrate all the bits of stuff that we know, and to put it all together in explanatory terms. And to figure out the answers to some really important questions, such as how do learning, genes and culture conspire to produce a mind, and what really happens as those minds then age? (Which, as I’ve endeavored to show before, is clearly not what cognitive aging research would have you believe.)
This last task is a very difficult one. But it won’t be solved by trying to replicate the irreplicable. Or by methods papers or metascience. It will only be solved by better measures and models. And by better science and better thinking.
For anyone wanting to delve a little further into this, a paper containing more details of these analyses (and more analyses) is available here.
Bargh, J.A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: direct effects of trait construct and stereotype-activation on action. Journal of Personality and Social Psychology, 71 (2), 230-44 PMID: 8765481
Bargh, J., & Chartrand, T. (1999). The unbearable automaticity of being. American Psychologist, 54 (7), 462-479 DOI: 10.1037//0003-066X.54.7.462
Doyen, S., Klein, O., Pichon, C.L., & Cleeremans, A. (2012). Behavioral priming: it’s all in the mind, but whose mind? PLoS ONE, 7 (1) PMID: 22279526
Forster, K., & Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10 (4), 680-698 DOI: 10.1037//0278-7393.10.4.680
Meyer, D.E., & Schvaneveldt, R.W. (1971). Facilitation in recognizing pairs of words: evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90 (2), 227-34 PMID: 5134329