Sticks and stones (3): How names hurt

The shock of the old

Most  people in Iceland don’t have family names. Instead, Icelanders’ last names are made from their father or mother’s first name, to which males add the suffix -son (son) and females -dóttir (daughter). This practice  can seem strange to outsiders, but it was common throughout Scandinavia until surprisingly recently: laws compelling citizens to adopt heritable family names were only enacted in 1828 in Denmark, 1901 in Sweden, and 1922 in Norway.

In 1982, Sweden changed its laws to allow matronyms and patronyms to augment – or even replace – family names once again; and Norway (2002) and Denmark (2005) then followed suit. This legal flip-flopping is a reminder of the fact that heritable family surnames are a remarkably recent invention, and that their ubiquity in the modern world is not the result of cultural or linguistic evolution, but legislation. Heritable family names were regulated and enforced as  modern states developed because they make populations easier to count, tax and govern, and with this in mind, it is worth noting that for most of Scandinavian history, to the extent people had fixed names, these were first names. Surnames were used when it was necessary to discriminate one Erik from another, and historically, they were ad hoc and flexible. And although the practice of forming surnames from parents’ given names was common, basing them on a discriminating feature – the village or farm where someone lived, their occupation, a personal characteristic, etc. – was not exactly uncommon.

These historical realities are reflected in the distribution of Scandinavian family names. While Denmark, Norway, and Sweden are now awash with fossilized patronyms, many Icelanders use Western-style surnames that reflect the feature based naming practices that historically coexisted with patronyms in Iceland. Accordingly, if we want to understand why it is that the default surname for Einar Peterson’s son Björg is Einarson (a true patronym) in Iceland yet Peterson (a fossil) in Sweden, the answer lies not in ‘customs’ or ‘traditions,’ but in the arbitrary decisions of governments.

Where do family names come from?

Historical English naming practices were equally fluid and equally mixed, and this is also reflected in the fossil record. Although the 2000 US census recorded 6,248,415 unique ‘family’ names, 20 million Americans share just fifty: Johnson, Williams, Jones, Davis, Wilson, Anderson, Thomas, Jackson, Thompson, Harris, Lewis, Robinson, Adams, Nelson, Roberts, Phillips, Evans, Collins, Edwards, Morris, Rogers, Peterson, Richardson, Watson, Brooks, James, Hughes, Myers, Sanders, Jenkins, Barnes, Henderson, Simmons, Patterson, Reynolds, Graham, Hayes, Gibson, Ellis, Stevens, Owens, Harrison, Wells, Olson, Burns, Henry, and Simpson.

A further 17.5 million Americans share another fifty: Smith, Brown, Miller, Taylor, White, Clark, Walker, Hall, Young, King, Scott, Green, Baker, Hill, Carter, Turner, Parker, Cook, Cooper, Bailey, Bell, Ward, Cox, Wood, Gray, Long, Butler, Fisher, Coleman, West, Cole, Ford, Wells, Woods, Washington, Tucker, Freeman, Porter, Mason, Wagner, Hunter, Hunt, Black, Stone, Schmidt, Weaver, Gardner, Armstrong, Lane, or Carpenter. 

While it would be nice to say this shows how the population is one big happy family, in truth none of these ‘family names’ trace back to a single, historical pater familias, because, of course, none of these common surnames are ‘family names’ at all. They are the remnants of the earlier naming practices that states have brushed aside in their efforts to regulate how citizens name themselves. The first fifty testify to the fact that the English also used patronymics to discriminate the Annes from one another, and the last fifty fossilize the distinguishing features – jobs, hair color, place of abode, etc. – that distinguished between Johns.

Although Icelandic patronymics might appear odd to modern Western eyes, history shows that the real oddities are modern ‘family’ names, because, as these 100 common names reveal, English surnames are not ‘family’ names at all. Rather, throughout most of history, English first names were drawn from a very small pool – in any given community, around 50% of males shared 3 male first names, and 50% of females 3 female first names – and, as in Scandinavia, surnames were used flexibly to distinguish between the large sets of people who shared a common first name. They were not used to discriminate families from one another, and in fact, there is a good reason why English naming practices didn’t evolve to produce ‘family’ names: because ‘family names’ don’t actually communicate meaningful information about families anyway.

What’s in a family name?

To help explain why, let’s consider my last name, Ramscar. It can be traced back a thousand years or so, to the appearance of one Adam Heggar de Romesekerre in the Domesday Book (the Norman inventory of conquested swag in England and Wales). If we assume a direct line of inheritance starting at Adam and ending with me (and given that the name was passed along the male line, this may be a stretch), we can calculate how much information “Ramscar” conveys about my extended family of ancestors.

At the parental level, because my mother took my father’s surname on marriage, Ramscar is 100% informative about my family. Yet, when we get to my siblings, two of whom are male, and two of whom are female (and have been married since I was a child), the family information communicated by Ramscar falls to 50%. It is also 50% for my grandparents, and this drops to 25% for my great-grandparents. In this vein, we can ask, how much do I learn about my family – and my lineage – from the line back to Adam?

If we assume that people in my family reproduce every 25 years or so, then approximately 37 generations separate Adam and me. If we also assume that it takes a man and a woman to reproduce, such that the number of direct ancestors doubles in each generation, we can look at the tree that leads down from Adam’s generation to mine, and ask, how many branches, aside from his, have contributed to my family inheritance?


The answer?

Adam is one of 68,719,476,736 initial branches.

This is, I admit, a surprisingly large number. Especially since it turns out that at the time Adam de Romeskerre was alive, the population of England and Wales only amounted to  around 1,800,000 people.
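The arithmetic behind that figure can be sketched in a few lines. This is only an illustration under one assumption that the post does not state outright: counting me as generation 1, Adam's generation is reached after 36 doublings, which is what reproduces the number above. The names `potential_ancestors` and `POPULATION_THEN` are made up for the example.

```python
# Assumption: the author counts as generation 1, so Adam's generation
# (generation 37) is reached after 36 doublings of the ancestor count.
GENERATIONS = 37
POPULATION_THEN = 1_800_000  # England and Wales estimate quoted in the post


def potential_ancestors(generations: int) -> int:
    """Ancestor slots in the earliest generation if no branches ever merged."""
    return 2 ** (generations - 1)


slots = potential_ancestors(GENERATIONS)
print(f"{slots:,}")  # 68,719,476,736

# How many potential branches there are for every person actually alive then.
print(round(slots / POPULATION_THEN))
```

On these assumptions there are roughly 38,000 potential branches for every person who was actually alive, which is why the two numbers cannot both describe a tree whose branches never merge.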

The explanation for the seeming mismatch between these two numbers is to be found in Joseph Chang’s fabulous paper, Recent common ancestors of all present-day individuals:  in reality, these branches ‘mix.’ My 69 billion branch estimate doesn’t allow for the fact that, as we go back in time,  many of the branches in the actual tree mapping my line of descent from Adam converge, because many of the couples who got together leading to my birth had common ancestors along the way. (Mark Liberman, who pointed me to Chang’s work, has written about some of the fascinating implications of this on LanguageLog).

Accordingly, rather than the neat, ever branching tree I drew above, family inheritance links look more like this:


Chang (1999)

Mixing trims inheritance trees down. And since we know that the actual number of branches in  Adam’s time was around 1.8 million (many of whom have no living  descendants), and mathematically, the number of potential branches is around 69 billion, it is safe to assume that a lot of mixing occurred.

In fact, as a result of Chang and his colleagues’ careful research, I can say with confidence that if anyone is descended from Adam de Romeskerre, I am. Moreover, my confidence in this would not change if you could prove to me that one of my female forebears cheated on her mate, such that the genetic link implied by the patronymic use of Ramscar could be conclusively shown to be broken. Because the mixing implied by the difference between the 69 billion potential branches and 1.8 million actual branches suggests that I am descended from everyone living in England in Adam’s time with descendants alive today.

Mixing reveals something else about family names: I always feel a special affinity for the few Ramscars I come across that aren’t my siblings, parents, grandparents, first cousins, etc. However, in fact everyone I meet is a distant relative (more or less removed), and in terms of family – which to me is best thought about in terms of overlap in the garden of forking paths that led to my birth – it follows that there are a lot of people not named Ramscar to whom I am far more closely related than I am to those distant Ramscars.

So to go back to the question I started with – how much family information do ‘family names’ communicate? – we now have two rather disconcerting answers: First, when it comes to my nearest kin – my siblings, parents, grandparents, first cousins, etc. – Ramscar adds no useful family information at all. Not only do I know about my relationships with them by dint of growing up in my family, but of course, many of them, including my sisters, are not even named Ramscar. And second,  when it comes to distant relatives,  Ramscar is misleading, in that it suggests a false sense of kinship with distant strangers, and neglects the closer ties I inevitably share with others.

Sticks, stones and names

In the first post in this series, I described some landmark research by Marianne Bertrand and Sendhil Mullainathan, who showed that simply replacing a White-sounding name with an  African-American-sounding name on the same resume reduced the likelihood of potential employers calling back by 50%. This, along with other similar findings, is widely accepted to show how pervasive racial biases are, and in this series of posts, I’ve been trying to uncover some of the sources of those biases. In particular, I’ve been trying to work out whether Bertrand and Mullainathan’s findings simply reflect the racial prejudice of the  people who read those resumes, or whether they might also reflect responses to the names on those resumes that are not necessarily related to race at all.

And the reason I have focussed so much attention on understanding how names work, and where modern names came from, is because, as I’ll now show you, there are good reasons to believe that a great deal of the racial bias that led to those different call back rates was not in the heads of the readers of those resumes, so much as it is built into the American naming system itself.

To begin to explain why, here’s a recap of a few points from the first posts in this series:

  1. The African-American-sounding first names Bertrand and Mullainathan used in their study were less common than the White-sounding first names.
  2. Psychologists have long known that repeated exposure to a stimulus can enhance people’s attitudes toward it, and people’s attitudes to first names follow this pattern: The next graph plots the familiarity of male and female first names on its horizontal axis, and the favorability of British-English speakers towards them on its vertical axis, and it makes clear that people are strongly biased to favor frequent first names:


Coleman, Hargreaves, & Sluckin (1981)

  3. I also showed that the rate at which the first names in Bertrand and Mullainathan’s study elicited a call back was related to their frequency in English. This graph (which plots the callback rates for female first names in Bertrand and Mullainathan’s study on the vertical axis, and the frequency of those names on the horizontal) shows that the frequency with which these names occur in American English accounts for over 60% of the variability in the callback rates, regardless of race.
  4. I also showed how conservative historical English naming practices were. The table below shows the 10 most frequent names given to children in the parish records of Beith, Scotland in the period 1701‒1800. Over 90% of the boys shared 10 male names, and over 90% of the girls 10 female names.
  5. Finally, I explained how the way these names were distributed is remarkable. It took some of the finest minds of the 20th Century to figure out the ideal way to code information in communication systems, which involves varying the frequency of code words so that each one is exponentially more or less frequent than the next.
    The graph above plots the frequencies of the girls’ names in the table above on the vertical axis, and the names are ranked from most to least frequent on the horizontal axis. I have fitted an exponential function (the curve on the plot) to the actual distribution of names (the points on the plot). The fit is over 98% perfect. Which means that while it is highly unlikely that these people were consciously aware of what they were doing, the way that they distributed names among their children almost exactly resembles what a modern communications engineer would specify to maximize the efficiency of such a system.
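For readers who want to see what fitting an exponential to ranked name counts involves, here is a minimal sketch. It is not the procedure or the data behind the plot: the counts are synthetic, generated from a known exponential purely for illustration, and the fit is an ordinary least-squares line through the logarithms of the counts (one common way to do it, assumed here). The function name `fit_exponential` is hypothetical.

```python
import math

# Synthetic, illustrative counts only (NOT the Beith parish data):
# the name at rank k is given to round(400 * 0.75**k) children.
counts = [round(400 * 0.75 ** k) for k in range(10)]


def fit_exponential(counts):
    """Least-squares fit of ln(count) = intercept + slope * rank.

    Returns (intercept, slope, r_squared); r_squared is computed in log space.
    """
    n = len(counts)
    ranks = range(n)
    logs = [math.log(c) for c in counts]
    mean_x = sum(ranks) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ranks, logs))
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    return intercept, slope, 1 - ss_res / ss_tot


intercept, slope, r_squared = fit_exponential(counts)
ratio = math.exp(slope)  # estimated frequency ratio between successive ranks
print(round(ratio, 2), round(r_squared, 3))
```

A constant ratio between successive ranks is what "exponentially more or less frequent than the next" means in practice, and an r² close to 1 says that a single ratio describes the whole list well.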

Happy families?

Beithers were not the only people distributing children’s first names like this. In Korea, a person’s family name is their first name, and the distribution of these names is interesting: Kim is the most common, followed by Lee and then Park. Around half of the ethnic Korean population has one of these three names.

In Korea, women keep their first names when they marry (and children inherit their father’s first name), and following an old Confucian tradition, many families record their ancestral trees in special ‘family books.’  These marital naming conventions mean that, somewhat along the lines of the following illustration (in which I used different colors to represent different names), the female names in these books provide a rich sample of Korean first (family) names, and their distribution across the population:


In the graph below, physicists Baek Seung Ki, Anh Hoang, Kiet Tuan, and Kim Beom Jun plot the distribution of the Korean first (family) names for 221,609 married women found in 10 digitized family books (started between 1510 and 1870). The rank of each first name, from most to least common, is plotted on the horizontal axis, and the (log transformed) number of people with each first name is plotted on the vertical axis.


Baek, Anh, Kiet, and Kim (2007)

What it shows is that Korean first names are exponentially distributed, and that they have been so for a long time. (First names in China, which has similar naming practices, are also exponentially distributed).

This means that female first names were exponentially distributed in Eighteenth Century Korea, and that around 50% of females had the first name Kim, Lee or Park. Meanwhile in Eighteenth Century Beith, around 50% of females had the first name Margaret, Jean or Janet, and female first names were also exponentially distributed. Which means that if we were to go back in time and listen to the way Scottish and Korean names were used in speech at this time, we could be forgiven for thinking that they both had the same name system.

Of course, in Beith there were two distributions of first names – male and female – whereas in Korea, there was just one. Clearly there was a difference. So it is worth noting that had the naming practices the Qin dynasty imposed on its citizens been introduced in Beith in 1750 – i.e., the mother keeps her first name on marrying, and the father passes his first name on to their children – its naming system would have become essentially identical to Korea’s in a generation. (And while the idea of a father called John naming all his children John … Own Name might seem strange, it is worth considering first, that in Beith, parents frequently passed their first names to at least one child – there is a precedent – and second, is John … Own Name really any more strange than the fact that millions of American women with fathers named Ted, Steve, Bill, etc. have ended up being named Own Name … Johnson?)

And, given the properties of exponential distributions, had this happened, it is likely that both systems would still be the same today.

Everyone needs a name

Beith never did adopt the naming practices of the Qin dynasty. Rather, Western lawmakers focused their efforts on last names as they sought to make their citizens more legible to the eyes of the state. As the world’s population has grown over the past two centuries, this has created a fascinating natural experiment.

As I explained in my last post, although the generic ‘tea-cup’ suffices for talk about tea-cups, we often want to go beyond ‘this man’ or ‘that man,’ and refer to someone specifically. As languages evolved, they all converged on similar solutions to the problems sui generis names pose. Rather than using unique words for every person (a strategy that would make remembering names impossibly hard), name grammars discriminate between individuals at a very coarse level, using a small set of first name tokens (distributed along the lines we saw in Korea and Beith), and use sequential name tokens to increase the specificity of the information contributed by a name to the discrimination of a specific individual.

To use my own name as an example again, the 1990 US Census records 3,835,609 Michaels. Which means that Michael only discriminates me from 95% of the male population. For many purposes this is enough: it doesn’t matter that I share my first name with around 1 in 20 English speaking males, because in many contexts, Michael is sufficient to discriminate me from anyone else that someone might be referring to. And in these contexts, speakers and listeners actually benefit from Michael being generic, because its frequency makes it easy to recall, produce and recognize.

In contexts where Michael is not discriminative, my last name, Ramscar, can then add information to help discriminate me from other Michaels. However, the thing that makes my last name more informative than my first – its low frequency – means that Ramscar is harder to recall, produce and recognize than Michael. Thankfully, the very fact that first names precede last names in the grammar of my English name helps compensate for this. For people who know me, the more accessible Michael can serve as a cue to Ramscar, and this helps to make the latter easier to recall and recognize in the context of my full name.
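One conventional way to put a number on "how informative" a name is, is its surprisal: the negative base-2 logarithm of the share of people who carry it. The sketch below applies that standard measure to the "around 1 in 20" figure for Michael. The share used for a rare surname is purely hypothetical (the post gives no frequency for Ramscar), and the function name is invented for the example.

```python
import math


def bits_of_information(share: float) -> float:
    """Surprisal, in bits, of a name carried by `share` of the population."""
    return -math.log2(share)


michael_share = 1 / 20  # the "around 1 in 20" figure for Michael
print(round(bits_of_information(michael_share), 2))  # 4.32

rare_share = 1 / 1_000_000  # hypothetical share for a rare surname
print(round(bits_of_information(rare_share), 2))  # 19.93
```

The frequent name is cheap to recall but narrows things down very little; the rare one does most of the discriminating, at the cost of being harder to retrieve.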

What is more, for people who don’t know me, Michael’s frequency makes it unambiguously a first name. This means that my first name disambiguates the fact that Ramscar is a surname in the same way that hearing an article like “a” helps a listener disambiguate the fact that “a rampick” is a noun. This reveals a grammatical benefit of the small sets of first names that are ubiquitous in the world’s languages: they serve to make clear that first names are indeed names, and this helps to make names less grammatically ambiguous.

In other words, the thing that makes first names lack ambiguity and be memorable – their frequency in the population – necessarily makes them  ambiguous when it comes to individual identities.  But this is ok, because surnames make up for this. Linguistically, we might say that Michael does mainly grammatical work – it makes clear that my name is a male name – and Ramscar contributes more to semantics, doing most of the work of discriminating me from everyone else.

The fact that last names provide most of the semantic information in American (and other English) names  is made clear by this next graph, which  plots the number of different names  in the 2000 US Census on the vertical axis, and the kind of name – first versus last – on the horizontal axis:


As you can see, last names convey a lot more individuating information than first names.

Earlier, we saw that the more familiar an English first name is, the more favorable people judge it to be.  For last names, familiarity works differently. The graph below plots the familiarity of last names on its vertical axis, and how favorably English speakers feel towards them on its horizontal axis.


Coleman, Sluckin, & Hargreaves (1981)

Unlike first names, where favorability increases linearly with familiarity, the relationship between favorability and familiarity for last names takes an inverted-U function, such that the most and least familiar names are viewed least favorably, whereas names in the middle of the familiarity band are viewed most favorably.

It is also important to note that the most frequent first names occur far more often in English than the most frequent last names. Thus it seems that what people find favorable in a first name is that it is easy to remember, produce and recognize, not so much that it is informative. Yet when it comes to last names, people want information: not too much information (Zxxjazztzz) or too little (Smith or Johnson), but rather, a happy medium.

Which, given my description of how I think name grammars actually work, is not so surprising.

What happens to names as populations grow?

Across the planet, industrialization has caused populations to grow. If we allow for the fact that Britain industrialized prior to Korea, then the following chart (which plots total population on the vertical axis, and census points on the horizontal axis) shows how the similar  natures of the population growth in Korea and Britain can enable us to make a reasonable comparison of their effects on their respective naming systems.


Sinosphere nations used the first, and least informative part of native names as a fixed hereditary patronym, whereas in the West, legislatures have mainly elected to fix the last, most informative part of names. We know from the results from Baek and colleagues I presented above that the number of Korean first names increased only slightly as the country’s population grew (although it is hard to determine how much of this is due to new first names being added, and how much is due to the fact that the increased size of their sample makes the rarest existing names more likely to actually register).

Moreover, these names kept their exponential distribution. This is important: the information in an exponential distribution is a function of its exponent (the number that governs the degree to which each item in the distribution differs exponentially from another), not the distribution’s size. This means that the increase in the amount of information conveyed by Korean first names over this time is unlikely to have been that great.
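This property is easy to check numerically. The following sketch computes the Shannon entropy (average information, in bits) of a set of names whose frequencies fall by a constant ratio from one rank to the next. The ratios 0.75 and 0.9 and the set sizes are arbitrary illustrative values, not estimates for Korean names, and `entropy_bits` is a made-up helper.

```python
import math


def entropy_bits(ratio: float, n_names: int) -> float:
    """Shannon entropy of n_names names whose frequency falls by `ratio` per rank."""
    weights = [ratio ** k for k in range(n_names)]
    total = sum(weights)
    # Skip weights that underflow to zero; they contribute nothing to the sum.
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


# Same ratio, ten times as many names: the entropy barely moves.
small = entropy_bits(0.75, 100)
large = entropy_bits(0.75, 1000)
print(round(small, 3), round(large, 3))

# Same number of names, flatter ratio: the entropy goes up.
flatter = entropy_bits(0.9, 1000)
print(round(flatter, 3))
```

Adding ever-rarer names to the tail changes the total almost not at all, because those names are used so seldom; changing the ratio is what changes how much information the system carries.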

What happened as Britain’s population grew? This table shows the percentage of males and females who were given the most frequent male and female names across the UK from 1570-1700. It makes clear that, historically, the distribution of names in Beith was typical of the greater population, and that historically, England and Korea had similar (and stable) first name distributions.


This next graph plots the growth of the population of the UK (horizontal axis), and the percentage of people with the top 3 male or female names over time (vertical axis).


As you can see, the number of people sharing these names declined reliably as the population grew. (The r² shows that population increases account for 96% of the change in the distribution of the top 3 male names.)

Since 90+% of the children of Beith were given one of 10 male or female names – as was historically the norm – the next chart plots the same information for the top 10 names:


Again, it is clear that the rate at which people received the most popular 10 names decreased in near perfect proportion to the rate at which the population increased.

Which means that the results of this natural experiment are:

  1. Fixing first names to mark hereditary patronymics seems to have little effect on first name distributions.
  2. Fixing last names appears to distort the distribution of first names – as well as increase its size – in direct proportion to population growth.

The garden of forked paths

The interaction between first and last name frequency we have just seen brings us back to the question of why it is that switching a White-sounding name for an African-American-sounding name on an otherwise identical resume decreases the likelihood that potential employers will call by 50%.

In the first post in this series, I showed that the African-American-sounding first names in Bertrand and Mullainathan’s study were less common than White-sounding first names, and that the frequency differences in these names can explain some of the differences in the callback rates found in the study. I also discussed a theory proposed by Freakonomics author Steven Levitt and economist Roland Fryer Jr., who, noting that African-American babies born in more affluent – and more diverse – neighborhoods tended to be given names that are less noticeably African-American than babies born in poorer, more segregated neighborhoods, suggested that more African-American first names indicate that people come from worse socioeconomic backgrounds, and that this is why African-American names on resumes result in fewer callbacks.

What Levitt and Fryer’s theory fails to explain, however, is why African-American babies born in poorer, more segregated neighborhoods get more African-American sounding names. Their answer is, essentially, “because poverty…”

In that first post I showed how there are clear differences in the overall structure of the names in Bertrand and Mullainathan’s resume study (and the names in subsequent replications): As compared to the White-sounding names, the African-American-sounding names have lower frequency first names and higher frequency last names. Taken together with the results of the natural experiment I just reported, this suggests another explanation of Levitt and Fryer’s difference: The UK’s history shows that fixing last names increases the diversity of first names, while lowering average first name frequencies. If the higher frequency African-American-sounding last names employed in these studies reflect actual differences in the frequency and diversity of African-American and White surnames, this would explain why African-Americans tend to have lower frequency first names: because they have higher frequency last names. It would also explain why African-Americans born in mixed neighborhoods tend to be given more frequent – and hence more typical – first names: because they will be born in neighborhoods where the average frequencies of last names are lower, unlike children born in segregated  neighborhoods, where  the high average frequencies of last names will push first name frequencies lower.

Are African-American last names less informative than White-American last names? Because the US Census office provides information about the distribution of surnames in the population, this question is easy to answer. The graph below plots the percentage of the White and African-American population who have each of the 500 most frequent last names (on the vertical axis), and the frequency rank of these names (horizontal axis).


42% of the people in the US have one of these 500 last names, and these names are more common in the African-American community – 47% of whom have a top 500 name – as compared to the White community, where only 25% have top 500 names.

Since some of the top 500 names – Garcia, Rodriguez, Martinez – are uncommon in both of these communities, the next graph plots the 100 most frequent African-American and White last names  by frequency rank (on the vertical axis), with the percentage of the White- and African-American population with each name on the horizontal axis. The African-American last names are plotted in red, and the White-American last names in blue:


Whereas 13% of White-Americans have top 100 White last names, a full third (33%) of African-Americans have top 100 African-American last names. Moreover, the most frequent African-American names are all far more common in the African-American community than the top 100 White names are in the White community. Which is to say that Census data makes it clear that within their respective communities, African-American last names are indeed far less informative than White-American last names.

History shows that faced with the problem of naming children with less informative (more frequent) last names, English speakers overwhelmingly respond by giving children more informative (less frequent) first names. Accordingly, given the distribution of African-American last names, it is hardly surprising that African-Americans tend to have less frequent first names, nor should it come as a surprise that this effect is exacerbated in segregated communities.

Institutionalized prejudice

When they are asked to judge the favorability of names, English speakers judge higher frequency first names to be more favorable than lower frequency first names, and they judge higher and lower frequency last names to be less favorable than medium frequency last names. Which means that, when African-American parents respond in exactly the same way as everyone else does to the constraints that modern Western naming systems impose, their children receive names that will be judged less favorably than the names chosen by many other American parents.

It follows from this that when someone is presented with a resume on which there is a name comprising a lower frequency first name and a higher frequency last name (the structure of typical African-American names), and feels less favorably towards that resume than they do to an identical one on which there is a name comprising a higher frequency first name and a lower frequency last name (the structure of typical White names), they are behaving in a way that is entirely predictable given the way that people judge the favorability of first and last names based on their frequencies. This suggests that to the extent that the prejudice this behavior reflects is driven by people’s biases towards more and less frequent names – and their sensitivity to the information structure of native name grammars – it may not reflect their feelings about race at all.

Instead, what the data and findings I have reviewed suggest is that much of the racial prejudice revealed by Bertrand and Mullainathan’s study (along with other, similar results) is not in the heads of the people who read resumes so much as it is in the US name system itself.

No-one knows my name

In my first post in this series, I described the way Ashkenazim names were standardized for what were initially well intentioned reasons, and how in time this led to disaster. The possession of regulated African-American surnames was also initially associated with emancipation rather than oppression. Yet if one takes seriously the evidence I have assembled above, it seems clear that history and the regulations governing naming in the US systematically leave African-Americans at a disadvantage.

It also seems clear that no one deliberately set out to do this. Rather, as the political scientist James Scott describes in his book, Seeing Like a State, laws to fix and register names were passed simply to make governance easier. Like many of the other steps states have taken to regulate societies and their practices, it is only in retrospect that the negative consequences become apparent. Naming regulations not only disadvantage whole groups, they have also drastically undermined some of the remarkably efficient native name grammars that have evolved over thousands of years of human social and linguistic history.

Two apparent truths of modern life are that names are uniquely hard to remember, and that they get harder to remember as we get older. There is a problem with both: They ignore the fact that, in the West, changes to the distribution of first names have continually made names harder to remember throughout the last century. And so although people born at the end of the Second World War really do find names harder to remember as they get older, we can only blame this on their fading memories if we ignore the fact that the information in US first names has increased exponentially since the 1940s, so that remembering names has gotten harder and harder for everyone every year that older adults have been alive.

Given that the first name distributions of Korea and China have hardly changed at all in comparison to the West, this suggests that name recollection for Koreans and Chinese will not have become so ridiculously hard (except, perhaps, when people’s Sinosphere names are coerced into the Western name order, or when people in Sinosphere countries use Western names in response to fashion). Which brings me to a final irony: The very fact that first name distributions have survived in their native, compact state in China and Korea means that because immigrants to the West from these countries transpose their inherited first (‘family’) names for use as (‘last’) family names, they will inevitably end up encountering the same problems that African-Americans currently face.

The colloquial expression for the “Chinese people” is the Bǎijiāxìng – 百家姓 (the hundred names), and it is this small set of first (family) names in China that are turned into last (family) names in the West. Given the size of this set, it is inevitable that the frequency of Chinese-American last names will increase rapidly as populations grow, and if Chinese-Americans then respond in the way that all English speakers have done historically, this will lead to lower frequency first names. Given what we know about favorability preferences, this will inevitably produce names that people judge less favorably than the canonical high-frequency first-name, informative-last-name form.

There is already a perception among Asian-Americans that putting a discernibly Asian name on a resume reduces an applicant’s chance of being admitted to college. Given the parallel with Bertrand and Mullainathan’s study of African-American names, it is likely that this too reflects some of the bias that history has built into the naming system. It also raises some questions: given that the information age has made the record-keeping rationale for regulating names redundant, and given that it is clear that fixed, regulated names have the potential to disadvantage whole sections of society, is it really okay for governments to ride roughshod over thousands of years of social and linguistic evolution? And is it really ever acceptable for states to compel their citizens to use fixed, unnatural names that can hurt them?

“The real names of our people were destroyed during slavery. The last name of my forefathers was taken from them when they were brought to America and made slaves, and then the name of the slave master was given, which we refuse, we reject that name today and refuse it. I never acknowledge it whatsoever.”

Malcolm X

Baek, S. K., Kiet, H. A., & Kim, B. J. (2007). Family name distributions: master equation approach Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 76 (4 Pt 2) PMID: 17995066

Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination American Economic Review, 94 (4), 991-1013 DOI: 10.1257/0002828042002561

Chang, J., Donnelly, P., Wiuf, C., Hein, J., Slatkin, M., Ewens, W., & Kingman, J. (1999). Recent common ancestors of all present-day individuals Advances in Applied Probability, 31 (4), 1002-1026 DOI: 10.1239/aap/1029955256

Colman, A., Sluckin, W., & Hargreaves, D. (1981). The effect of familiarity on preferences for surnames British Journal of Psychology, 72 (3), 363-369 DOI: 10.1111/j.2044-8295.1981.tb02195.x

Colman, A., Hargreaves, D., & Sluckin, W. (1981). Preferences for Christian names as a function of their experienced familiarity British Journal of Social Psychology, 20 (1), 3-5 DOI: 10.1111/j.2044-8309.1981.tb00465.x

Crook, A. (2012). Personal Names in 18th-Century Scotland: a case study of the parish of Beith (North Ayrshire) Journal of Scottish Name Studies, 6, 1-10

Guo, J., Chen, Q., & Wang, Y. (2011). Statistical distribution of Chinese names Chinese Physics B, 20 (11) DOI: 10.1088/1674-1056/20/11/118901

Scott, J. C. (1998). Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. Yale University Press

Shannon, C., Gallager, R., & Berlekamp, E. (1967). Lower bounds to error probability for coding on discrete memoryless channels. I Information and Control, 10 (1), 65-103 DOI: 10.1016/S0019-9958(67)90052-6

Shannon, C. (1948). A Mathematical Theory of Communication Bell System Technical Journal, 27 (3), 379-423 DOI: 10.1002/j.1538-7305.1948.tb01338.x

Shannon, C. (1951). Prediction and Entropy of Printed English Bell System Technical Journal, 30 (1), 50-64 DOI: 10.1002/j.1538-7305.1951.tb01366.x