I built a baby name database as a side project. In the process, I spent more hours than I care to admit staring at SSA name data going back to 1880. At some point — somewhere around the third week of cleaning and querying — it stopped being a data engineering exercise and started feeling like reading a very long, very strange national diary. A diary written not in sentences but in names: millions of small decisions, made in private, that add up to something larger than anyone intended.
What the SSA Dataset Actually Is
The Social Security Administration's baby name data covers births going back to 1880, compiled from Social Security card applications, and since the mid-1990s the agency has published it: every name given to at least five American children of one sex in a given year. That "at least five" threshold is important. It means the dataset captures almost but not quite everything, deliberately excluding the rarest choices to protect privacy. What you get is a record of the great majority of recorded American naming decisions across 140-plus years, with a low floor that filters out the most singular outliers.
The dataset has quirks. Social Security did not exist until 1935, and many people born before 1937 never applied for a card, so early data underrepresents certain demographics, particularly Black Americans in the South, recent immigrants, and rural populations who had limited contact with federal bureaucracy. The data is also first-name only; middle names, which carry their own cultural logic, are invisible here. And the five-birth threshold means that the dataset's account of truly rare names is necessarily incomplete.
Even with those caveats, what remains is extraordinary: a continuous national record of naming decisions from the Gilded Age through the present, one that becomes more representative as registration approaches universal coverage. I know of no other country with anything quite like it at this scale.
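For readers who want to poke at the data themselves, here is a minimal loading sketch. It assumes the layout of the SSA's national download, one text file per birth year with comma-separated name, sex, and count on each row; check your own copy before relying on that. The function name `parse_year` and the sample rows are mine, and the counts are made up for illustration.

```python
import csv
from collections import defaultdict
from io import StringIO


def parse_year(lines):
    """Parse 'Name,Sex,Count' rows into {sex: {name: count}}.

    `lines` can be an open file or any iterable of text lines.
    The three-column layout is an assumption about the download format.
    """
    counts = defaultdict(dict)
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        name, sex, count = row
        counts[sex][name] = int(count)
    return dict(counts)


# Made-up sample rows, not real SSA figures.
sample = StringIO("Mary,F,700\nAnna,F,260\nJohn,M,960\nWilliam,M,950\nMary,M,5\n")
year = parse_year(sample)
print(year["F"]["Mary"], year["M"]["Mary"])
```

Keeping the two sexes in separate dictionaries matters because the same name can appear under both, and the five-birth floor applies to each sex separately.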
The Assimilation Era (1880–1940): Erasure as Strategy
The first thing that strikes you when you look at name data from the 1880s and 1890s is how concentrated it is. In 1880, the top 10 male names (John, William, James, Charles, George, Frank, Joseph, Thomas, Henry, Robert) accounted for a very large share of all recorded male births. The female list was similarly concentrated: Mary, Anna, Emma, Elizabeth, Minnie, Margaret, Ida, Alice, Bertha, Sarah.
This was the height of the great immigration waves — millions of Italians, Eastern European Jews, Irish, Poles, Germans, Scandinavians arriving in American cities. And yet the name data shows something counterintuitive: instead of diversifying the name pool, immigration-era America narrowed it. Immigrant families, navigating an unfamiliar country where social and economic access depended on assimilation, chose maximally American names for their children. The Italian family named their son John, not Giovanni. The Jewish family chose Samuel or Max over Shmuel or Menachem. The Polish family named their daughter Mary.
This was not accidental and it was not thoughtless. As Stanley Lieberson documented in A Matter of Taste: How Names, Fashions, and Culture Change (2000, Yale University Press) — the most rigorously researched academic treatment of American naming trends — immigrant naming choices in this era followed a clear logic of cultural legibility. Names were being used as assimilation signals, as bids for belonging in a country that demanded them.
What the data cannot show directly, but what it implies, is the cost of that strategy. The names that were not chosen — the Giovannis and the Shmuliks and the Władysławs — were names that belonged to a life left behind. Each John and Mary represented a deliberate act of cultural self-editing, repeated in private across millions of households, that collectively produced the name homogeneity visible in the data.
The Mid-Century Consensus (1940–1970): Peak Sameness
By the 1940s and 1950s, American baby naming had reached something that looks, from the data, like peak homogeneity. This was the era of the great postwar baby boom, suburban expansion, and the particular social pressure toward conformity that historians associate with mid-century American culture. The name data reflects it precisely.
In 1955, the top 10 male names — James, Michael, Robert, John, David, William, Richard, Thomas, Charles, Gary — accounted for a striking proportion of all male births. The female list was similarly concentrated: Mary, Linda, Patricia, Sandra, Deborah, Barbara, Kathleen, Nancy, Carol, Sharon. These are names that feel intensely period-specific now, but at the time they felt neither trendy nor dated — they felt simply normal, because they were: an enormous share of the children you would encounter in an American school in 1960 had one of these names.
Measuring name concentration over time — the percentage of all male births going to the top 10 names in a given year, for instance — reveals the mid-century as the apex of American naming sameness. The diversity index, which measures how many distinct names appear in a population and how evenly they are distributed, was at its lowest point. America was, in naming terms, more homogeneous in 1955 than it had been in 1900 or would be in 2000.
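Both measures are easy to compute from one year's counts for one sex. The sketch below is illustrative rather than the exact method behind the figures in this piece: the top-10 share is as described above, and for the diversity index it assumes one common choice, the exponential of Shannon entropy, which reads as "the number of equally popular names that would produce the same spread." The function names and toy counts are made up.

```python
import math


def top_n_share(counts, n=10):
    """Fraction of all recorded births that went to the n most common names."""
    total = sum(counts.values())
    top = sorted(counts.values(), reverse=True)[:n]
    return sum(top) / total


def effective_names(counts):
    """Diversity as exp(Shannon entropy) of the name distribution.

    Equals the number of distinct names when every name is equally common,
    and shrinks toward 1 as births pile up on a few names.
    """
    total = sum(counts.values())
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values() if c > 0
    )
    return math.exp(entropy)


# Toy distributions with made-up counts.
concentrated = {"John": 80, "William": 10, "James": 10}
even = {"Liam": 25, "Noah": 25, "Mateo": 25, "Kai": 25}

print(top_n_share(concentrated, n=1), effective_names(concentrated))
print(top_n_share(even, n=1), effective_names(even))
```

One caveat: the denominator here is births recorded in the public file, which leaves out names below the five-birth floor, so shares computed this way run slightly high.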
This matters beyond naming trivia. Names are a sensitive register of social conformity pressure. The mid-century convergence on a narrow name pool reflects the same forces that produced suburban conformity in housing, the narrowing of acceptable gender expression, and the postwar pressure to be visibly and emphatically American. The baby boom generation was named in a world that valued sameness.
The Individualism Explosion (1970–2000)
Something shifts in the 1970s. The civil rights movement, second-wave feminism, the countercultural movements of the 1960s — these produced cultural permission for difference that had been suppressed in the postwar decades. The name data catches this shift clearly.
Distinctively African-American names — names that had existed in Black American communities for generations but that the SSA data had underrepresented because of registration gaps — began appearing with greater frequency and visibility. Names like DeShawn, Tyrone, Tanisha, Keisha, Lakisha entered the national dataset in ways that reflected both increased registration completeness and a cultural shift: the asserting of Black identity through naming as a conscious choice, not a concession.
Jean Twenge, Emodish Abebe, and W. Keith Campbell's 2010 analysis in Social Psychological and Personality Science, which tracked trends in American parents' naming choices from 1880 to 2007, documented this shift quantitatively: the uniqueness of names given to American children increased substantially from the 1970s onward, reflecting a broader cultural shift toward individualism and self-expression. Parents were increasingly choosing names that distinguished their child rather than names that helped them blend in.
The diversity index — the number of distinct names in use, weighted for how evenly distributed they are — began rising in the early 1970s and has not stopped since.
The Fragmentation Era (2000–Present)
By 2024, the name landscape looks almost unrecognizable compared to 1955. The top name for boys — Liam — accounts for a small fraction of all male births, far less than James did in its peak years. No single name comes close to the dominance that the mid-century names achieved. The diversity index is at its highest point in the historical record.
This is not just a story about parents wanting unusual names. It is a story about the fragmentation of shared cultural reference. In 1955, the names parents chose came from a relatively small pool of sources: family tradition, religion, popular culture that was itself concentrated in a few broadcast channels. In 2024, parents draw from an effectively infinite reference pool: global media, niche communities, vintage revival, invented combinations, names from dozens of linguistic traditions. The result is a naming landscape where any two children, chosen randomly, are less likely to share a name than at any point in American history.
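The "any two children, chosen randomly" claim can be made concrete. If a name was given to c children out of N, there are c(c - 1) ordered same-name pairs out of N(N - 1) possible pairs, and summing over names gives the chance that two distinct children share a name. The sketch below is a hedged illustration with a made-up function name and toy counts, not the computation behind this article.

```python
from fractions import Fraction


def share_name_probability(counts):
    """Chance that two distinct children drawn at random share a name.

    `counts` maps each name to the number of children given it.
    """
    total = sum(counts.values())
    if total < 2:
        raise ValueError("need at least two children to form a pair")
    same_name_pairs = sum(c * (c - 1) for c in counts.values())
    return Fraction(same_name_pairs, total * (total - 1))


# Toy classrooms with made-up counts.
mid_century = {"James": 3, "Mary": 1}
fragmented = {"Liam": 1, "Mateo": 1, "Aaliyah": 1, "Kai": 1}

print(share_name_probability(mid_century))
print(share_name_probability(fragmented))
```

Names below the five-birth floor are missing from the public files, so a figure computed this way is an approximation. The gap should be small, since rare names contribute very few same-name pairs.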
Richard Alba and Victor Nee, in Remaking the American Mainstream (2003, Harvard University Press), frame assimilation as a two-directional process: immigrants change, but so does the mainstream they are assimilating into. The naming data from the 2000s onward suggests a mainstream that has become genuinely more permeable to non-Anglo names — not just Hispanic names, which are now well-represented in national top-100 lists, but Arabic names (Muhammad, Aaliyah), South Asian names, and African names. What counts as a normal American name has quietly expanded.
Robert Putnam, writing in Scandinavian Political Studies in 2007 about diversity and community, argued that increased diversity in American communities brings long-term benefits but also short-term social friction, a withdrawal from community life he described as "hunkering down." Whether or not you find his political conclusions convincing, the empirical observation about diversity has an analog in naming: the expansion of who counts as "American" in the name data is real, and it has happened remarkably quickly.
The Names That Survived Everything
Buried inside all this change is a small cohort of names that have remained in the American top 100 for essentially every decade since 1880. James. William. Elizabeth. The list is shorter than you might expect. These are names that have absorbed every cultural wave — assimilation pressure, individualism explosion, fragmentation — and remained somehow legible to each successive generation of parents. They are not immune to fashion; their rankings rise and fall. But they never disappear.
What keeps them alive is interesting. They are phonologically simple and cross-linguistically stable. They carry enough historical weight to feel substantial without feeling dated. They have famous bearers across enough generations and cultural contexts that no single association dominates. They are names that belong to everyone and no one — the rarest achievement in naming.
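Finding the survivors is a set intersection: take the top-ranked names for each period and keep only the ones that appear every time. The sketch below assumes counts are already grouped by year (or by decade) for one sex. The helper names, the alphabetical tie-break, and the toy counts are my own illustrative choices.

```python
def top_names(counts, n=100):
    """The n most common names; ties are broken alphabetically (an arbitrary choice)."""
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return set(ranked[:n])


def always_in_top(counts_by_period, n=100):
    """Names that rank in the top n in every period supplied."""
    period_sets = [top_names(c, n) for c in counts_by_period.values()]
    return set.intersection(*period_sets) if period_sets else set()


# Made-up counts for three years, using a top-2 cutoff to keep it readable.
by_year = {
    1880: {"John": 9, "William": 8, "James": 7, "Frank": 3},
    1955: {"Michael": 9, "William": 8, "John": 5},
    2024: {"Liam": 9, "William": 6, "Noah": 5},
}

print(always_in_top(by_year, n=2))
```

Whether you feed in single years or decade totals changes the answer at the margins: a name can slip out of the top 100 for a year or two and still hold its place over the decade.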
What the Data Cannot Tell Us
I want to be careful here, because the temptation when working with a dataset this large is to make it say more than it actually does.
The SSA data shows what names were given. It cannot show why — the private reasoning, the family negotiations, the cultural pressures, the aesthetic preferences that produced each individual choice. It cannot show the names that were seriously considered and rejected. It cannot show how a name felt to the child who bore it, or whether the parents' intentions were realized. It is an aggregate of decisions, not an account of decision-making.
And yet. Reading 140 years of name data is not nothing. It is the only record we have of a particular kind of cultural decision made at the most intimate scale, the naming of a new person, aggregated across a nation and well over a century. The story it tells is not complete, but it is real: a story about who Americans have wanted their children to be, and who they have allowed themselves to imagine their children could become.
That story has changed enormously. The 1880 parent giving their son a maximally Anglo name was making a different kind of calculation than the 2024 parent choosing a name that appears in no other family they know. Both choices are responses to the same question, the one that every naming decision is ultimately about: what kind of person am I hoping this child will get to be, and what kind of world do I think they are entering?
The data does not answer that question. But it shows, in aggregate, how American parents have answered it — and how the answers have changed.
Data source: U.S. Social Security Administration. Analysis by NamesPop.