Building A Final Four Naming-Prediction Model From Five Years Of Tournament Data

Jack Lin · Founder & Editor-in-Chief

The Final Four is next weekend. The men's bracket plays in Indianapolis on Saturday; the women's bracket plays in Phoenix on Friday. Both brackets are now down to four teams each. The Final Four is, structurally, the round where naming residue reaches its annual peak, and five years of tournament data are enough for me to build a rough prediction model for which 2026 player first names should produce visible SSA-file movement when the September 2027 release lands.

The Model's Three Inputs

The model I have been working with takes three inputs and produces a rough probability of post-tournament SSA-file movement. Input one: the player's first name's current SSA-file position. Names in the unsaturated zone (positions 800 through 1500) are most responsive to single-event naming-influence inputs. Names already inside the top 200 are largely saturated and produce smaller observed movement. Names already trending strongly upward see additional acceleration; names trending downward see modest reversal.

Input two: the player's role on the team. Star players get more broadcast time, which increases the cumulative repetition count for their first names. Role players get less broadcast time but more of it concentrated in role-specific moments — a bench player who hits a clutch shot in the second half gets disproportionate post-game replay coverage. The role-player residue is, in some respects, more reliable than the star-player residue because the underlying name is usually less saturated.

Input three: the team's regional fan base size and engagement intensity. Schools with large alumni networks and strong regional fan engagement produce larger state-level SSA residue than schools with smaller networks. This home-state amplification shows up in each of the past five years of tournament data.
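As a rough illustration of how the three inputs could combine, here is a minimal Python sketch. The unsaturated band (positions 800 through 1500) and the top-200 cutoff come from the description above. Everything else is a hypothetical placeholder rather than the model's actual calibration: the function names, the numeric weights, the role multiplier, and the 0-to-1 engagement scale are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Player:
    first_name: str
    ssa_rank: int          # current SSA-file position of the first name
    is_star: bool          # True for star players, False for role players
    fan_engagement: float  # hypothetical 0..1 regional engagement score


def saturation_factor(rank: int) -> float:
    """Responsiveness of a name by SSA position (weights are illustrative)."""
    if 800 <= rank <= 1500:
        return 1.0  # unsaturated zone: most responsive
    if rank <= 200:
        return 0.2  # largely saturated: smaller observed movement
    return 0.5      # assumed middle ground for all other positions


def role_factor(is_star: bool) -> float:
    """Assumed multiplier: role players weighted slightly above stars."""
    return 0.8 if is_star else 1.0


def residue_score(p: Player) -> float:
    """Combine the three inputs into one unitless score (not a probability)."""
    engagement = min(max(p.fan_engagement, 0.0), 1.0)  # clamp to 0..1
    regional = 0.5 + 0.5 * engagement                  # assumed 0.5..1.0 scaling
    return saturation_factor(p.ssa_rank) * role_factor(p.is_star) * regional
```

A multiplicative form is only one option. It has the property that a saturated name stays low-scoring no matter how strong the other two inputs are, which matches the saturation argument above.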

The Model's Output Is A Probability Distribution, Not A Forecast

I want to be clear about what the model produces. It produces a rough probability distribution across the player roster of plausible post-tournament SSA-file movement. It does not produce a single confident prediction of which name will move most. The cumulative uncertainty across multiple variables is too large for confident single-name forecasting.

What the model does produce is a ranking of which players' first names are structurally most likely to produce visible residue, and a rough estimate of the magnitude of the most likely outcome. That ranking is genuinely informative, even when individual predictions turn out wrong.
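One simple way to turn per-player scores into the kind of ranking described here is to normalize them into shares that sum to one and sort. The sketch below assumes the scores already exist; the function name `rank_roster` and the placeholder names and values in the usage lines are hypothetical, not output from the actual model.

```python
def rank_roster(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Normalize raw scores into shares summing to 1, sorted high to low."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive number")
    shares = {name: score / total for name, score in scores.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Placeholder roster with made-up scores, for illustration only.
example_scores = {"Name A": 0.9, "Name B": 0.3, "Name C": 0.6}
for name, share in rank_roster(example_scores):
    print(f"{name}: {share:.2f}")
```

The shares express relative likelihood across the roster, not the absolute chance that any one name moves, which is why the output reads as a ranking and not a forecast.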

The 2024 Backtesting Was Encouraging

The model's backtest performance against the 2024 Final Four data is reasonable but not perfect. The top-ranked names from the 2024 model, when checked against the 2024 SSA release, showed visible SSA-file movement in roughly two-thirds of cases. The remaining one-third either produced no detectable movement or produced movement that was too small to distinguish from background noise.

A two-thirds hit rate is modest by general statistical standards, but for cultural-influence prediction it is meaningfully better than chance. The model is identifying real signal even if the signal is noisy. Future versions of the model, which will incorporate more years of data, refine the regional-amplification factor, and account for player narrative depth, should improve the hit rate.
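Whether a two-thirds hit rate beats chance depends on two numbers this piece does not state: how many names were flagged, and how often a comparable name moves with no tournament exposure at all. The sketch below computes a plain binomial tail probability for that comparison. The sample size (12 names, 8 hits) and both background rates are hypothetical placeholders, not figures from the backtest.

```python
from math import comb


def tail_probability(hits: int, n: int, base_rate: float) -> float:
    """P(X >= hits) for X ~ Binomial(n, base_rate)."""
    return sum(
        comb(n, k) * base_rate**k * (1 - base_rate) ** (n - k)
        for k in range(hits, n + 1)
    )


# Hypothetical backtest: 12 flagged names, 8 of which moved (two-thirds).
flagged, moved = 12, 8
for base_rate in (0.3, 0.5):
    p = tail_probability(moved, flagged, base_rate)
    print(f"base rate {base_rate:.0%}: P(at least {moved} hits by luck) = {p:.3f}")
```

Under these placeholder numbers, the same two-thirds hit rate looks much stronger against a 30% background rate than against a 50% one, so the background rate matters as much as the hit rate itself.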

The 2025 Backtesting Was Less Encouraging

I should be honest about what the 2025 backtest looked like. The 2025 Final Four was an unusual cycle for naming-influence purposes — fewer Cinderella narratives, more concentrated star-player attention, less role-player visibility. The model's hit rate in 2025 was closer to fifty percent, which is essentially chance.

That underperformance taught me something. The model's three-input structure does not capture the year-to-year variation in tournament narrative shape. Some Final Fours are unusually star-driven; others are unusually role-player-driven; the model treats them as equivalent and gets penalized when the actual narrative shape diverges from the structural average.

The 2026 Field Sets Up Specific Predictions

Without naming specific players from this year's Final Four — predictions that age badly are no fun to read in retrospect — I will say what the model is currently flagging as the structural conditions for high-residue outcomes. Multiple teams in the Final Four have role-player rosters with first names sitting in the unsaturated zone. The home-state regional engagement intensity is high in at least two of the four men's teams and three of the four women's teams. The narrative shape is more role-player-driven than the 2025 cycle was.

If those structural conditions hold across the actual broadcasts next weekend, the model's hit rate should be closer to the 2024 result than the 2025 result. The September 2027 SSA release will give us the data.

The Counter-Argument I Owe You

Cultural-influence prediction models are notoriously hard to validate. Two-thirds hit rates can come from genuine model insight or from selection effects in how the predictions are evaluated after the fact. I have tried to be honest about the model's performance, but I cannot fully rule out that some of the apparent skill is artifact rather than signal.

What I am more confident about is the model's structural framework. The three inputs — name saturation, player role, regional engagement — are the right inputs for the kind of prediction the model is trying to make. The combination is consistent with the broader naming-influence research literature. The specific calibration of the model is where the uncertainty lives.

The Pet-Name Echo In The Model

One additional model output worth flagging. The same three inputs predict pet-name licensing-file residue at least as accurately as they predict SSA-file residue. That is not a coincidence; it reflects the structural similarity between baby-name and pet-name diffusion patterns. Names that move on the SSA file from Final Four exposure also tend to move on pet-licensing files, on a similar but slightly faster timeline.

The pet-name version of the model has slightly better hit rates than the baby-name version because pet adoption decisions cycle faster and the residue is more responsive to recent cultural inputs. If you are tracking the model's performance, the pet-licensing files in May and June 2026 will give you faster validation than the SSA file in September 2027.

What This Means For Parents Watching The Broadcast

If you watch the Final Four next weekend with active naming considerations on your mind, the structural insight the model offers is this: pay disproportionate attention to role-player first names, especially names that you do not immediately recognize from the regular-season broadcast. Those are the names structurally most likely to produce SSA-file residue.

The star-player names will get more broadcast time but smaller marginal naming influence. The role-player names will get less broadcast time but larger marginal naming influence per repetition. The model favors the role-player names, and the historical residue patterns are consistent with that framing.

Closing

Five years of NCAA Tournament data is enough to build a rough prediction model for Final Four naming residue. The model takes three inputs and produces a probability ranking. The 2024 backtest showed encouraging results; the 2025 backtest revealed real limitations. The 2026 Final Four next weekend will be the next live test.

I will be running the model against this year's Final Four rosters and tracking the predictions. The September 2027 SSA release will give us the validation data. The pet-licensing files will give us faster preliminary validation across the next three months. The cumulative result, across multiple validation cycles, will eventually tell us how much of the model's apparent skill is real and how much is artifact. Cultural-influence prediction is hard. The Final Four data lets us at least try, and trying — with honest reporting of the results — is worth doing on a category that the casual coverage refuses to take seriously enough to model at all.

The next iteration of the model is going to incorporate narrative-shape variables explicitly. The 2025 underperformance taught me that the three structural inputs alone are not sufficient when the tournament's narrative shape diverges sharply from the structural average. Adding a narrative-shape input — quantifying whether the cycle is unusually star-driven, role-player-driven, or balanced — should improve predictive performance in years where the underlying conditions diverge from the historical mean. Whether the 2026 cycle will be balanced enough for the existing model to work, or whether it will require the narrative-shape extension, I cannot say in advance. The next weekend of broadcasts will tell us.
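A narrative-shape input could be as simple as the share of broadcast attention that goes to star players. The sketch below is one hypothetical way to quantify it: the mention counts as a data source, the function name `narrative_shape`, and the 0.15 band around an even split are all assumptions made for illustration, not part of the existing model.

```python
def narrative_shape(
    star_mentions: int, role_mentions: int, band: float = 0.15
) -> tuple[float, str]:
    """Return the star share of mentions and a coarse cycle label.

    `band` is an assumed tolerance around an even 50/50 split.
    """
    total = star_mentions + role_mentions
    if total <= 0:
        raise ValueError("need at least one mention")
    star_share = star_mentions / total
    if star_share > 0.5 + band:
        label = "star-driven"
    elif star_share < 0.5 - band:
        label = "role-player-driven"
    else:
        label = "balanced"
    return star_share, label
```

The continuous `star_share` value is probably the more useful model input, since it avoids a hard threshold. The label is mainly there for reporting.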

For readers who care about the methodology more than the predictions: the model is a working tool, not a finished product. Its predictions should be read with appropriate uncertainty, not as confident forecasts. The most honest thing I can say after working on it for two years is that it identifies real signal, but at lower precision than I would like. Each year of new SSA data is a small additional teacher, and over enough cycles the model should get meaningfully smarter. That is the slow arc of any cultural-prediction discipline, and naming influence is no exception.

Data source: U.S. Social Security Administration. Analysis by NamesPop.
