How were naming trends in the U.S.A. impacted by cinema?

Connecting the dots between movie releases and the popularity of baby names across the years.

Popularity of Mia as baby name in the U.S.A. throughout the years.

Hover key stubs to reveal famous movies, which were released around that same period.
Significant increase
Significant decrease
Stable
(Significance level: α = 0.1)

An introduction to our datasets:

CMU Movie Summary Corpus.

Movie ID, Release Date, Movie Genre, Character Name

A movie is “just” a fancy video. But more than that, a movie is a combination, a balance: a movie tells — a story, often with a message; a movie shows — another world, visible only through a camera lens; a movie enriches — both this other world, and that of our own.

Our core dataset contains movies released up to 2012, featuring metadata like unique IDs used throughout our study, movie genre and release dates, alongside details on character names and their genders.

Male
Female
Movie Genre

How popular were each of the movies?

Average Rating, Number of Ratings

IMDb is a highly reliable source for user ratings and reviews of most movies in the main dataset, being the largest and most popular online movie database as of December 2023 with close to 1.8 million movie entries (feature films, short movies and TV movies).

Their data allow us to assess a movie's popularity and presence in its contemporary culture, which is key in our attempt to evaluate the impact of movies on other aspects of society.

Determining character importance.

Character Order Movie Poster Better than IMDb

TMDB is newer than IMDb and only features half its number of movies, about 900 thousand. Nevertheless we use its poster links on this website for visualization purposes, because it provides an API that is faster, more accessible and more reliable than IMDb.

Top Characters by Average Importance: the lower the value, the more important the character.

In any good story, there are main characters and side characters; it follows that they will not impact the audience in the same way.

In IMDb, characters and actors are generally listed in credit order, which often follows the order of appearance on screen. TMDB encourages listing the cast by character importance, separating the main cast from side and background characters, which allows us to define a measure of “role importance”.
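As a rough illustration of how such an importance measure could be aggregated per character name, here is a minimal sketch. It assumes each cast entry is a hypothetical `(character_name, order)` pair with lower orders meaning more important roles; the sample rows are made up and are not taken from the datasets.

```python
from collections import defaultdict


def mean_order_by_name(cast_rows):
    """Return a dict mapping each character name to its mean cast order.

    cast_rows: iterable of (character_name, order) pairs (hypothetical layout).
    """
    totals = defaultdict(lambda: [0, 0])  # name -> [sum of orders, count]
    for name, order in cast_rows:
        totals[name][0] += order
        totals[name][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


# Made-up cast rows, for illustration only.
rows = [("Anna", 1), ("Anna", 3), ("Ben", 1), ("Carl", 4)]
averages = mean_order_by_name(rows)

# Sort ascending: the lower the mean order, the more important the name.
ranking = sorted(averages.items(), key=lambda item: item[1])
print(ranking)
```

Sorting by ascending mean order puts the names most often given to main characters first.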

Top character names, based on character order: are there names that are often used for main characters, or rather for sidekicks?

A tale of movies and babies.

Names, Gender, Births per Year

Every name has an origin and a meaning. Having an overview of given names over time in a population is peeking into a reflection of the changing culture and beliefs of its people, influenced by its leaders, its arts and its history.

The Social Security Administration provides a list of the 1000 most popular names in the USA each year between 1880 and today. Our goal is to determine whether or not movie culture contributes significantly to the popularity of baby names, depending on the factors identified in the datasets listed above.

You're telling me it takes two numbers to measure your own ass, but only one to measure my son's future? — Joseph Cooper, Interstellar

Measuring changes in baby name rates.

Let us plot the popularity of a given name over time, for example Mia.

An "Influence metric"?

We're interested in quantifying the change in incidence of Mia as a given name, around the release of Pulp Fiction in 1994. To do so, let us zoom in around 1994 and compare the slopes between the datapoints before and after the release.

We perform two separate linear regressions, one over the 10 years prior to the year of interest, and the other over the year of interest and the 4 succeeding years. Our metric is defined as the difference between the slopes computed by the two regressions: Δβ = β_after − β_before.
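The two regressions and their slope difference can be sketched as follows. This is a minimal, unweighted illustration using ordinary least squares; the function names and the toy series are assumptions made for the example, not the code used in the study.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_change(counts_by_year, year, before=10, after=4):
    """Delta beta = beta_after - beta_before around `year`.

    counts_by_year: dict mapping year -> popularity of one name.
    The 'before' window covers the `before` years preceding `year`;
    the 'after' window covers `year` and the `after` following years.
    """
    years_before = list(range(year - before, year))
    years_after = list(range(year, year + after + 1))
    beta_before = ols_slope(years_before, [counts_by_year[y] for y in years_before])
    beta_after = ols_slope(years_after, [counts_by_year[y] for y in years_after])
    return beta_after - beta_before


# Toy series (made-up numbers): flat until 1994, then rising by 100 per year.
toy = {y: 500 + max(0, y - 1994) * 100 for y in range(1984, 1999)}
delta = slope_change(toy, 1994)
print(delta)
```

On a perfectly straight line both slopes coincide and the metric is zero; a positive value means the curve bends upwards around the year of interest.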

In essence, we are computing a weighted 2nd-order finite difference, so we are measuring the local convexity of the curve.

Because we are dealing with real-world data, convexity (or acceleration) is a better measure than slope itself for determining if changes are happening in the data — indeed, the increase or decrease in baby names is always gradual, which means just taking the tangent is not enough and we must look at the change in tangents over time around the year of interest.

In our calculations, the years following the release are also weighted more heavily than the preceding ones. The goal of such a weighting is to pick up more easily on sudden changes in the general direction of the slope.

If you can't find the right answer, first you identify all the wrong ones. — Joan Clarke, The Imitation Game

Exploratory data analysis:

Armed with our amazing datasets and our powerful new metric, we set out to answer a few related questions that appeared when playing around with the data. These questions arose mostly from curiosity, but also from a priori beliefs we might hold about movies, baby names and their influence on each other.

The important note here is that our analysis is mostly exploratory — in the sense that although we are indeed trying to answer a predefined question, our main purpose is exploring the dataset, trying many different ideas and seeing whether or not we obtain significant results that could later serve as a starting point for deeper, causal analysis.

1: Effect of the movie release period of the year.

Birth rates are higher in the summer than in the winter, which is due to higher conception rates in the autumn and winter periods (source). Let's find out whether this implies that movies released in this period have a greater effect on the names given to babies.

Proportion of Significant Values with 95% Confidence Intervals by Season.

To tackle this question, we look at the proportion of character names associated with a significant slope increase in the popularity of the corresponding baby name. We can see that it is in fact in fall and winter that we have more significant names, which seems to contradict the premise of the study and our hypothesis. We note that the "region of influence" is unknown: for instance, there may be a 6-month delay in the influence of character names on baby names.
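A proportion with its 95% confidence interval per season could be computed along these lines. This sketch assumes a normal-approximation (Wald) interval and a meteorological month-to-season mapping; both are illustrative choices, and the record layout `(release_month, is_significant)` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumed mapping from release month to season (meteorological seasons).
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def proportion_ci(successes, n, confidence=0.95):
    """Proportion and normal-approximation confidence interval, clipped to [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def significant_share_by_season(records):
    """records: iterable of (release_month, is_significant) pairs."""
    counts = {}
    for month, is_significant in records:
        season = SEASON_OF_MONTH[month]
        hits, total = counts.get(season, (0, 0))
        counts[season] = (hits + bool(is_significant), total + 1)
    return {season: proportion_ci(h, t) for season, (h, t) in counts.items()}


# Made-up records, for illustration only.
sample = [(1, True), (1, False), (7, True), (7, False), (7, False), (10, False)]
print(significant_share_by_season(sample))
```

Overlapping intervals between two seasons are a first hint that the difference between them may not be significant.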

Average general influence of films per month.

When looking at months instead of seasons, the difference between the fall and winter periods is even less clear. The confidence intervals are also wide, which indicates that we cannot really conclude that there is a significant variation.

Proportion of Significance by Season Over Years.

We remark here that in the last 15 years of the dataset, the proportion of significant movie releases is roughly 25% for each season and stays consistent over the years. This is a first indication that we cannot reject H0: the data do not seem to show a significant difference across seasons in the influence of movie character names. There are more variations in older years, but this might be because fewer movies were released then and thus the variance is higher.

Overall...

... we cannot reject the null hypothesis that the release month of a movie makes no difference to its impact on the names of babies.

2: Does the genre of a movie have an impact on the influence of a movie?

What a number of different movie genres there are, and how different they make each film!

One could think of romance movies, which are very sentimental and might leave a strong mark on people's minds through the emotions conveyed by their characters. Comedy and action movies are also firmly established in the market and carry their own set of feelings, centred less on compassion and more on taking action or making people laugh, without necessarily giving viewers the chance to relate to a real or dreamed-of love story of their own.

In this section, we try to capture whether some movie genres have significantly more influence on baby naming than others, in terms of both the proportion of influenced names and the magnitude of the influence. Furthermore, we plot the time series of these values for each genre: can we see historical trends within these data?

Hover and click a genre to see the evolution of the plotted value between 1880 and 2022.

Tab 1 - Mean influence of genre on baby names.

The first panel shows the average (positive and negative) convexity of the popularity curves of baby names that are associated with movies of the genre. In other words: for each movie in the genre, we look at the character names and how the baby name slopes changed around the release of that movie; then, we take the mean value over all characters of all movies associated with the genre. On the left, genres associated with locally convex curves for baby names; on the right, concave curves, i.e. genres associated with a mean deceleration in the number of babies given those names.

Looking at the graph, movies overall seem to have a positive impact on naming. We also notice that genres tied to real-life events, such as "Biographical", "Film adaptation" and others in the same category, have a stronger impact. However, these values seem to be very small (compared to those in the next panel).

The much larger proportion of genres associated with a positive slope change is consistent with the general a priori belief that movies have a positive impact: they set off trends rather than end them.

Tab 2 - Absolute influence on baby names by genres.

The second panel also shows the average convexity of the baby name curves associated with movies of the genre; the difference is that we compute the absolute value of the convexity, which means negative and positive values don't cancel each other out and both count towards the total influence.

Here, a few genres stand out, in particular the Western, Detective and Black-and-white bubbles: while all other bubbles shrink in size, these three grow significantly despite the 10-fold change in scale (in fact they are the ones causing the general shrinkage because their values are so large). If we click on the bubbles to look at the value over time, we see that names associated with these 3 genres underwent large positive and negative slope changes, which explains why they show small mean convexity but large mean absolute convexity.
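The cancellation effect described above can be seen on a toy example. The genre labels and slope-change values below are made up for illustration; only the contrast between the signed mean and the mean of absolute values matters.

```python
from statistics import mean


def summarize(slope_changes):
    """Return (mean signed change, mean absolute change) for one genre."""
    return mean(slope_changes), mean(abs(v) for v in slope_changes)


# Made-up slope changes for two hypothetical genres.
changes = {
    "Genre A": [40.0, -38.0, 35.0, -36.0],  # large swings in both directions
    "Genre B": [2.0, 1.0, 3.0, 2.0],        # small but consistently positive
}

for genre, values in changes.items():
    signed, absolute = summarize(values)
    print(genre, signed, absolute)
```

A genre with large swings in both directions ends up with a small signed mean but a large absolute mean, which is the pattern reported for the three genres that stand out.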

Black-and-white movies are of particular note: we see a large negative peak around 1948, which marks the end of the black-and-white film era with the introduction and popularization of technologies such as Kodachrome, Agfacolor and Eastmancolor (source). The presence of this peak is very interesting because it is quite unexpected: the number of black-and-white movies and characters obviously decreased, but it is surprising that the popularity of these character names as baby names also decreased. Maybe the audience felt that the names had become old-fashioned because the movies stayed in black-and-white?

Tab 3 - Percentage of affected names per genre.

Finally, we look at the proportion of names that present a significant (α = 0.1) change in slope after the release of the movie. The genres are separated with respect to the number of movies that they contain, with large genres (n > 10,000) on the left and small ones on the right.

In the separation of genres by size, we can see that apart from Family Film, genres that are associated with negative convexity are all relatively small genres. This is in line with what we stated earlier about the "positive impact only" of movies: the negative values observed may simply be due to these genres not being popular enough and thus not having enough "positive impact".

There is not a lot of size variation between the bubbles and thus not much can be immediately said about the plot. Nonetheless we see that Suspense, Psychological thriller, Romantic comedy (and all other romance genres) present the largest proportion of names with a significant slope change, maybe because these are the genres where characters have the most heart-wrenching roles that viewers remember best. By contrast, other movie genres have a notably low percentage of impact (Black-and-white, World cinema, Bollywood), which is probably because the character names are very uncommon in the U.S.A. Black-and-white is once again very surprising: our hypothesis is that there are a select few characters that were extremely popular, and fell out of favor at the end of the black-and-white era.

Overall...

...there are too many different genres to draw general conclusions about their influence on or correlation with baby names, but there are sufficiently differing and interesting trends in the data to warrant further study to determine the exact impact of genre.

3: What's the effect of a movie's popularity on baby naming?

We expect popular movies to have a stronger influence on given names, and well-rated movies to have a positive influence, whereas badly rated movies might have a negative one.

We focus on the popularity in this section, but we quickly realize that it is not straightforward to get satisfying results for this study question.

Overview: Matching on gender, order, movie genre and year.

Capturing the effect of a film's popularity can be quite challenging when looking only at the number_of_votes variable, so we try to isolate its true effect by removing the potential effects of other observed variables that could confound it. To do so, we perform exact matching on gender (as people might relate more to a female character than to a male one in a given role and movie genre, or vice versa), character order (intuitively, the headliners of a film are often the ones who impact the public the most through their large screen time), movie genre (as romance films might have a wider impact than dramas) and year of release.
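Exact matching on these four variables can be sketched as follows. The record fields (`gender`, `order`, `genre`, `year`, `popular`) and the one-to-one pairing rule are assumptions made for this illustration, with `popular` standing for the treatment flag; the study's actual pairing procedure may differ.

```python
from collections import defaultdict


def exact_match(records, keys=("gender", "order", "genre", "year")):
    """Pair treated and control records that agree exactly on `keys`.

    Each record is a dict with the fields in `keys` plus a boolean
    'popular' flag (True = treated, False = control). Within each
    stratum, records are paired one-to-one; leftovers are dropped.
    """
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for record in records:
        stratum = tuple(record[k] for k in keys)
        group = "treated" if record["popular"] else "control"
        strata[stratum][group].append(record)

    pairs = []
    for groups in strata.values():
        # zip stops at the shorter list, so every record is used at most once.
        pairs.extend(zip(groups["treated"], groups["control"]))
    return pairs


# Made-up records, for illustration only.
records = [
    {"name": "A", "gender": "F", "order": 1, "genre": "Drama", "year": 1994, "popular": True},
    {"name": "B", "gender": "F", "order": 1, "genre": "Drama", "year": 1994, "popular": False},
    {"name": "C", "gender": "F", "order": 1, "genre": "Drama", "year": 1995, "popular": False},
    {"name": "D", "gender": "M", "order": 2, "genre": "Western", "year": 1950, "popular": True},
]
pairs = exact_match(records)
print([(t["name"], c["name"]) for t, c in pairs])
```

Passing a shorter `keys` tuple, for example without `"year"`, relaxes the matching and typically yields more pairs.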

Which movie is popular? Which one is not?

We create a control group and a treatment group, split according to whether a movie's number of votes exceeds a given threshold. Balancing and matching the two groups is meant to better isolate the effect of the treatment, so that it is not confused with the effect of potential confounders.

This threshold is by default the median. However we can try to modify the threshold of votes that determines the popularity. We always keep a threshold given by the distribution of the dataset, namely the 25th, 75th and 90th percentiles.
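Splitting movies into popular and unpopular groups at a chosen percentile of the vote distribution might look like the sketch below. The `number_of_votes` field comes from the text; the strict `>` comparison, the `popular` flag and the choice of the "inclusive" percentile method are assumptions.

```python
from statistics import quantiles


def vote_threshold(votes, percentile=50):
    """Vote count at the given percentile (an integer from 1 to 99)."""
    cut_points = quantiles(votes, n=100, method="inclusive")  # 99 cut points
    return cut_points[percentile - 1]


def label_popular(movies, percentile=50):
    """Return copies of the movie dicts with a boolean 'popular' flag added."""
    threshold = vote_threshold([m["number_of_votes"] for m in movies], percentile)
    return [dict(m, popular=m["number_of_votes"] > threshold) for m in movies]


# Made-up vote counts: 1, 2, ..., 101.
movies = [{"movie_id": i, "number_of_votes": i} for i in range(1, 102)]
for pct in (25, 50, 75, 90):
    labelled = label_popular(movies, pct)
    print(pct, sum(m["popular"] for m in labelled))
```

Raising the percentile shrinks the treated group, which is why fewer matched pairs are available at the 75th and 90th percentiles.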

Even if this increases the size difference between the control and the treatment groups, and thus decreases the number of possible pairs, we try to better capture the effect of a movie's popularity on its influence on baby naming.

Refining: Is matching on years relevant?

So far we have performed exact matching on 4 movie characteristics. But is it really that important to match on the year? Let's explore this question with another matching that uses the median threshold on popularity but does not require pairs between the control and treatment groups to share the same year.

The results are displayed in the point plot below and, as we can clearly see:

We can no longer say that the difference in the proportion of significantly influenced names between popular and unpopular movies (median threshold) is significant, since the confidence intervals overlap.

What about the qualitative effect obtained with matching?

The previous analyses were performed on the proportion of significantly influenced names in the control group of unpopular movies and the treated group of popular films, to assess whether there was a statistically significant difference between the two after performing exact matching. We are now interested in the effect on magnitude that this exact matching could bring out, namely the difference in magnitude of influence (absolute value) between the control and treated groups.

Computations were made for several popularity thresholds (25th percentile, median, 75th and 90th percentiles). The first matching considered only the significantly influenced names and matched on character order, movie genre, gender and year of release.

The results of the difference-in-differences analysis are summarized in the dynamic point plot below:

From this summary, we observe that after matching at the various popularity thresholds, we cannot say that there is a statistically significant change in magnitude of influence between popular and unpopular movies.
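One simple way to examine such a difference on matched pairs is to take, for each pair, the treated magnitude minus the control magnitude, and to build a confidence interval around the mean of these differences. The sketch below uses a normal approximation and made-up values; it illustrates the idea only and is not necessarily the procedure behind the plot.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_difference_ci(treated_values, control_values, confidence=0.95):
    """Mean of paired differences with a normal-approximation interval.

    treated_values[i] and control_values[i] belong to the same matched pair.
    For small samples a Student t quantile would be more appropriate.
    """
    diffs = [t - c for t, c in zip(treated_values, control_values)]
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(diffs)
    half_width = z * stdev(diffs) / sqrt(len(diffs))
    return centre, centre - half_width, centre + half_width


# Made-up magnitudes of influence for four matched pairs.
treated = [3.0, 5.0, 4.0, 6.0]
control = [2.0, 5.0, 5.0, 4.0]
centre, low, high = paired_difference_ci(treated, control)
print(centre, low, high)
```

If the interval contains zero, as it does for these toy values, the data do not support a difference in magnitude between the two groups.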

Do the results change drastically with further matching on average rating?

Since we didn't find any statistically significant changes so far, we tried matching further to gain better insight into the effect of movie popularity on the average magnitude of influence. To do so, we match on the 4 previous features and add another one, namely the average rating the movie received.

By doing so, we try to account for a possible confounder linking rating and popularity: a high rating based on a low number of votes is more questionable than a high rating based on a high number of votes. This measure of a movie's quality might affect the average magnitude of influence of movies.

Unfortunately, once again we could not draw any statistically significant conclusions from this analysis since the confidence intervals overlap.

Overall...

... even though we controlled for the main characteristics of a movie data entry, we were unable to draw insightful conclusions about the difference between popular and unpopular movies when looking at the average magnitude of influence, in contrast to the interesting observations made on the proportion of influenced names. Maybe the threshold for popularity was wrongly chosen, or maybe there is indeed no influence?

4: Do character names have a different impact depending on the character's status in the movie?

Significant impact relative to character importance.

With this first inspection, we can see that the greater the importance of characters in the film (i.e. small order, a main character), the greater the number of characters having a significant impact on the names given to newborns.

This result is in line with what we might intuitively expect: main characters make more of an impression on the viewer than characters who appear only infrequently in the film.

Character name impact per order.

Looking at the percentage of characters with a significant impact within each character order allows us to identify which role type has the greatest number of character names influencing the names of newborns. Here the difference between the influence of the first and second role is too small to say for sure which role has the most influence (the overlap in CI is too large), but it is quite clear from the first 6 bars that character name influence on baby names decreases as the importance of the character decreases.

We note that for lower-ranked roles the confidence intervals are very large due to the sparsity of the data; for them, we do not conclude anything.

Study the sign of the influence.

Mean Negative and Positive Slope Change by Order.

We note that the average slope change per order for negative impacts has a greater magnitude than for positive impacts. However, negatively impacting character names represent less than a quarter of all significant slope changes.

Proportion of Negative and Positive Slope Change by Order.

For positive slope changes (as noted in the previous sub-question), we see that for the first 5 orders the influence tends to diminish the less important the character's role.

This could be intuitively explained by the fact that the less important a character is, the less he or she will impact the audience.

Magnitude of Slope Change per Order.

The magnitude of the slope change tends to decrease for the first 5 orders. This reflects the trend of positive slope changes, which are proportionally in the majority for each order.

Does the influence of character order differ according to gender?

Proportion of Slope Change per Order and Gender.

Looking at the magnitude of slope change per order and gender, we notice that, except for the first and 4th roles, female character names seem to have more influence than male ones for most roles.

For the main role, the very similar slope change values for male and female could be explained by the fact that the main character tends to make an impression on the public's mind, regardless of gender.

Overall...

... we can say that the influence on baby names that a character within a movie will have depends on their role in said movie. The most important factor seems to be the importance of their character role in the plot, but gender also seems to have a significant impact for secondary characters.

5: Is it possible to differentiate character influence between genders?

In this section, we will explore whether gender plays a role in influencing baby names.

Average Magnitude of Significant Slope Change for year order and each Gender.

If we look at the magnitude of influence over the years, differentiating male and female, we can see a greater overall influence of female character names than male ones.

Variation magnitude per gender.

We can see that on average over all years, female character names have a greater influence on baby names than male character names.

Moreover, if we look at the proportion of names that have had a significant impact, the proportion is also greater among female character names (0.1262905 vs. 0.1054252 for male ones).
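Whether two such proportions differ significantly depends on the group sizes, which are not stated here. A two-proportion z-test is one standard way to check. In the sketch below the counts (126 of 1000 versus 105 of 1000) are hypothetical placeholders chosen only to roughly mirror the proportions above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for the equality of two proportions (pooled variance)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: significant names / all names, per gender.
z, p_value = two_proportion_z_test(126, 1000, 105, 1000)
print(z, p_value)
```

With larger groups, the same gap between proportions would yield a smaller p-value, so the real group sizes are needed before drawing any conclusion.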

Overall...

... female character names have a greater impact than male character names on baby names of the same gender.


To conclude...

Which factors have a measurable correlation with (and thus could have influenced) baby name popularity?

We studied the impact of seasonality, movie genre, movie popularity, character importance and character gender, and it seems like a lead female character in a suspense movie released in winter will have the most impact on future baby names. The movie doesn't have to be famous, it just has to have been released and the name of your character will magically conquer the world!

In all seriousness, there are promising paths for further causal analysis and research into the impact that movie genre and character gender have on baby names. Furthermore, the measure of a movie's popularity could definitely be improved upon, so that a correlation (which we suspect exists, but which we could not pick out) can be established, built upon, and used to determine exactly which factors that make a movie popular also make a character name popular. This is the start of a long journey of history and culture, and one day our children will be able to connect back to their roots.


The End