US Baby Names in 2015

Boxplot

Submitted by pmagunia on April 22, 2018 - 3:07 PM

Select any column, or the entire-dataset option, to draw its boxplot.

The Drupal File ID of the selected dataset. The user may load another using the search bar on the operation's page.

Correlation Coefficient

Submitted by pmagunia on April 22, 2018 - 3:08 PM

Select any two columns, or the entire-dataset option, to compute the correlation coefficient matrix.

Cumulative Frequency Histogram

Submitted by pmagunia on April 22, 2018 - 3:09 PM

Select any column to plot its cumulative frequency histogram.

Dotplot

Submitted by pmagunia on April 22, 2018 - 3:10 PM

Select any column for its dotplot.

Hollow Histogram

Submitted by pmagunia on April 22, 2018 - 3:10 PM

Select any two columns to overlay their histograms on a single plot.

Mean

Submitted by pmagunia on April 22, 2018 - 3:11 PM

Select any column to compute the arithmetic mean.

Pie Chart

Submitted by pmagunia on April 22, 2018 - 3:11 PM

Select any column to create its pie chart.

Plot

Submitted by pmagunia on April 22, 2018 - 3:07 PM

Select any two columns to plot.

Regression

Submitted by pmagunia on April 22, 2018 - 3:12 PM

Select any two columns for a simple regression analysis. The first column selected will be the independent variable.
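The page does not say how the regression is computed. As a minimal sketch, assuming an ordinary least-squares fit of the second column on the first, the calculation could look like the following. The function name `simple_regression` and the sample columns are hypothetical and only illustrate the convention that the first selected column is the independent variable.

```python
def simple_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    x is the independent variable (the first column selected),
    y is the dependent variable (the second column selected).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("independent variable has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical columns, not taken from the dataset.
first_column = [1, 2, 3, 4]    # independent variable
second_column = [3, 5, 7, 9]   # dependent variable
slope, intercept = simple_regression(first_column, second_column)
print(slope, intercept)
```

Swapping the two columns fits a different line, which is why the selection order matters.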

Stem and Leaf Plots

Submitted by pmagunia on April 22, 2018 - 3:12 PM

Select any column for its stem and leaf plot.

Summary

Submitted by pmagunia on April 22, 2018 - 2:51 PM

Select any column to compute its mean, variance, and other summary statistics.

Visual Summaries

Submitted by pmagunia on April 22, 2018 - 3:13 PM

Select any column for various visual summaries.

Submitted by pmagunia on March 25, 2017 - 6:31 PM

Attachment: dataset-47139.csv (510.53 KB)

Documentation

2015 US Baby Names

For each year of birth YYYY after 1879, the Social Security Administration created a dataset in which each record has the format "name,sex,number", where name is 2 to 15 characters long, sex is M (male) or F (female), and number is the number of occurrences of the name. Each dataset is sorted first on sex and then on number of occurrences in descending order. When there is a tie on the number of occurrences, names are listed in alphabetical order. This sorting makes it easy to determine a name's rank. The first record for each sex has rank 1, the second record for each sex has rank 2, and so forth.
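To illustrate how rank follows from the sort order, the sketch below parses a few records in the "name,sex,number" format and assigns each one its position within its sex group. The sample records use made-up counts, and the helper name `rank_records` is hypothetical; the sketch assumes the input is already sorted as described above.

```python
import csv
import io
from collections import defaultdict

# Made-up sample records in "name,sex,number" format (not real counts).
# Like the real file, they are sorted by sex, then by count descending,
# with ties listed alphabetically.
SAMPLE = """\
Caitlin,F,30
Kaitlyn,F,30
Micah,F,12
Micah,M,45
Jordan,M,20
"""


def rank_records(text):
    """Return a list of (name, sex, number, rank) tuples.

    Rank is the position of a record within its sex group, which
    relies on the input already being sorted as described.
    """
    position = defaultdict(int)  # running record count per sex
    ranked = []
    for name, sex, number in csv.reader(io.StringIO(text)):
        position[sex] += 1
        ranked.append((name, sex, int(number), position[sex]))
    return ranked


ranked = rank_records(SAMPLE)
for row in ranked:
    print(row)
```

Because tied names are listed alphabetically, Caitlin and Kaitlyn in this sample receive ranks 1 and 2 even though their counts are equal.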

To safeguard privacy, we restrict our list of names to those with at least 5 occurrences. The original dataset can be found at https://www.ssa.gov/oact/babynames/limits.html

History

In 1998, the Social Security Administration published Actuarial Note #139, Name Distributions in the Social Security Area, August 1997, on the distribution of given names of Social Security number holders. The note, written by actuary Michael W. Shackleford, gave birth to these datasets.

Data Source

All names are from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in our data. For others who did apply, our records may not show the place of birth, and again their names are not included in our data.

Data qualifications

People using our data on popular names are urged to explicitly acknowledge the following qualifications.

  1. Names are restricted to cases where the year of birth, sex, and State of birth (50 States and District of Columbia) are on record, and where the given name is at least 2 characters long.
  2. Name data are not edited. For example, the sex associated with a name may be incorrect. Entries such as "Unknown" and "Baby" are not removed from the lists.
  3. Different spellings of similar names are not combined. For example, the names Caitlin, Caitlyn, Kaitlin, Kaitlyn, Kaitlynn, Katelyn, and Katelynn are considered separate names and each has its own rank.
  4. When two different names are tied with the same frequency for a given year of birth, we break the tie by assigning rank in alphabetical order.
  5. Some names are applied to both males and females (for example, Micah). Our rankings are done by sex, so that a name such as Micah will have a different rank for males as compared to females. When you seek the popularity of a specific name (see "Popularity of a Name"), you can specify the sex. If you do not specify the sex, we provide rankings for the more popular name-sex combination.
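Qualifications 4 and 5 can be sketched in code: each sex is ranked separately, ties are broken alphabetically, and a lookup without a sex falls back to the more popular name-sex combination. The records below use made-up counts, the function names `build_ranks` and `popularity` are hypothetical, and "more popular" is assumed here to mean the combination with more occurrences.

```python
from collections import defaultdict

# Made-up (name, sex, number) records; counts are illustrative only.
RECORDS = [
    ("Caitlin", "F", 30),
    ("Kaitlyn", "F", 30),
    ("Micah", "F", 12),
    ("Micah", "M", 45),
    ("Jordan", "M", 20),
]


def build_ranks(records):
    """Map (name, sex) -> (rank, number), ranking each sex separately."""
    by_sex = defaultdict(list)
    for name, sex, number in records:
        by_sex[sex].append((name, number))
    ranks = {}
    for sex, rows in by_sex.items():
        # Higher count first; ties broken alphabetically by name.
        rows.sort(key=lambda r: (-r[1], r[0]))
        for rank, (name, number) in enumerate(rows, start=1):
            ranks[(name, sex)] = (rank, number)
    return ranks


def popularity(ranks, name, sex=None):
    """Return (sex, rank, number) for a name, or None if it is absent.

    With no sex given, pick the name-sex combination that has more
    occurrences (an assumption about what "more popular" means).
    """
    if sex is not None:
        hit = ranks.get((name, sex))
        return None if hit is None else (sex, *hit)
    candidates = [(s, *ranks[(name, s)]) for s in ("F", "M") if (name, s) in ranks]
    return max(candidates, key=lambda c: c[2], default=None)


ranks = build_ranks(RECORDS)
print(popularity(ranks, "Micah"))
print(popularity(ranks, "Micah", sex="F"))
```

Spelling variants such as Caitlin and Kaitlyn stay separate keys in this sketch, matching qualification 3.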
