
Sunday, April 05, 2015

What Popular Baby Names Teach Us About Data Analytics

"A typical big data analysis goes like this: First, a data scientist finds some obscure data accumulating in a server. Next, he or she spends days or weeks slicing and dicing the numbers, eventually stumbling upon some unusual insights. Then, a meeting is organized to present the findings to business managers, after which, the scientist feels disgruntled or even disrespected while the managers wish they could take the time back," according to Kaiser Fung, professional statistician for Vimeo and author of Junk Charts, a blog devoted to the critical examination of data and graphics in the mass media.

When these meetings fail, the main points of contention usually include unclear purpose; analyses that are too narrowly focused; and over-confidence in the science, which turns off non-technical managers. If you’re facing this situation, you should read the FiveThirtyEight article on mining the baby names dataset. When you’re done, send the article to your analytics team.

What FiveThirtyEight’s Nate Silver and Allison McCann did with the baby names dataset sets an example for all data analysts: They imbued it with a relevant business problem, attached complementary data, made a bold but acceptable assumption to patch a hole in the data, and qualified their conclusion with a margin of error. Their article represents the best of data journalism. It surpasses most examples of big data analytics as we know it.

Curated by the Social Security Administration (SSA), the dataset of the first names of all newborn Americans since 1880 is a star of big data. In the past few years, the baby names dataset has been mined to death (pardon the pun). Its fame can be traced to computer scientist Martin Wattenberg, who created the Baby Names Voyager, a user-friendly interface for visualizing the baby names. The purpose of the Voyager is investigating what names were popular when. Since Wattenberg, a line of analysts has pursued numerous projects, such as the most trendy names, the most poisoned names, and the most distinctive name by state.

All this slicing and dicing has produced insights that are little more than sound bites or click bait. And then, Silver and McCann entered the picture.

Related link

Kaiser Fung's latest book is Number Sense: How to Use Big Data to Your Advantage.

Source: Harvard Business Review (blog)
