Photo: Kaiser Fung
Photo: blogs.hbr.org (blog)
What FiveThirtyEight’s Nate Silver and Allison McCann did with the baby names dataset sets an example for all data analysts: they framed it around a relevant business problem, attached complementary data, made a bold but acceptable assumption to patch a hole in the data, and qualified their conclusion with a margin of error. Their article represents the best of data journalism. It surpasses most examples of big data analytics as we know it.
Curated by the Social Security Administration (SSA), the dataset of the first names of all newborn Americans since 1880 is a star of big data. In the past few years, the baby names dataset has been mined to death (pardon the pun). Its fame can be traced to computer scientist Martin Wattenberg, who created the Baby Names Voyager, a user-friendly interface for visualizing the baby names. The Voyager's purpose is to show which names were popular when. Since Wattenberg, a line of analysts has pursued numerous projects, such as identifying the trendiest names, the most poisoned names, and the most distinctive name by state.
All this slicing and dicing has produced insights that are little more than sound bites or clickbait. Then Silver and McCann entered the picture.
Kaiser Fung's latest book is Number Sense: How to Use Big Data to Your Advantage.
Source: Harvard Business Review (blog)