This example, discovered by Ken Ross, a mathematics professor who has written on sabermetrics, has become a common illustration of Simpson’s paradox.
In this case the seeming paradox is a fluke resulting from the nature of percentages. Note that most of Jeter’s at bats came in the year he hit .314, while most of Justice’s at bats came in the year he hit .253. Here there is no confounding third variable in the relationship between hits and at bats, so it makes perfect sense to combine the results to get the more accurate picture of these ballplayers’ batting averages. Jeter batted .310 lifetime, and Justice batted .279 lifetime.
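The arithmetic behind the reversal can be checked with a short sketch. The hit and at-bat counts below are the figures commonly cited for this example (the 1995 and 1996 seasons); they are assumed here, since the paragraph above quotes only the averages, and the helper names are illustrative. The combined figures cover those two seasons only and are not the lifetime averages.

```python
# Hits and at bats per season, as commonly cited for this example.
# These counts are an assumption of this sketch; the text above
# gives only the resulting averages.
SEASONS = {
    "Jeter":   {1995: (12, 48),   1996: (183, 582)},
    "Justice": {1995: (104, 411), 1996: (45, 140)},
}


def season_average(player, year):
    """Batting average for one player in one season."""
    hits, at_bats = SEASONS[player][year]
    return hits / at_bats


def combined_average(player):
    """Batting average over all listed seasons: total hits / total at bats."""
    total_hits = sum(hits for hits, _ in SEASONS[player].values())
    total_at_bats = sum(at_bats for _, at_bats in SEASONS[player].values())
    return total_hits / total_at_bats


if __name__ == "__main__":
    for year in (1995, 1996):
        print(year,
              f"Jeter {season_average('Jeter', year):.3f}",
              f"Justice {season_average('Justice', year):.3f}")
    print("combined",
          f"Jeter {combined_average('Jeter'):.3f}",
          f"Justice {combined_average('Justice'):.3f}")
```

The combined average is a weighted average of the season averages, with the at bats as weights. Jeter's weight falls almost entirely on his better season and Justice's on his weaker one, which is why the ordering flips when the seasons are pooled.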
Non-statisticians should find the above result puzzling. Statisticians know that this is simply Simpson’s paradox, perhaps more properly called Simpson’s reversal. Far from being a paradox, Simpson’s reversal is a well-understood consequence of the fact that correlation does not prove causation, and it is familiar to every statistician...
History of Simpson’s Paradox
Like many concepts in statistics, “Simpson’s paradox” was discovered repeatedly by different statisticians at different times. Edward Simpson (1922-2019) was a British statistician who wrote about the phenomenon in 1951. Before he was quite 20 years old, he was recruited into the famous Bletchley Park codebreaking team, which was instrumental in winning World War II. The name in his honor was coined in 1972.
But the phenomenon was also described in 1903 by British statistician Udny Yule (1871-1951) using an imaginary example where a worthless “cure” could be seen as effective due to a sex-related difference in mortality rates. The concept of correlation itself dates only from 1888, so it didn’t take much time for the effect that would become known as Simpson’s paradox to be noticed.
Source: Canada Free Press