
Monday, November 06, 2017

How To Lie To Yourself And Others With Statistics | Lifehacker Australia

Misusing statistics is one of the most powerful ways to lie, writes Eric Ravenscraft at Lifehacker Australia.

Illustration: Angelica Alzona

Normally, we teach you how to avoid misinterpreting statistics, but knowing how numbers are manipulated can help you spot when it happens. To that end, we're going to show you how to make data say whatever the hell you want to back up any wrong idea you have.

It's Evil Week at Lifehacker, which means we're looking into less-than-seemly methods for getting shit done. We like to think we're shedding light on these tactics as a way to help you do the opposite, but if you are, in fact, evil, you might find this week unironically helpful. That's up to you.

Gather Sample Data That Adds Bias to Your Findings
The first step to building statistics is determining what you want to analyse. Statisticians refer to this as the "population". Then you define a subset of that population to collect, called the sample, which should be representative of the population as a whole when analysed. The larger and more representative the sample, the more precise your conclusions can be.
Of course, there are plenty of ways to screw up this type of statistical sampling, either by accident or intentionally. If the sample data you gather is bad, you can't trust the conclusions, no matter how careful the analysis. Here are a few of the big ones:
  • Self-Selection Bias: This type of bias occurs when the people you're studying voluntarily put themselves into a group that isn't representative of your whole population. For example, when we ask our readers questions like "What's your favourite texting app?" we only get responses from people who choose to read Lifehacker. The results of an informal poll like this likely won't be representative of the population at large because all our readers are smarter, funnier and more attractive than the average person.
  • Convenience Sampling: This bias occurs when a study analyses whatever data it has available, instead of trying to find representative data. For example, a pay TV news network might poll its viewers about a political candidate. Without polling people who watch other networks (or don't watch TV at all), it's impossible to say that the results of the poll would represent reality.
  • Non-Response Bias: This happens when some people in a chosen set don't respond to a statistical survey, causing the answers to shift. For example, if a survey on sexual activity asked, "Have you ever cheated on your spouse?" some people who have cheated may skip the question rather than admit to infidelity, making it look like cheating is rarer than it is.
  • Open-Access Polls: These types of polls allow anyone to submit answers and, in many cases, don't even verify that people only submit an answer once. While common, they're fundamentally biased because they don't attempt to control the input in any meaningful way. For example, online polls that just ask you to click your preferred option fall under this bias. While they can be fun and useful, they're not good at objectively proving a point.
These are just some of the many, many ways that a sample can be biased. If you want to create a misleading impression, well, pick your poison. For example, open-access polls on websites can be used to "prove" that whichever candidate you like best won a debate or that Undertale is the best game of all time. The beauty of sampling biases is that someone, somewhere is taking an unscientific poll that will say anything you want. So just Google around until you find an unscientific poll you like, or heck, create your own.
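To see how far self-selection alone can push a poll, here is a minimal simulation sketch. Every number in it (the population size, the 30 percent true support, the two response rates) is a made-up assumption for illustration, not data from any real poll, and the function name is hypothetical.

```python
import random

def simulate_open_poll(population_size=100_000, true_support=0.30,
                       fan_response_rate=0.50, other_response_rate=0.05,
                       seed=42):
    """Return (true_share, poll_share) for a hypothetical self-selected poll."""
    rng = random.Random(seed)
    supporters = 0   # people in the population who hold the opinion
    votes_for = 0    # poll responses in favour
    votes_total = 0  # all poll responses
    for _ in range(population_size):
        is_supporter = rng.random() < true_support
        supporters += is_supporter
        # Assumption: supporters are far more motivated to answer the poll.
        rate = fan_response_rate if is_supporter else other_response_rate
        if rng.random() < rate:
            votes_total += 1
            votes_for += is_supporter
    return supporters / population_size, votes_for / votes_total

true_share, poll_share = simulate_open_poll()
print(f"True support in the population: {true_share:.1%}")
print(f"Support among poll respondents: {poll_share:.1%}")
```

With these assumed rates, the poll should report support of roughly 80 percent even though only about 30 percent of the simulated population holds that view. Nobody lied in their answer; the distortion comes entirely from who chose to respond.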

Choose the Analysis That Supports Your Ideas 
Since statistics use numbers, it's easy to assume that they're hard proof of the ideas they claim to support. In reality, the maths behind statistics is complex, and analysing the same data in different ways can yield different or even entirely contradictory conclusions. If you want to twist a statistic to suit your needs, fudge the maths.

Anscombe's quartet shows four different charts that have nearly the exact same statistical summaries.

To demonstrate the flaws in analysing data from summary statistics alone, statistician Francis Anscombe created Anscombe's quartet (diagrammed above). It consists of four datasets that, when plotted, show wildly different patterns. The X1 chart shows a basic scatter plot with an upwards trend. X2 shows a curved trend that rises, then turns downward. X3 shows a tight, shallower upward trend with one outlier far above the line on the Y axis. X4 shows data where every point has the same x value, forming a vertical line, save for one outlier that's super high on both axes.
Here's where it gets crazy. For all four of these charts, the following statements are true:
  • The average x value is 9 for each dataset
  • The average y value is 7.50 for each dataset
  • The variance for x is 11 and the variance for y is 4.12
  • The correlation between x and y is 0.816 for each dataset
If you only saw this data in text form, you might think all four situations were identical. For example, say you had a chart like X1 that showed men's salaries at your company over the years, and one like X2 showing salaries for women over the same time at the same company. If you showed only the text summary, people would see that both groups made the same average salary! However, if you showed the charts, people would see that women's salaries were trending downward for some reason.
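The four summary statistics listed above can be checked directly. The sketch below uses the commonly published values for Anscombe's quartet and only the Python standard library. The labels I to IV correspond to the X1 to X4 charts, and the helper names `pearson` and `summarise` are my own, not part of any library.

```python
from math import sqrt
from statistics import mean, variance

# Commonly published values for Anscombe's quartet.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def summarise(xs, ys):
    """Mean, sample variance and correlation for one dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance (divides by n - 1)
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }

for name, (xs, ys) in QUARTET.items():
    s = summarise(xs, ys)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} r={s['r']:.3f}")
```

All four rows agree to within rounding, which is exactly why a text-only summary hides the differences that a plot makes obvious.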

Anscombe suggested that to avoid misleading people, you should always visualise your data before drawing conclusions and be aware of how outliers influence the analysis. It's hard to miss an outlier on a properly graphed chart, but it can have a massive yet invisible effect on a text summary. Of course, if your goal is to mislead people, you can just skip this step.
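As a rough illustration of how much a single outlier can move a summary number, the sketch below takes the third dataset of the quartet (the X3 chart, using its commonly published values) and computes the correlation with and without its one stray point. Picking the outlier as the point with the largest y value is a shortcut that only works because we already know the shape of this dataset; it is not a general outlier test.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Third dataset of Anscombe's quartet (commonly published values).
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
points = list(zip(xs, ys))

# Shortcut assumption: the outlier is simply the point with the largest y.
outlier = max(points, key=lambda p: p[1])
trimmed = [p for p in points if p != outlier]

r_all = pearson(*zip(*points))
r_trimmed = pearson(*zip(*trimmed))

print(f"Outlier: {outlier}")
print(f"Correlation with outlier:    {r_all:.3f}")
print(f"Correlation without outlier: {r_trimmed:.3f}")
```

With the stray point included, the correlation is the familiar 0.816. Without it, the remaining points fall almost exactly on a straight line, so the correlation is very close to 1. Whether that point gets quietly kept or quietly dropped changes the story the number tells.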

Source: Lifehacker Australia

