Statwing's approach to statistical testing

# T-Test (Independent Samples)

Statwing represents t-test results as distribution curves. When the sample size is large enough, a clear difference between the two samples probably represents a “real” difference between the populations from which they were sampled.

Note: This page discusses the unranked “independent samples t-test”, the most common form of t-test.

## Definition

A t-test helps you determine whether two groups have different average values (for example, whether men and women have different average heights).

## Example

Let’s say you’re curious about whether New Yorkers and Kansans spend a different amount of money per month on movies. It’s impractical to ask every New Yorker and Kansan about their movie spending, so instead you ask a sample of each (maybe 300 New Yorkers and 300 Kansans), and the averages are \$18 for New Yorkers and \$14 for Kansans. The t-test asks whether that difference probably reflects a real difference between Kansans and New Yorkers generally, or whether it is most likely a meaningless statistical fluke.

Technically, it asks the following: If there were in fact no difference between Kansans and New Yorkers generally, what are the chances that randomly selected groups from those populations would be as different as these randomly selected groups are? To take an extreme example, if Kansans and New Yorkers as a whole actually spent the same amount of money on average, it’s very unlikely that 300 randomly selected Kansans would each spend exactly \$14 and 300 randomly selected New Yorkers would each spend exactly \$18. So if your sampling yielded those results, you would conclude that the difference between the sample groups is most likely representative of a meaningful difference between the populations as a whole.
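The question above ("if there were no real difference, how often would random groups differ this much?") can be approximated directly by shuffling group labels. The sketch below is a permutation test, a related technique and not the t-test itself; the function name and the spending figures are made up for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Share of random relabelings whose gap in means is at least as large
    as the observed gap (a permutation test, not the t-test itself)."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend group membership is arbitrary
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical monthly movie spending in dollars (made-up data).
kansans = [14, 15, 13, 16, 14, 15, 13, 14]
new_yorkers = [18, 17, 19, 18, 20, 17, 18, 19]
p = permutation_p_value(kansans, new_yorkers)
print(f"share of shuffles at least as extreme: {p:.4f}")
```

A small share means that chance alone rarely produces a gap this large, which is the same logic the t-test applies through a formula instead of through shuffling.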

## Definition

A t-test asks whether a difference between two groups’ averages is unlikely to have occurred because of random chance in sample selection. A difference is more likely to be meaningful and “real” if
(1) the difference between the averages is large,
(2) the sample size is large, and
(3) responses are consistently close to the average values and not widely spread out (the standard deviation is low).
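The three factors above all feed into the t statistic. The sketch below assumes the unequal-variance (Welch) form of the statistic; the text does not say which variant Statwing computes, and the \$10 standard deviation is a hypothetical value chosen for illustration.

```python
import math

def t_statistic(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Unequal-variance (Welch) t statistic from summary statistics."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error

# Baseline: $18 vs. $14, hypothetical standard deviation of $10, 300 per group.
baseline = t_statistic(18, 14, 10, 10, 300, 300)
bigger_gap = t_statistic(22, 14, 10, 10, 300, 300)       # (1) larger difference
bigger_sample = t_statistic(18, 14, 10, 10, 1200, 1200)  # (2) larger samples
less_spread = t_statistic(18, 14, 5, 5, 300, 300)        # (3) lower standard deviation

for label, value in [("baseline", baseline), ("bigger gap", bigger_gap),
                     ("bigger sample", bigger_sample), ("less spread", less_spread)]:
    print(f"{label}: t = {value:.2f}")
```

Each of the three changes pushes the t statistic further from zero than the baseline, which corresponds to a difference that is less likely to be a fluke.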

The t-test’s statistical significance and the t-test’s effect size are the two primary outputs of the t-test. Statistical significance indicates whether the difference between sample averages is likely to represent an actual difference between populations (as in the example above), and the effect size indicates whether that difference is large enough to be practically meaningful.
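To see how the two outputs can disagree, the sketch below compares a tiny difference measured on a very large sample. It uses Cohen's d as the effect size, which is one common choice and an assumption here; the text does not say which effect size measure Statwing reports. All numbers are hypothetical.

```python
import math

def t_statistic(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Unequal-variance (Welch) t statistic from summary statistics."""
    return (mean_a - mean_b) / math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Difference in means expressed in pooled standard deviations."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical: a 50-cent gap, standard deviation of $10, 100,000 people per group.
t_value = t_statistic(14.50, 14.00, 10, 10, 100_000, 100_000)
d_value = cohens_d(14.50, 14.00, 10, 10, 100_000, 100_000)
print(f"t = {t_value:.1f}, Cohen's d = {d_value:.2f}")
```

Here the t statistic is far from zero, so the difference is statistically significant, yet the gap is only a twentieth of a standard deviation, which may be too small to matter in practice.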

The “One Sample T-Test” is similar to the “Independent Samples T-Test” except it is used to compare one group’s average value to a single number (for example, do Kansans on average spend more than \$13 per month on movies?). For practical purposes you can look at the confidence interval around the average value to gain this same information.
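The confidence interval shortcut can be sketched as follows. This assumes a sample large enough that the normal curve is a reasonable stand-in for the t distribution; the \$14 average, \$10 standard deviation, and sample size of 300 are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """Approximate confidence interval for a mean, using the normal curve
    (a reasonable approximation only for large n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical Kansan sample: average $14, standard deviation $10, 300 people.
low, high = confidence_interval(mean=14, sd=10, n=300)
print(f"95% CI: {low:.2f} to {high:.2f}")
print("13 inside interval:", low <= 13 <= high)
```

If the comparison value falls inside the interval, the sample does not clearly show that the group's average differs from that value.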

The “paired t-test” is used when each observation in one group is paired with a related observation in the other group. For example, do Kansans spend more money on movies in January or in February, where each respondent is asked about both their January and their February spending? In effect, a paired t-test subtracts each respondent’s January spending from their February spending (yielding the increase in spending), then takes the average of all those increases and checks whether that average is significantly different from zero (using a one sample t-test).
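The subtract-then-average procedure can be sketched as below. The function name and the six respondents' spending figures are made up for illustration.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """One-sample t statistic computed on the per-respondent differences."""
    differences = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(len(differences)))

# Hypothetical spending in dollars for the same six respondents.
january = [12, 15, 9, 20, 14, 11]
february = [14, 15, 12, 21, 17, 12]
t_value = paired_t_statistic(january, february)
print(f"paired t = {t_value:.2f}")
```

Because each respondent serves as their own comparison, person-to-person variation in overall spending drops out and only the variation in the increases matters.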

The “ranked independent samples t-test” asks a similar question to the typical unranked test, but it is more robust to outliers (a few bad outliers can make the results of an unranked t-test invalid).
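One way to see why ranking helps is to replace each value with its rank in the combined data before comparing groups. The sketch below shows only that ranking step on made-up data; it is an assumption about the general idea, not a description of the exact procedure Statwing uses.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Hypothetical data: one Kansan reports an extreme $400.
kansans = [12, 13, 14, 15, 400]
new_yorkers = [17, 18, 19, 20, 21]
ranks = average_ranks(kansans + new_yorkers)
kansan_ranks, new_yorker_ranks = ranks[:5], ranks[5:]

raw_means = (sum(kansans) / 5, sum(new_yorkers) / 5)
rank_means = (sum(kansan_ranks) / 5, sum(new_yorker_ranks) / 5)
print("raw means:", raw_means)
print("rank means:", rank_means)
```

With raw values, the single \$400 response makes the Kansan average look higher; with ranks, that response counts only as "the largest value", and the New Yorkers come out higher, matching the bulk of the data.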