Statwing's approach to statistical testing

# Statwing Correlations

## Overview

When users select two continuous or discrete variables, Statwing runs a correlation to assess whether those two variables are statistically related. Statwing defaults to calculating Pearson’s r, the most common type of correlation; if the assumptions of that test are not met, Statwing recommends a ranked version of the same test, calculating Spearman’s rho.
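For readers who want to see what the default calculation involves, here is a minimal sketch of Pearson’s r in plain Python. The function name `pearson_r` is hypothetical and chosen for illustration; this is the textbook formula, not Statwing’s actual code.

```python
import math


def pearson_r(xs, ys):
    """Textbook Pearson's r for two equal-length numeric sequences.

    Illustrative helper only; the name and error handling are assumptions.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each observation from its variable's mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squares (the 1/n factors cancel)
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)
```

A perfectly linear increasing relationship such as `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` yields a value of 1 (up to floating-point rounding), and a perfectly linear decreasing one yields -1.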

Additionally, Statwing uses the Fisher Transformation to calculate confidence intervals for the correlation coefficient.
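The following sketch shows how a Fisher-transformation confidence interval is commonly computed. It assumes the standard textbook form: transform r with the inverse hyperbolic tangent, use a standard error of 1/sqrt(n − 3) under a normal approximation, then transform the bounds back. The function name `fisher_confidence_interval` and the default 95% level are assumptions for illustration; the source does not specify Statwing’s exact settings.

```python
import math
from statistics import NormalDist


def fisher_confidence_interval(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation coefficient.

    Assumes the textbook Fisher z approach; names and defaults are hypothetical.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)                 # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    # Two-sided critical value from the standard normal distribution
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    lower = math.tanh(z - z_crit * se)  # back-transform to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper
```

Because the interval is built on the transformed scale and then mapped back, it is generally asymmetric around r and always stays within the range -1 to 1.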

## Assumptions of Pearson’s r

Statwing recommends Pearson’s r as a valid measure of correlation if certain assumptions about the data are met:

• There are no outliers in the continuous/discrete data.[1]
• The relationship between the variables is linear (e.g., y = 2x, not y = x²).[2]
• The data are in fact continuous or discrete and not ordinal.[3]

## Ranked Correlation (Spearman’s Rho)

When these assumptions are violated, Pearson’s r may no longer be a valid measure of correlation. In that case, Statwing recommends Spearman’s rho: Statwing rank-transforms the data (replaces each value with its rank ordering), then runs the usual Pearson correlation on the ranks. Rank transformation is a well-established method for protecting against assumption violations (a “nonparametric” method), and the rank transformation from Pearson to Spearman is the most common (Conover and Iman, 1981).
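The rank-then-correlate procedure described above can be sketched as follows. This is a minimal illustration in plain Python, assuming tied values receive the average of the ranks they span (a common convention; the source does not say how Statwing handles ties). The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """Replace each value with its 1-based rank; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Textbook Pearson's r (illustrative helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def spearman_rho(xs, ys):
    """Spearman's rho computed as Pearson's r on the rank-transformed data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Example data with one extreme value in ys
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 1000]
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
```

In this example the ordering of `ys` matches the ordering of `xs` exactly, so Spearman’s rho is 1, while Pearson’s r on the raw values is noticeably lower because the extreme value breaks the linear pattern.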

Note that Spearman’s rho still assumes that the relationship between the variables is monotonic.