Statwing's approach to statistical testing

# Statwing Contingency Tables

## Overview

When users select two categorical variables, Statwing assesses whether those two variables are statistically related. Statwing runs Fisher’s exact test when possible, and otherwise runs Pearson’s chi-squared test (typically just called “chi-squared”).

## Chi-squared vs. Fisher’s Exact Test

Fisher’s exact test is unbiased whenever it can be run, but it is computationally expensive if the table is larger than 2×2 or the sample size is greater than 10,000 (even with modern computing).
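To make the 2×2 case concrete, here is a minimal sketch of a two-sided Fisher’s exact test in plain Python. It is not Statwing’s implementation: the function name `fisher_exact_2x2` is hypothetical, and the two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is one common convention, assumed here.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table of counts.

    Hypothetical helper for illustration; not Statwing's code.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        # when the row and column totals are held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_observed = prob(a)

    # Assumed two-sided convention: add up every table that is no more
    # likely than the observed one (small tolerance for float rounding).
    total = sum(
        p
        for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)
```

The number of feasible tables grows with the sample size, and for tables larger than 2×2 it grows far faster, which is why the exact test becomes expensive in those cases.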

Chi-squared tests can have biased results when sample sizes are low (technically, when expected cell counts are below 5).
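The expected-count check can be sketched as follows. This is an illustration under assumptions, not Statwing’s code: the function names are hypothetical, and the threshold of 5 is taken from the rule of thumb stated above.

```python
def expected_counts(table):
    """Expected cell counts if the two variables were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total.
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_statistic(table):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def has_low_expected_counts(table, threshold=5):
    """True if any expected cell count falls below the threshold."""
    return any(e < threshold for row in expected_counts(table) for e in row)
```

For example, `has_low_expected_counts([[3, 1], [1, 3]])` is true because every expected count is 2, so a chi-squared result on that table should be treated with caution.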

Fortunately, the two tests are complementary: Fisher’s exact test is typically easy to calculate when chi-squared tests are biased (small samples), and chi-squared tends to be unbiased when Fisher’s exact test is difficult to calculate (large samples). Larger tables with small samples can still cause problems, because Statwing cannot run Fisher’s exact test on them; in those cases Statwing alerts users to potential complications.
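The selection logic described above might look roughly like this. The function name `choose_test`, its return shape, and the exact cutoffs (2×2, 10,000 observations, expected count of 5) are assumptions drawn from the limits mentioned in this section, not a confirmed description of Statwing’s internals.

```python
def choose_test(table, max_fisher_n=10_000, min_expected=5):
    """Return (test_name, warn) for a contingency table of counts.

    Hypothetical sketch: Fisher's exact test for small 2x2 tables,
    chi-squared otherwise, with a warning flag when chi-squared is
    used even though some expected cell counts are low.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    is_2x2 = len(row_totals) == 2 and len(col_totals) == 2
    if is_2x2 and n <= max_fisher_n:
        return "fisher_exact", False

    # Chi-squared fallback: flag tables where any expected count
    # (row total * column total / n) is below the threshold.
    low_expected = any(
        r * c / n < min_expected for r in row_totals for c in col_totals
    )
    return "chi_squared", low_expected
```

A 2×3 table with 13 observations, such as `[[2, 3, 1], [1, 2, 4]]`, falls into the problematic case: too large for the exact test in this sketch, yet too sparse for chi-squared to be trusted without a warning.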

Like other statistical software, Statwing uses adjusted residuals to assess whether an individual cell is statistically significantly above or below expectations. Essentially, the adjusted residual asks, “Does this cell have more values in it than I’d expect if there were no relationship between these two variables?”

So, in the following table, a lower than expected proportion of respondents in Finance report that they love their job:

If, as in the above example, you have the data displayed such that each column sums to 100% (see other ways to display it), you can say, “The proportion of Finance/Banking respondents who said they ‘Love their job’ is lower than typical, relative to respondents from other industries.”
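As a rough illustration of the adjusted residual, the sketch below uses the standard formula: the observed count minus the expected count, divided by the square root of the expected count times (1 − row proportion) times (1 − column proportion). The function name is hypothetical, and the code is not taken from Statwing.

```python
from math import sqrt


def adjusted_residuals(table):
    """Adjusted (standardized) residual for every cell of a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            # Count expected in this cell if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            # Standard error, corrected for the row and column proportions.
            se = sqrt(
                expected
                * (1 - row_totals[i] / n)
                * (1 - col_totals[j] / n)
            )
            out_row.append((observed - expected) / se)
        residuals.append(out_row)
    return residuals
```

A negative value means the cell holds fewer observations than expected and a positive value means more. Under the assumption of no relationship, each value behaves approximately like a z-score.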

Statwing shows up to three arrows in a cell, depending on the p-value calculated from the adjusted residual: one arrow if the p-value is less than alpha (i.e., 1 – confidence level), two arrows if it is less than alpha/5, and three arrows if it is less than alpha/50. For example, if your confidence level is set to 95%:

• p-value < .05: one arrow
• p-value < .01: two arrows
• p-value < .001: three arrows
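The arrow thresholds can be sketched in a few lines. The function names are hypothetical, the two-sided normal p-value is an assumption about how the adjusted residual is converted, and the strict “less than” comparison follows the wording of the rule above.

```python
from math import erfc, sqrt


def two_sided_p_value(z):
    """Two-sided p-value for a z-score under the standard normal."""
    return erfc(abs(z) / sqrt(2))


def arrow_count(p_value, confidence_level=0.95):
    """Number of arrows (0 to 3) for a cell, given its p-value."""
    alpha = 1 - confidence_level
    if p_value < alpha / 50:
        return 3
    if p_value < alpha / 5:
        return 2
    if p_value < alpha:
        return 1
    return 0  # not significant at the chosen confidence level
```

With the default 95% confidence level, an adjusted residual of 3.0 gives a p-value of roughly 0.0027, which falls between alpha/50 and alpha/5 and so earns two arrows.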

The calculation of the adjusted residual, and its comparison to specific alpha levels, can be labeled a “z-test” or a “z-test for a sample percentage”. The literature more typically just says that conclusions were based on adjusted residuals:

• “An adjusted standardized residual (z score) was computed for each cell”
• “Adjusted residuals were calculated to measure deviation from average values”
• “The adjusted residual was then calculated”