Statwing's approach to statistical testing

How to use Statwing


If you’d like to read technical details about which statistical tests Statwing runs, please see Statwing’s Technical Notes.

If we fail to answer your help questions, please go back to Statwing and contact us via the tab in the lower right of the screen. We’ll help you resolve the issue, and we’ll add additional help content so everyone else will have an easier time using Statwing.

 


 
 

Tutorial Video

This video covers everything that happens in Statwing after you upload your data.

 


 


 

 
 

Uploading Data

Statwing accepts data in three ways:

  • An upload of a CSV or Excel file
  • Copy-and-paste
  • Connecting to a FluidSurveys account

 
 

Table Formatting for Excel or CSV Files or Copy-and-Paste

If you upload via Excel or CSV or copy-and-paste, data must be formatted correctly. Some tips:

  • Except for the first row, every row of data in the table/spreadsheet should be one “datapoint.” In the table below, each datapoint is one college.
  • The first row of the spreadsheet or table should be the “Header row.” This row contains the name of the data found in each column. In the table below, the top row is the header row, and CollegeRank, Tuition, etc. are “variables.”
  • Some software outputs tables with two header rows. Statwing will accept data with two header rows, though just one header row is ideal.

This is a well-formatted table. It has one header row, and every row after is one datapoint (in this case, each datapoint is a college).

This is a poorly formatted table. There is no header row, and some of the rows within the table are groupings instead of datapoints.
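
As a rough illustration of the layout described above, the sketch below builds a small table with one header row and one college per row, writes it out as CSV text, and runs two basic sanity checks. Only the column names CollegeRank and Tuition come from the text; the "College" column, the college names, the values, and both helper functions are made up for illustration and say nothing about how Statwing itself validates uploads.

```python
import csv
import io

# Hypothetical data: one header row, then one datapoint (college) per row.
rows = [
    ["College", "CollegeRank", "Tuition"],
    ["College A", 1, 40000],
    ["College B", 2, 38000],
    ["College C", 3, 35000],
]

def to_csv_text(table):
    """Write a list of rows as CSV text, header row first."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()

def looks_well_formatted(table):
    """Basic checks: a text-only header row and equal-length data rows."""
    header, data = table[0], table[1:]
    header_ok = all(isinstance(name, str) and name.strip() for name in header)
    widths_ok = all(len(row) == len(header) for row in data)
    return header_ok and widths_ok

csv_text = to_csv_text(rows)
print(csv_text)
print(looks_well_formatted(rows))
```

A table with grouping rows or without a header row would typically fail one of these two checks.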

 

 
 

Importing from FluidSurveys

To connect to a FluidSurveys account, click the FluidSurveys icon:

Log in to your FluidSurveys account via the window that will appear, then select “Allow Access.”

Then choose which surveys you want to import into Statwing, and you’re all set.

 
 

Checkboxes in Statwing

If you’re analyzing a survey with checkboxes, Statwing can recognize them automatically and handle them appropriately.

checkbox-example-display

The data must be uploaded with one column for each checkbox (not one column containing the result for all checkboxes).

In this example each column header says “Color:” followed by the individual color for that checkbox. Statwing will recognize groups of side-by-side checkboxes regardless of the names of the variables, so that format isn’t required, but colon-separated headers show up more cleanly in Statwing. For example, “Color: Red” is better than “Color_Red” or “Red”.

For the most part Statwing is intelligent about interpreting your data as checkboxes. These are some but not all of the acceptable formats for a column of checkboxes:

checkbox-column-example-columns

If some questions were skipped by some survey respondents, those cells must be left blank, and cells where respondents saw the checkbox but did not check it must be given a value. Some but not all of the acceptable formats for a column of checkboxes where some respondents didn’t see the question:

checkbox-column-example-columns-with-skips
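
If your survey tool exports all checked options in a single column, the reshaping described above could be sketched as follows. This is a minimal example under several assumptions: the semicolon-separated input, the use of None for respondents who never saw the question, the 1/0 encoding, and the function name are all hypothetical. The text only says that skipped cells must be blank and that seen-but-unchecked cells need a value.

```python
# Hypothetical raw export: one column holding all checked colors per respondent.
# None means the respondent never saw the question.
responses = ["Red;Blue", "", "Green", None]
options = ["Red", "Green", "Blue"]

def expand_checkboxes(raw, choices, prefix="Color"):
    """Return a header row plus one row per respondent, one column per checkbox."""
    header = [f"{prefix}: {choice}" for choice in choices]
    rows = []
    for answer in raw:
        if answer is None:
            # Skipped question: leave every checkbox cell blank.
            rows.append(["" for _ in choices])
            continue
        checked = set(answer.split(";")) if answer else set()
        # Seen but unchecked cells still get a value (0 here, an assumed encoding).
        rows.append([1 if choice in checked else 0 for choice in choices])
    return header, rows

header, rows = expand_checkboxes(responses, options)
print(header)
for row in rows:
    print(row)
```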

 
 

Changing variable names

If you upload data but the names of the variables are big and unwieldy, you can change those names. Go to variable settings by clicking on the name of the dataset in the upper left…


…then click the variable name…


…and you can change the name to whatever you please.

 
 

Data Types

Each column of the spreadsheet/table (the “dataset”) is a “variable.” Statwing intelligently interprets the dataset and classifies each variable into a type. The most important variable types are the following:

  • Numbers: Points on a scale (like weight or dollars) or a count (like number of months as a customer).
  • Categories: A grouping (like gender or political party).
  • Time: Most kinds of time, including years (“2010”), dates (“1/27/1984”), timestamps (“4/7/14 19:08”), times of day (“5:34 AM”), and more. Durations, like “7” seconds or minutes or days, are best classified as Numbers. Statwing automatically guesses whether you’re using MM/DD/YYYY format or DD/MM/YYYY format, based on dates like 31/12/2014 (or 12/31/2014) that indicate which type is being used.
  • Ranks: Ordered categories (like military ranks) or rank ordering (like place in a race). If you asked survey respondents to select which annual income bucket they are in (e.g., “$0 to $25k” and “$25k to $50k”) instead of allowing them to respond freely (e.g., “$45,223” and “$89,400”), those buckets have a clear order and are therefore Ranks. For now, ranks need to be uploaded to Statwing in numeric format. For example, “$0 to $25k” would need to show up on the spreadsheet you upload as “1”, “$25k to $50k” would be “2”, etc. After the data is uploaded, you’ll have to go to Variable Settings and change the variable to Ranks.
Statwing is occasionally incorrect when it guesses what type of data you have. To correct Statwing’s error, select the button at the top of the analysis card that has the name of the variable, then change the variable’s settings from one type to another (most commonly, changing Numbers to Categories).
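
The date-format guess mentioned under Time can be illustrated with a short sketch. This is not Statwing's actual implementation; it only shows the idea that a field greater than 12 cannot be a month, so a date like 31/12/2014 settles the question. The function name and return values are assumptions.

```python
def guess_date_order(dates):
    """Guess 'MM/DD' or 'DD/MM' from slash-separated dates; None if ambiguous."""
    for text in dates:
        first, second, _year = text.split("/")
        if int(first) > 12:
            return "DD/MM"   # first field cannot be a month
        if int(second) > 12:
            return "MM/DD"   # second field cannot be a month
    return None  # every date fits both readings

print(guess_date_order(["01/02/2014", "31/12/2014"]))
print(guess_date_order(["12/31/2014"]))
print(guess_date_order(["01/02/2014"]))
```

When every date in a column fits both readings, as in the last call, nothing in the data can settle the format, which is one reason a guess can occasionally be wrong.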

You can also reach Variable Settings by clicking on the name of the dataset in the upper left.
Screen Shot 2014-03-07 at 6.36.34 PM
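
Because Ranks currently have to be uploaded in numeric form, you may want to recode ordered buckets before uploading. A minimal sketch, assuming the buckets are listed from lowest to highest (the third bucket and the sample answers are hypothetical):

```python
# Ordered buckets, lowest first. The first two come from the example above;
# the third is made up for illustration.
bucket_order = ["$0 to $25k", "$25k to $50k", "$50k to $75k"]

# Map each bucket label to its 1-based position in the ordering.
rank_code = {label: position for position, label in enumerate(bucket_order, start=1)}

# Hypothetical survey answers to recode before upload.
answers = ["$25k to $50k", "$0 to $25k", "$50k to $75k", "$25k to $50k"]
encoded = [rank_code[answer] for answer in answers]
print(encoded)
```

After uploading the recoded column, the variable still has to be switched to Ranks in Variable Settings.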

 
 

Running Analyses

Select the variable(s) you’re interested in, then select an action, and Statwing will choose the appropriate statistical test to run. You can currently select from two types of analysis.


1. Describe: Statwing will summarize the data from selected variables. For example, if the variable selected was Age and every datapoint was one survey respondent’s age in years (e.g., 12, 19, 49, 34), Statwing would describe a few things about that variable: basic statistics (average, median), advanced statistics (standard deviation, confidence interval around the average), and a visualization (a histogram).

In the example above, the “4” indicates that four variables have been selected, so each of the four will be summarized.


2. Relate: If you select two or more variables and then select “Relate,” Statwing will show the relationship between the key variable and any other variables you’ve selected. For example, if you selected Age and Height, Statwing would tell you if Age and Height were correlated, and show you a scatterplot. If you selected Gender and Height, Statwing would tell you if one gender tends to be taller than the other, and show you a histogram of Height for each Gender. If you selected all three at once and the key variable was Height, both of those analyses would be run (i.e., relating both Age and Gender to the key variable, Height).

Clicking the arrows on the right side of the Relate button will swap the axes of visualizations. In the first example above, it would swap the X-axis and the Y-axis for the relationship between Age and Height.
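
To give a feel for the kind of numbers Describe and Relate report, here is a small sketch using only Python's standard library. It is not Statwing's code: the confidence interval uses a simple normal approximation, the correlation is a plain Pearson coefficient, and the heights are hypothetical. Which tests Statwing actually runs is covered in its Technical Notes.

```python
import math
import statistics

ages = [12, 19, 49, 34]            # the example ages from the text
heights = [150, 172, 168, 175]     # hypothetical heights in cm

def describe(values):
    """Summarize a numeric variable: mean, median, sample SD, approximate 95% CI."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    # Normal-approximation interval; an assumption, not necessarily Statwing's method.
    margin = 1.96 * sd / math.sqrt(len(values))
    return {
        "average": mean,
        "median": statistics.median(values),
        "standard_deviation": sd,
        "confidence_interval": (mean - margin, mean + margin),
    }

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

summary = describe(ages)
print(summary["average"], summary["median"])   # 28.5 26.5
print(round(pearson_r(ages, heights), 3))
```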

 
 

Learn More About Automatic Insight

If you have more than two variables selected when you click Relate, Statwing will sort the results in order of which variables are most tightly related to the key variable. So if you had 15 variables selected, and your key variable was “Revenue”, the results towards the top would be the most important drivers of revenue (or at least variables that are closely related to it).

(Technical note: this is accomplished via comparing the effect sizes of any results that are statistically significant. Note also that effect sizes for different types of analyses aren’t directly comparable, so we use appropriately adjusted effect sizes behind the scenes.)
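
The ordering described in the technical note can be sketched roughly as follows. The variable names, p-values, and effect sizes are invented, the 0.05 threshold is an assumption, and the effect sizes are taken to be already adjusted to a comparable scale; how Statwing performs that adjustment, and where it places non-significant results, is not described here.

```python
# Hypothetical results of relating several variables to the key variable "Revenue".
results = [
    {"variable": "Region", "p_value": 0.20, "effect_size": 0.05},
    {"variable": "Ad spend", "p_value": 0.001, "effect_size": 0.62},
    {"variable": "Store size", "p_value": 0.03, "effect_size": 0.31},
]

def rank_by_effect_size(items, alpha=0.05):
    """Significant results first, strongest effect at the top; the rest follow."""
    significant = [r for r in items if r["p_value"] < alpha]
    others = [r for r in items if r["p_value"] >= alpha]
    significant.sort(key=lambda r: abs(r["effect_size"]), reverse=True)
    return significant + others

ordered = [r["variable"] for r in rank_by_effect_size(results)]
print(ordered)
```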

 

 
 

Crosstabs

When you choose to relate two Categories to each other, Statwing produces a “crosstab”. Below is one such crosstab, from a survey of 10,000 software developers, showing the relationship between the size of the company at which they work and their job satisfaction:

Crosstab Explanation - Column2

Of developers at companies with 1,000 to 3,000 employees, 20.9% say they love their job.

Initially, each column in the crosstab sums to 100%. So, of developers at companies with 1,000 to 3,000 employees, 20.9% say they love their job.

Above we asked, “Of developers in companies of 1,000 to 3,000, what percent love their job?” We can also reverse the question, asking “Of developers who love their job, what percent are in companies of 1,000 to 3,000?”

To answer that question, select “Row %” from the options above the chart, and the table will change to this:

Crosstab Explanation - Row2

Of developers who love their job, 5% are at companies with 1,000 to 3,000 employees.

Of developers who love their job, 5% are at companies with 1,000 to 3,000 employees. Depending on your data, one way of looking at it may be more interesting than the other. If you were a developer trying to choose how large a company to work for, you’d probably be most curious about which size has the highest proportion of its developers loving their job, so you’d be more interested in the first table, the “Col %” table.

Of course, by selecting “Count” you can see the same table as just the number of developers in each group:

Crosstab Explanation - Count2

93 developers who responded to the survey were in companies of 1,000 to 3,000 and loved their job.

 

Or you can see that number as a proportion of the total number of developers surveyed:

Crosstab Explanation - All2

1.4% of the developers surveyed were in companies of 1,000 to 3,000 and loved their job.
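
The four views (“Col %”, “Row %”, “Count”, and the proportion of all respondents) are all computed from the same table of counts. A minimal sketch with made-up counts and hypothetical labels, not the survey data shown above:

```python
# Hypothetical counts: rows are job satisfaction, columns are company size.
counts = {
    "Love":    {"Small": 30, "Large": 10},
    "Neutral": {"Small": 20, "Large": 40},
}

def crosstab_views(table):
    """Return Col %, Row %, and All % versions of a table of counts."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    grand_total = sum(col_totals.values())
    views = {"col_pct": {}, "row_pct": {}, "all_pct": {}}
    for label, row in table.items():
        row_total = sum(row.values())
        # Col %: share of the column total, so each column sums to 100.
        views["col_pct"][label] = {c: 100 * row[c] / col_totals[c] for c in columns}
        # Row %: share of the row total, so each row sums to 100.
        views["row_pct"][label] = {c: 100 * row[c] / row_total for c in columns}
        # All %: share of every respondent in the table.
        views["all_pct"][label] = {c: 100 * row[c] / grand_total for c in columns}
    return views

views = crosstab_views(counts)
print(views["col_pct"]["Love"]["Small"])   # 60.0
print(views["row_pct"]["Love"]["Small"])   # 75.0
print(views["all_pct"]["Love"]["Small"])   # 30.0
```

The same cell (30 respondents) reads as 60% of its column, 75% of its row, and 30% of everyone, which is why the choice of view changes the question being answered.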

The green and red arrows indicate cells that are statistically significantly higher or lower than expected. That is to say, if there were no relationship between size of company and levels of job satisfaction, you would not expect to see such a high number of developers at very small companies that love their job. Cells can contain up to three arrows, indicating varying degrees of statistical significance.

Another way to think of what those arrows mean:

  • If you have the table set to “Col %” (the default, and the top chart above), green arrows mean “The percentage in this cell is statistically significantly higher than the percentage across the other cells in this row.” So in the first chart, you could say “Respondents at companies with 1-25 employees are statistically significantly more likely to say they ‘Love their job’ than respondents at other companies (e.g., the 36.7% is statistically significantly higher than the other cells in that row).”
  • If you have the table set to “Row %” (the second chart above), green arrows mean “The percentage in this cell is statistically significantly higher than the percentage across the other cells in this column.” So in the second chart, you could say “Respondents who say they ‘love their job’ are statistically significantly more likely to work at companies with 1-25 employees than respondents who have some other level of job satisfaction (e.g., the 47.8% is statistically significantly higher than the other cells in that column).”

For more details, see our crosstab technical notes.

 

 
 

Workspaces

workspace screenshot
Think of workspaces like tabs in a spreadsheet. When you open up a dataset, you are by default in “Workspace 1.” Any work you do in that workspace will be saved to that workspace. If you create a new workspace, any work you do there will be saved there, so you can go back and forth between the two workspaces, each of which will have its own set of saved work. For example, you might name one workspace “Descriptive statistics” and have a lot of Describe output there, and name another workspace “Age analysis” and keep a lot of analyses related to age in that workspace. That way you can flip back and forth quickly between different sets of analyses, or share a specific workspace and its analysis with a colleague.

 
 

Sharing

If you select “Share,” you can create a link to the dataset that you’re currently in. Anyone you give that link to can access all workspaces in that dataset. By default, they’ll come to the workspace you were in when you selected Share. They’ll see the analyses that you saved in that workspace, and they’ll be able to run their own analyses, too. Any analyses they run or filters they create are not saved to the workspace.

 
 

Exporting

If you’d like to make a chart in Excel or PowerPoint based on an analysis, or just want to see the statistical output in Excel, you can click the export button at the top right of any card:

If you’d like to export the same for every card in the workspace, click the name of the workspace in the upper left, then export:

 
 

Filters

Filters allow you to exclude data from your analyses. For example, you could create a filter that included only datapoints where Gender equals Male and Age is greater than 20. If you then related Height to Age, that analysis would only be run on datapoints that met those criteria.
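
In code terms, the filter in that example amounts to keeping only the rows that satisfy both conditions before any analysis runs. A minimal sketch with hypothetical datapoints:

```python
# Hypothetical datapoints, one dict per row of the dataset.
people = [
    {"Gender": "Male", "Age": 25, "Height": 180},
    {"Gender": "Female", "Age": 31, "Height": 165},
    {"Gender": "Male", "Age": 18, "Height": 175},
    {"Gender": "Male", "Age": 42, "Height": 172},
]

def apply_filter(rows):
    """Keep only datapoints where Gender equals Male and Age is greater than 20."""
    return [row for row in rows if row["Gender"] == "Male" and row["Age"] > 20]

filtered = apply_filter(people)
print([row["Age"] for row in filtered])   # [25, 42]
```

Any later analysis, such as relating Height to Age, would then see only the two remaining rows.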

 
 

Missing Values

Missing values tell Statwing to ignore certain values from a variable. For example, in surveys the answer choice “Don’t Know” is often represented by a “99.” If you asked a question with a 1 through 7 scale and you wanted to calculate the average response, you wouldn’t want to include 99s from any “Don’t Know” answers, so you’d set 99 as a missing value for that variable.
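
The effect is easy to see with a few made-up responses on a 1 through 7 scale, where 99 stands for “Don’t Know” (the helper name is hypothetical):

```python
import statistics

responses = [5, 7, 99, 3, 6, 99]   # hypothetical answers; 99 means "Don't Know"

def mean_ignoring(values, missing_codes):
    """Average the values after dropping any that are marked as missing."""
    kept = [v for v in values if v not in missing_codes]
    return statistics.mean(kept)

naive = statistics.mean(responses)        # treats 99 as a real answer
clean = mean_ignoring(responses, {99})    # drops the "Don't Know" codes
print(naive, clean)   # 36.5 5.25
```

Without the missing-value setting, the average lands far outside the 1 through 7 scale.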

 
 

Weighting

Statwing allows users to apply weights, so that if certain types of survey respondents are under- or over-represented, Statwing can be told that one row of data counts for more or less than one case. To enable weights, your data must include a precalculated column of weights.

First, go to the variable settings by clicking the name of the dataset in the top left of the interface:

Then, select which variable you would like to use as weights, using the rightmost column:

Unless you uncheck that box, all future analyses will be weighted (though if you have a specific analysis that you’d like to not include weights, you can remove the weighting from that individual card):

Weighting can also be used if each row of your data represents several cases. For example, if you have a table that looks like this:

Pet    Color    Count
Cat    Grey     45
Cat    Black    5
Dog    Grey     92
Dog    Black    9

In that case you could upload the data, indicate that “Count” is the column to weight by, and Statwing will produce the same results as if you had loaded a larger table with 45 rows of grey cats, 5 rows of black cats, etc.
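
The equivalence between weighting by “Count” and loading the expanded table can be checked with a short sketch. It uses the pet table above; the two function names are hypothetical, and the share of grey animals is just one convenient statistic to compare.

```python
# The pet table from above: (Pet, Color, Count).
table = [
    ("Cat", "Grey", 45),
    ("Cat", "Black", 5),
    ("Dog", "Grey", 92),
    ("Dog", "Black", 9),
]

def weighted_share_grey(rows, pet):
    """Share of grey animals for one pet type, using Count as the weight."""
    total = sum(count for p, _color, count in rows if p == pet)
    grey = sum(count for p, color, count in rows if p == pet and color == "Grey")
    return grey / total

def expanded_share_grey(rows, pet):
    """Same share, computed after expanding each row into Count separate rows."""
    expanded = [(p, color) for p, color, count in rows for _ in range(count)]
    colors = [color for p, color in expanded if p == pet]
    return colors.count("Grey") / len(colors)

print(weighted_share_grey(table, "Cat"))   # 0.9
print(expanded_share_grey(table, "Cat"))   # 0.9
```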