- Tutorial Video
- Uploading Data
- Changing Variable Names
- Data Types
- Running Analyses
- Automatic Insight
- Missing Values
If you’d like to read technical details about which statistical tests Statwing runs, please see Statwing’s Technical Notes.
If we fail to answer your questions, please go back to Statwing and contact us via the tab in the lower right of the screen. We’ll help you resolve the issue, and we’ll add help content so everyone else will have an easier time using Statwing.
This video covers everything that happens in Statwing after you upload your data.
Statwing accepts data in three ways:
- An upload of a CSV or Excel file
- Copying and pasting a table
- Connecting to a FluidSurveys account
Table Formatting for Excel or CSV Files or Copy-and-Paste
If you upload via Excel or CSV or copy-and-paste, data must be formatted correctly. Some tips:
- Except for the first row, every row of data in the table/spreadsheet should be one “datapoint.” In the table below, each datapoint is one college.
- The first row of the spreadsheet or table should be the “Header row.” This row includes the names of the data that you’d find in each column. In the table below, the top row is the header row, and College, Rank, Tuition, etc. are “variables.”
- Some software outputs tables with two header rows. Statwing will accept data with two header rows, though just one header row is ideal.
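If you prepare your file with a script, the layout described above can be checked before uploading. Here is a minimal Python sketch, assuming a CSV with a single header row. The column names come from the example table, while the college names and values are made up for illustration.

```python
import csv
import io

# Hypothetical example data: one header row, then one college per row.
RAW = """College,Rank,Tuition
Alpha College,1,42000
Beta University,2,39500
"""

def read_table(text):
    """Return (variable names, list of datapoints) from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # every row after the header is one datapoint
    return reader.fieldnames, rows

variables, datapoints = read_table(RAW)
```

Here `variables` holds the header row and each entry of `datapoints` is one college.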
Importing from FluidSurveys
Then choose which surveys you want to import into Statwing, and you’re all set.
Checkboxes in Statwing
If you’re analyzing a survey with checkboxes, Statwing can recognize them automatically and handle them appropriately.
The data must be uploaded with one column for each checkbox (not one column containing the result for all checkboxes).
Statwing will recognize groups of side-by-side checkboxes regardless of the names of the variables, but if they take the colon-separated format of the first example above they’ll show up more cleanly and nicely in Statwing. For example, “Color: Red” is better than “Color_Red” or “Red”.
For the most part Statwing is intelligent about interpreting your data as checkboxes. These are some but not all of the acceptable formats for a column of checkboxes:
If some questions were skipped by some survey respondents, those cells must be left blank, and cells where respondents saw the checkbox but did not check it must be given a value. Some but not all of the acceptable formats for a column of checkboxes where some respondents didn’t see the question:
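If your survey tool exports all checked boxes in a single column, you would need to reshape the data first. Here is a rough Python sketch of that reshaping. The checkbox labels are hypothetical, and the 1/0 coding is an assumption on our part, so confirm it against the acceptable formats listed above before relying on it.

```python
OPTIONS = ["Red", "Green", "Blue"]  # hypothetical checkbox labels

def expand_checkboxes(answer, question="Color", options=OPTIONS):
    """Turn one combined answer into one column per checkbox.

    answer is None when the respondent never saw the question,
    otherwise a list of the boxes they checked (possibly empty).
    """
    row = {}
    for option in options:
        name = f"{question}: {option}"  # colon-separated variable name
        if answer is None:
            row[name] = ""  # skipped question: leave the cell blank
        else:
            row[name] = 1 if option in answer else 0  # assumed 1/0 coding
    return row
```

A respondent who saw the question but checked nothing gets a value in every column, while a respondent who skipped it gets blanks.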
Changing variable names
If you upload data but the names of the variables are big and unwieldy, you can change those names. Go to variable settings by clicking on the name of the dataset in the upper left…
…then click the variable name…
…and you can change the name to whatever you please.
Each column of the spreadsheet/table (the “dataset”) is a “variable.” Statwing intelligently interprets the dataset and classifies each variable into a type. The most important variable types are the following:
- Numbers: Points on a scale (like weight or dollars) or a count (like number of months as a customer).
- Categories: A grouping (like gender or political party).
- Time: Most kinds of time, including years (“2010”), dates (“1/27/1984”), timestamps (“4/7/14 19:08”), times of day (“5:34 AM”), and more. Durations, like “7” seconds or minutes or days, are best classified as Numbers. Statwing automatically guesses whether you’re using MM/DD/YYYY format or DD/MM/YYYY format, based on dates like 31/12/2014 (or 12/31/2014) that indicate which type is being used.
- Ranks: Ordered categories (like military ranks) or rank ordering (like place in a race). If you asked survey respondents to select which annual income bucket they are in (e.g., “$0 to $25k” and “$25k to $50k”) instead of allowing them to respond freely (e.g., “$45,223” and “$89,400”), those buckets have a clear order and are therefore Ranks. For now, ranks need to be uploaded to Statwing in numeric format. For example, “$0 to $25k” would need to show up on the spreadsheet you upload as “1”, “$25k to $50k” would be “2”, etc. After the data is uploaded, you’ll have to go to Variable Settings and change the variable to Ranks.
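Two of the ideas above can be sketched in Python. The first function illustrates how a date format could be guessed from unambiguous dates. It is only an illustration of the idea, not Statwing’s actual logic. The second shows one way to convert income buckets to numeric ranks before uploading. The function names are hypothetical.

```python
def guess_date_order(dates):
    """Guess 'DD/MM' or 'MM/DD' from slash-separated date strings.

    A field above 12 cannot be a month, so it must be the day.
    Returns None when no date in the list settles the question.
    """
    for text in dates:
        first, second, _year = (int(part) for part in text.split("/"))
        if first > 12:
            return "DD/MM"
        if second > 12:
            return "MM/DD"
    return None

# Bucket labels and their numeric ranks, as in the example above.
INCOME_RANKS = {"$0 to $25k": 1, "$25k to $50k": 2}

def encode_ranks(values, ranks=INCOME_RANKS):
    """Replace each bucket label with its numeric rank before upload."""
    return [ranks[value] for value in values]
```

A date like 1/2/2014 is ambiguous on its own, which is why at least one date such as 31/12/2014 is needed to decide.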
Select the variable(s) you’re interested in, select an action, and Statwing will select the appropriate statistical test to run. You can currently select from two types of analysis.
1. Describe: Statwing will summarize the data from selected variables. For example, if the variable selected was Age and every datapoint was one survey respondent’s age in years (e.g., 12, 19, 49, 34), Statwing would describe a few things about that variable: basic statistics (average, median), advanced statistics (standard deviation, confidence interval around the average), and a visualization (a histogram).
In the example above, the “4” indicates that four variables have been selected, so each of the four will be summarized.
2. Relate: If you select two or more variables and then select “Relate,” Statwing will show the relationship between the key variable and any other variables you’ve selected. For example, if you selected Age and Height, Statwing would tell you if Age and Height were correlated, and show you a scatterplot. If you selected Gender and Height, Statwing would tell you if one gender tends to be taller than the other, and show you a histogram of Height for each Gender. If you selected all three at once and the key variable was Height, both of those analyses would be run (i.e., relating both Age and Gender to the key variable, Height).
Clicking the arrows on the right side of the Relate button will swap the axes of visualizations. In the first example above, it would swap the X-axis and the Y-axis for the relationship between Age and Height.
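To give a feel for the numbers behind Describe and Relate, here is a small Python sketch. It is not Statwing’s implementation: the confidence interval uses a simple normal approximation (an assumption), Pearson correlation stands in for whichever test Statwing would choose, and the heights are made up.

```python
import math
import statistics

def describe(values):
    """Basic and advanced statistics for one numeric variable."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    margin = 1.96 * sd / math.sqrt(len(values))  # normal approximation
    return {
        "average": mean,
        "median": statistics.median(values),
        "standard deviation": sd,
        "95% CI": (mean - margin, mean + margin),
    }

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ages = [12, 19, 49, 34]          # the ages from the Describe example
heights = [150, 170, 175, 180]   # hypothetical heights in cm
summary = describe(ages)
r = pearson(ages, heights)
```

With only four datapoints the interval is very wide, so treat the output as an illustration of the calculation rather than a meaningful result.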
Learn More About Automatic Insight
If you have more than two variables selected when you click Relate, Statwing will sort the results in order of which variables are most tightly related to the key variable. So if you had 15 variables selected, and your key variable was “Revenue”, the results towards the top would be the most important drivers of revenue (or at least variables that are closely related to it).
(Technical note: this is accomplished via comparing the effect sizes of any results that are statistically significant. Note also that effect sizes for different types of analyses aren’t directly comparable, so we use appropriately adjusted effect sizes behind the scenes.)
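The sorting idea can be sketched in a few lines of Python. Everything here is hypothetical: the result records, the 0.05 significance threshold, and the choice to list non-significant results last are assumptions for illustration, and the adjustment that makes effect sizes comparable is not shown.

```python
# Hypothetical Relate results against the key variable "Revenue".
results = [
    {"variable": "Region", "effect_size": 0.10, "p_value": 0.30},
    {"variable": "Ad spend", "effect_size": 0.62, "p_value": 0.001},
    {"variable": "Store age", "effect_size": -0.35, "p_value": 0.02},
]

def rank_results(results, alpha=0.05):
    """Put significant results first, strongest effect at the top."""
    significant = [r for r in results if r["p_value"] < alpha]
    others = [r for r in results if r["p_value"] >= alpha]
    # Compare magnitudes, so a strong negative effect still ranks high.
    significant.sort(key=lambda r: abs(r["effect_size"]), reverse=True)
    return significant + others

ordered = [r["variable"] for r in rank_results(results)]
```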
When you choose to relate two Categories to each other, Statwing produces a “crosstab”. Below is one such crosstab, from a survey of 10,000 software developers, showing the relationship between the size of the company at which they work and their job satisfaction:
Initially, each column in the crosstab sums to 100%. So, of developers at companies with 1,000 to 3,000 employees, 20.9% say they love their job.
Above we asked, “Of developers in companies of 1,000 to 3,000, what percent love their job?” We can also reverse the question, asking “Of developers who love their job, what percent are in companies of 1,000 to 3,000?”
To answer that question, select “Row %” from the options above the chart, and the table will change to this:
Of developers who love their job, 5% are at companies with 1,000 to 3,000 employees. Depending on your data, one way of looking at the data or the other may be more interesting. If you were a developer trying to choose how large a company to work for, you’d probably be most curious about which size has the highest proportion of its developers loving their job, so you’d be more interested in the first table, the “Col %” table.
Of course, by selecting “Count” you can see the same table as just the number of developers in each group:
Or you can also see that number as a proportion of the total number of developers surveyed:
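The four views of a crosstab differ only in what each count is divided by. Here is a minimal Python sketch using made-up counts (not the survey figures above), with rows as job satisfaction and columns as company size.

```python
# Hypothetical counts: rows are job satisfaction, columns are company size.
counts = {
    "Love my job": {"1-25": 30, "1,000-3,000": 10},
    "Other": {"1-25": 20, "1,000-3,000": 40},
}

def col_percent(counts):
    """Each cell as a percentage of its column total (columns sum to 100)."""
    totals = {}
    for row in counts.values():
        for col, n in row.items():
            totals[col] = totals.get(col, 0) + n
    return {label: {col: 100 * n / totals[col] for col, n in row.items()}
            for label, row in counts.items()}

def row_percent(counts):
    """Each cell as a percentage of its row total (rows sum to 100)."""
    return {label: {col: 100 * n / sum(row.values()) for col, n in row.items()}
            for label, row in counts.items()}

def total_percent(counts):
    """Each cell as a percentage of everyone surveyed."""
    grand = sum(sum(row.values()) for row in counts.values())
    return {label: {col: 100 * n / grand for col, n in row.items()}
            for label, row in counts.items()}
```

In this made-up table, 60% of developers at companies of 1-25 love their job (Col %), while 75% of developers who love their job work at companies of 1-25 (Row %).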
The green and red arrows indicate cells that are statistically significantly higher or lower than expected. That is to say, if there were no relationship between size of company and levels of job satisfaction, you would not expect to see such a high number of developers at very small companies that love their job. Cells can contain up to three arrows, indicating varying degrees of statistical significance.
Another way to think of what those arrows mean:
- If you have the table sorted by “Col %” (the default, and the top chart above), green arrows mean “The percentage in this cell is statistically significantly higher than the percentage across the other cells in this row.” So in the first chart, you could say “Respondents at companies with 1-25 employees are statistically significantly more likely to say they ‘Love their job’ than respondents at other companies (e.g., the 36.7% is statistically significantly higher than the other cells in that row).”
- If you have the table sorted by “Row %” (the second chart above), green arrows mean “The percentage in this cell is statistically significantly higher than the percentage across the other cells in this column.” So in the second chart, you could say “Respondents who say they ‘love their job’ are statistically significantly more likely to work at companies with 1-25 employees than respondents who have some other level of job satisfaction (e.g., the 47.8% is statistically significantly higher than the other cells in that column).”
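To make “higher than expected” concrete, here is a Python sketch that computes the counts you would expect if the two variables were unrelated, plus a standardized residual for each cell. Using Pearson residuals is our assumption for illustration, and the counts are made up. It is not necessarily the test Statwing runs.

```python
import math

def expected_counts(table):
    """Counts expected in each cell if rows and columns were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def standardized_residuals(table):
    """(observed - expected) / sqrt(expected) for every cell."""
    expected = expected_counts(table)
    return [[(obs - exp) / math.sqrt(exp) for obs, exp in zip(o_row, e_row)]
            for o_row, e_row in zip(table, expected)]

# Hypothetical counts: rows are satisfaction levels, columns are company sizes.
table = [[30, 10],
         [20, 40]]
residuals = standardized_residuals(table)
```

A large positive residual corresponds to a cell that is higher than expected, and a large negative residual to one that is lower than expected.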
For more details, see our crosstab technical notes.
Think of workspaces like tabs in a spreadsheet. When you open up a dataset, you are by default in “Workspace 1.” Any work you do in that workspace will be saved to that workspace. If you create a new workspace, any work you do there will be saved there, so you can go back and forth between the two workspaces, each of which will have its own set of saved work. For example, you might name one workspace “Descriptive statistics” and have a lot of Describe output there, and name another workspace “Age analysis” and keep a lot of analyses related to age in that workspace. That way you can flip back and forth quickly between different sets of analyses, or share a specific workspace and its analysis with a colleague.
If you select “Share,” you can create a link to the dataset that you’re currently in. Anyone you give that link to can access all workspaces in that dataset. By default, they’ll come to the workspace you were in when you selected Share. They’ll see the analyses that you saved in that workspace, and they’ll be able to run their own analyses, too. Any analyses they run or filters they create are not saved to the workspace.
Filters allow you to exclude data from your analyses. For example, you could create a filter that included only datapoints where Gender equals Male and Age is greater than 20. If you then related Height to Age, that analysis would only be run on datapoints that met those criteria.
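In code terms, a filter is just a condition that each datapoint must meet before it is analyzed. Here is a minimal Python sketch of the example above, with made-up datapoints.

```python
# Hypothetical datapoints, one dictionary per row of the dataset.
datapoints = [
    {"Gender": "Male", "Age": 25, "Height": 180},
    {"Gender": "Male", "Age": 18, "Height": 175},
    {"Gender": "Female", "Age": 30, "Height": 165},
]

def apply_filter(rows):
    """Keep only datapoints where Gender equals Male and Age is above 20."""
    return [row for row in rows if row["Gender"] == "Male" and row["Age"] > 20]

kept = apply_filter(datapoints)
```

Any later analysis, such as relating Height to Age, would then use only the rows in `kept`.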
Missing values tell Statwing to ignore certain values from a variable. For example, in surveys the answer choice “Don’t Know” is often represented by a “99.” If you asked a question with a 1 through 7 scale and you wanted to calculate the average response, you wouldn’t want to include 99s from any “Don’t Know” answers, so you’d set 99 as a missing value for that variable.
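The effect of a missing value is easy to see in a short Python sketch. The responses are made up, and the helper name is hypothetical.

```python
import statistics

def mean_without_missing(values, missing=(99,)):
    """Average the responses after dropping any missing-value codes."""
    kept = [v for v in values if v not in missing]
    return statistics.mean(kept)

# Hypothetical answers on a 1 through 7 scale, with 99 meaning "Don't Know".
responses = [5, 7, 99, 3, 99, 6]

naive_average = statistics.mean(responses)        # distorted by the 99s
clean_average = mean_without_missing(responses)   # 99s ignored
```

Here the naive average is 36.5, which is not even on the 1 through 7 scale, while the average without the 99s is 5.25.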
Statwing allows users to apply weights, so that if certain types of survey respondents are under- or over-represented, Statwing can be told that one row of data counts for more or less than 1 case. To enable weights, your data must include a precalculated column of weights.
Unless you uncheck that box, all future analyses will be weighted (though if you have a specific analysis that you’d like to not include weights, you can remove the weighting from that individual card):
Weighting can also be used if each row of your data represents several cases. For example, if you have a table that looks like this:
In that case you could upload the data, indicate that “Count” is the column to weight by, and Statwing will produce the same results as if you had loaded a larger table with 45 rows of grey cats, 5 rows of black cats, etc.
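The equivalence between a weighted table and a larger unweighted one can be sketched in Python. Only the grey and black cat counts come from the example above. The function names are hypothetical.

```python
from collections import Counter

# (value, Count) pairs: each row stands for several cases.
table = [("grey", 45), ("black", 5)]

def weighted_counts(rows):
    """Tally each value, letting a row count for as many cases as its weight."""
    totals = Counter()
    for value, weight in rows:
        totals[value] += weight
    return totals

def expand(rows):
    """Build the larger table with one row per case."""
    return [value for value, weight in rows for _ in range(weight)]

weighted = weighted_counts(table)
unweighted = Counter(expand(table))
```

Both tallies are identical, which is why weighting by “Count” gives the same results as uploading one row per cat.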