So far we have looked at statistical tests for quantitative data. We now turn to a test for qualitative data: the chi-squared test.
The chi-squared test can be used for three different purposes:
- Goodness of fit: We test whether the observed distribution of a qualitative variable matches the distribution we hypothesize for that population.
- Homogeneity: We test whether a qualitative variable follows the same hypothesized distribution across several populations.
- Independence: We test a hypothesized relationship between two qualitative variables.
To find the chi-squared value, take the difference between the observed and the expected frequency for each category, square it, and divide by the expected frequency. Then sum these values over all categories.
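The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name `chi_squared_statistic` and the three-category counts are made up for the example.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories, made up for illustration.
observed = [18, 22, 20]
expected = [20, 20, 20]
statistic = chi_squared_statistic(observed, expected)
print(statistic)
```

Note that each squared difference is divided by its own expected frequency before the terms are added up.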
If we want to determine whether a die is loaded, we can use the goodness-of-fit form of the chi-squared test.
Step 1: Hypothesis
- H0 is our null hypothesis, i.e. any difference between what we observe and what we expected to observe is the result of normal fluctuations.
- There is no statistically significant difference between the number of times each number was rolled and the number of times each number on the die should have appeared.
- H1 is our alternative hypothesis and assumes that the difference between what we observed and what we expected is statistically significant.
- There is a statistically significant difference between the number of times each number was rolled and the number of times each number on the die should have appeared.
Step 2: Analyze evidence
First check that the conditions for the test hold:
- No categories are empty
- At most 20% of the categories have an expected frequency of less than 5
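These two conditions can be checked before any calculation. The sketch below assumes the rule is applied to the expected frequencies, which is the usual convention; the function name `conditions_met` and its parameters are hypothetical.

```python
def conditions_met(frequencies, min_count=5, max_share=0.20):
    """Return True if no category is empty and at most `max_share` of the
    categories have a frequency below `min_count`."""
    if any(f <= 0 for f in frequencies):
        return False
    small = sum(1 for f in frequencies if f < min_count)
    return small / len(frequencies) <= max_share


# A fair die rolled 60 times has an expected frequency of 10 per face.
print(conditions_met([60 / 6] * 6))
# Rolled only 12 times, every expected frequency is 2, so the rule fails.
print(conditions_met([12 / 6] * 6))
```

In practice this means the die has to be rolled often enough: with six equally likely faces, at least 30 rolls are needed for every expected frequency to reach 5.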
Find your chi-squared value: the bigger it is, the larger the difference between expected and observed, and hence the smaller your p-value will be.
The degrees of freedom are the number of categories minus 1; hence for our loaded die they are 6 - 1 = 5.
Find your test statistic and p-value.
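For the die, the whole of this step can be sketched with the standard library alone. The tallies below are made up for illustration, and `chi2_sf` is a hypothetical helper that computes the upper-tail probability of the chi-squared distribution for whole-number degrees of freedom; statistical packages such as SciPy offer equivalent, better-tested routines.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with
    `df` degrees of freedom (df must be a positive integer)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)             # Q(1, half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))  # Q(1/2, half)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q


# Hypothetical tallies for faces 1..6 over 60 rolls (made up for illustration).
observed = [5, 8, 9, 8, 10, 20]
expected = [sum(observed) / 6] * 6   # a fair die: every face equally likely
df = len(observed) - 1               # number of categories - 1
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2_sf(statistic, df)
print(df, round(statistic, 2), round(p_value, 4))
```

With these made-up tallies the p-value falls below 0.05, so at a 5% significance level the data would count as evidence that the die is loaded.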
Step 3: Draw a conclusion
- Reject H0 if our p-value is less than the significance level
- Fail to reject H0 if our p-value is greater than the significance level
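The decision rule is a single comparison. In this sketch the function name `decide` and the default significance level of 0.05 are assumptions, and a p-value exactly equal to the significance level is treated as "fail to reject", which the rule above leaves open.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only if p_value < alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.02))   # p-value below the significance level
print(decide(0.30))   # p-value above the significance level
```

The significance level should be chosen before looking at the data, not adjusted afterwards to fit the p-value.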
We can also use the chi-squared test to examine independence between two different binary variables. For example, if we wanted to determine whether a student is more likely to study physics if they study biology, we can do this by looking at whether each student studies biology or not and whether they study physics or not.
We use the same process as above; however, the degrees of freedom are calculated differently. Take the number of categories for studying biology (there are 2, since a student either does or does not) and the number of categories for studying physics (2 again).
Subtract 1 from each of these and multiply the results together:
(2 – 1) * (2 – 1) = 1 degree of freedom.
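A minimal sketch of the biology and physics example follows. The counts in `table` are made up for illustration. The sketch also assumes the usual way of getting expected frequencies under independence (row total times column total, divided by the grand total) and applies no continuity correction.

```python
import math

# Hypothetical counts, made up for illustration.
# Rows: studies biology (yes, no); columns: studies physics (yes, no).
table = [[30, 20],
         [15, 35]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

statistic = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(table))
    for j in range(len(table[0]))
)

# (number of rows - 1) * (number of columns - 1)
df = (len(table) - 1) * (len(table[0]) - 1)

# For df = 1 the upper-tail probability has the closed form erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))
print(df, round(statistic, 3), round(p_value, 4))
```

The closed form used for the p-value only holds for 1 degree of freedom; larger tables need the general chi-squared distribution.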