- Kevan Oswald

# ANOVA

ANOVA (analysis of variance) is used to determine whether the differences among the means of two or more populations are statistically significant or the result of chance. In other words, we want to find out whether the differences we observe between means are larger than what we would expect to see from random sampling fluctuation alone. ANOVA tests the null hypothesis that the means are equal.

In its simplest form, ANOVA requires a dependent variable that is metric and one or more independent variables that are categorical (nonmetric), called factors.

Example of __One-Way ANOVA__: A researcher is interested in determining whether age (independent variable) affects how frequently an individual uses Facebook (dependent variable). A survey is conducted asking participants their age group (18 to 24, 25 to 34, etc.) and how often they use Facebook: Several times a day (1), At least once a day (2), A few times a week (3), About once a week (4), A few times a month (5), Rarely (6), and Never (7).

When running a One-Way ANOVA in SPSS there are a number of options that can be selected in order to provide the best output. For simplicity we will only look at two tables, the Descriptive Statistics and the ANOVA table itself. In the Descriptive Statistics table below we see that there does appear to be a difference between the means of the different age groups, and that as we progress to the older age groups the frequency with which they use Facebook decreases.

In the ANOVA table, as we look at the "What is your age?" line, we see that the differences between those groups are in fact significant. When examining an ANOVA table, if the F-value is large and the significance level is low (less than .05), we can conclude that the differences are unlikely to be due to chance alone. The significance (Sig.), also known as the p-value, is the probability of getting an F-value at least as large as the one observed if the null hypothesis is true (that is, if there are no differences between the groups and no relationship between the dependent and independent variables).
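As a sketch of what sits behind those two columns, the F-value and its significance for a one-way design can be written as follows. The notation is an assumption introduced here, not taken from the SPSS output: $k$ groups, $n_j$ observations in group $j$, $N$ observations in total, group means $\bar{x}_j$, and grand mean $\bar{x}$.

$$
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2 \,/\, (k-1)}
         {\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 \,/\, (N-k)},
\qquad
\text{Sig.} = P\left(F_{k-1,\,N-k} \ge F_{\text{observed}} \mid H_0\right)
$$

The numerator measures how far the group means are spread around the grand mean, and the denominator measures how far individual responses are spread around their own group mean.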

Our example shows an F-value of 6.7 and a significance of .000 (SPSS rounds to three decimals, so this should be read as p < .001 rather than exactly zero). As a result, we can conclude that the differences between the age groups are statistically significant. The variance between the groups (frequency of use across 18 to 24, 25 to 34, etc.) is 6.7 times larger than the variance within the groups (frequency of use among only those 18 to 24, for example). Had the Sig. value been higher, such as .350, then differences at least this large would be expected from random sampling fluctuation alone 350 times out of 1,000, or 35% of the time, if the null hypothesis were true, and we could not conclude that the groups differ.
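The ratio described above can be sketched in a few lines of Python using only the standard library. The responses below are made-up values on the 1 to 7 scale, not the survey data, and the function names are hypothetical. Because the standard library has no F distribution, the significance is approximated with a permutation test (shuffling responses across groups and counting how often the shuffled F is at least as large as the observed one), which is a different technique from the F-distribution lookup SPSS performs.

```python
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F-value: between-groups mean square / within-groups mean square."""
    k = len(groups)                                # number of groups
    n_total = sum(len(g) for g in groups)          # total number of responses
    grand_mean = mean(x for g in groups for x in g)
    # Spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of individual responses around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def permutation_p_value(groups, n_permutations=5000, seed=0):
    """Approximate Sig.: share of random regroupings with F >= the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against floating-point noise
        if f_statistic(shuffled) >= observed - 1e-9:
            hits += 1
    return hits / n_permutations

# Hypothetical responses (1 = several times a day ... 7 = never)
groups = [
    [1, 2, 3],  # 18 to 24
    [2, 3, 4],  # 25 to 34
    [5, 6, 7],  # an older age group
]

f_value = f_statistic(groups)
p_value = permutation_p_value(groups)
print(f"F = {f_value:.2f}")           # F = 13.00
print(f"approx. Sig. = {p_value:.3f}")
```

With real survey data, the F-value from this sketch should match the SPSS ANOVA table, while the permutation-based significance will only be close to the reported Sig. value.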

In addition to the descriptive statistics and the ANOVA table, another table to consider is the Test of Homogeneity of Variance. The assumption of homogeneity is the assumption that all groups have the same variance. If the output table shows a Sig. value that is non-significant (greater than .05), then we can assume homogeneity of variance. If the group sizes are roughly equal, ANOVA is fairly robust to unequal variances, so the result of this test matters little. If the group sizes are notably different (ratio of largest to smallest greater than 1.5), then it is important to note whether homogeneity of variance can be assumed. If it cannot, the significance level of the model may be over- or underestimated, which can reduce the power of the test.
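A rough sketch of both checks, again with made-up responses and hypothetical function names: the first function computes the largest-to-smallest group size ratio, and the second computes a Levene-style W statistic, which is a one-way ANOVA run on each response's absolute deviation from its group mean. Centering on the mean is an assumption here; centering on the median is a common variant. Turning W into a Sig. value requires the F distribution with k - 1 and N - k degrees of freedom, which the standard library does not provide (SciPy's `scipy.stats.levene` with `center="mean"` is one option if it is available).

```python
from statistics import mean

def size_ratio(groups):
    """Ratio of the largest group size to the smallest."""
    sizes = [len(g) for g in groups]
    return max(sizes) / min(sizes)

def levene_w(groups):
    """Levene-style W: one-way ANOVA F computed on absolute deviations from group means."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = mean(z for d in deviations for z in d)
    ss_between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    ss_within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical responses: two small, tightly clustered groups and one larger, spread-out group
groups = [
    [1, 2, 2, 3],
    [2, 3, 3, 4],
    [1, 3, 5, 7, 1, 3, 5, 7],
]

ratio = size_ratio(groups)
w = levene_w(groups)
print(f"size ratio = {ratio:.1f}")  # size ratio = 2.0
print(f"Levene W = {w:.2f}")        # Levene W = 5.85
if ratio > 1.5:
    print("Group sizes differ notably, so the homogeneity test deserves attention.")
```

A large W relative to the F distribution indicates that the group variances differ, which is the situation in which unequal group sizes become a concern.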

Sometimes it is necessary to take into account the influence of uncontrolled independent variables. For example, in determining how different groups exposed to different commercials evaluate a brand, it may be necessary to control for prior knowledge. In such a case, analysis of covariance (ANCOVA) is used. The clip below explains how an ANCOVA is run and how it differs from ANOVA.