R Tutorials Top Banner

ONEWAY BETWEEN SUBJECTS ANOVA


Assumptions of Analysis of Variance

Traditional parametric analysis of variance makes the following assumptions:

  • Random sampling, or at the very least, random assignment to groups.
  • Independence of scores on the response variable -- i.e., what you get from one subject should be in no way influenced by what you get from any of the others.
  • Sampling from normal populations within each cell of the design.
  • Homogeneity of variance -- the populations within each cell of the design should all have the same variance.
The first two of these are methodological issues and will not be discussed further. The last two are assumptions we need to look at statistically.


The Oneway ANOVA

R supplies a variety of ways of doing analysis of variance, but among the simpliest is to use the oneway.test( ) function for simple between subjects designs. There are advantages and disadvantages to using this function, as will be pointed out. First, let's get some data...

> data(InsectSprays)
> str(InsectSprays)
'data.frame':   72 obs. of  2 variables:
 $ count: num  10 7 20 14 14 12 10 23 17 20 ...
 $ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
The data are from an agricultural experiment in which six different insect sprays were tested in many different fields, and the response variable (DV) was a count of insects found in each field after spraying. Let's have a look...
> attach(InsectSprays)
> tapply(count, spray, mean)
        A         B         C         D         E         F 
14.500000 15.333333  2.083333  4.916667  3.500000 16.666667 
> tapply(count, spray, var)
        A         B         C         D         E         F 
22.272727 18.242424  3.901515  6.265152  3.000000 38.606061 
> tapply(count, spray, length)
 A  B  C  D  E  F 
12 12 12 12 12 12
There are large differences in the means, but there are also large differences in the variances (and worse yet, the means and variances appear to be related). At least the design is balanced. Boxplots are a good way to present the data graphically...
> boxplot(count ~ spray)
Insect boxplots
Wow! It hurts my eyes just to look at it! The distributions are skewed, there are outliers, and homogeneity is out the window!

Nevertheless, we shall procede naively, just to demonstrate how the oneway.test( ) function works. It's quite simple...

> oneway.test(count ~ spray)

        One-way analysis of means (not assuming equal variances)

data:  count and spray 
F = 36.0654, num df = 5.000, denom df = 30.043, p-value = 8e-12
By default, the oneway.test( ) applies a Welch correction for nonhomogeneity, so this is not the answer your intro stat students would have gotten if they were doing the problem by hand. If you want to turn off this behavior, set the "var.equal=" option to TRUE. Another caution: The function demands a formula interface, so the data should be in a proper data frame and not in vectors group by group. (But see a further comment on this below.) There is a "data=" option if you don't care to attach the data frame. One more thing...
> detach(InsectSprays)
> ifelse(InsectSprays$count==26, NA, InsectSprays$count)
 [1] 10  7 20 14 14 12 10 23 17 20 14 13 11 17 21 11 16 14 17 17 19 21  7 13  0
[26]  1  7  2  3  1  2  1  3  0  1  4  3  5 12  6  4  3  5  5  5  5  2  4  3  5
[51]  3  5  3  6  1  1  3  2  6  4 11  9 15 22 15 16 13 10 NA NA 24 13
> ifelse(InsectSprays$count==26, NA, InsectSprays$count) -> new.data
> InsectSprays$spray -> spray
> oneway.test(new.data ~ spray)

        One-way analysis of means (not assuming equal variances)

data:  new.data and spray 
F = 35.5121, num df = 5.000, denom df = 28.697, p-value = 1.926e-11
The oneway.test( ) function is well behaved when it comes to missing values because the default is to omit them from the analysis. This means you should check your data for missing values before hand, as this procedure will not tell you about them if they exist.

You can change this behavior, of course...

> oneway.test(new.data ~ spray, na.action="na.fail")
Error in na.fail.default(list(new.data = c(10, 7, 20, 14, 14, 12, 10,  : 
  missing values in object
In which case the procedure will fail if there are missing values.

Another thing you may have noticed--R does not require the vectors it is working with in a formula to actually be in a data frame. They can be ordinary vectors in the workspace, as long as one is a proper response variable vector, and the other can be considered (coerced to) a factor. I.e., the procedure will work as long as it has proper data-frame-like vectors to work with.

> ls()
[1] "InsectSprays" "new.data"     "spray"       
> rm(list=ls())

Don't forget to clean up your mess.


Testing For Homogeneity of Variance

R incorporates the Bartlett test to test the null hypothesis of equal group variances...

> bartlett.test(count ~ spray, data=InsectSprays)

        Bartlett test of homogeneity of variances

data:  count by spray 
Bartlett's K-squared = 25.9598, df = 5, p-value = 9.085e-05
Busted!


An Alternative to the Oneway Test

When assumptions fail, a nonparametric test is often our refuge. In this case, the Kruskal-Wallis oneway ANOVA is often recommended...

> kruskal.test(count ~ spray, data=InsectSprays)

        Kruskal-Wallis rank sum test

data:  count by spray 
Kruskal-Wallis chi-squared = 54.6913, df = 5, p-value = 1.511e-10
However, the Kruskal-Wallis test should not be regarded as a panacea. It is primarily intended to cure problems with normality. It performs best when the distributions all have the same shape, and when there is homogeneity of variance.


Power

The syntax of the power.anova.test( ) function is...

power.anova.test(groups = NULL, n = NULL,
                 between.var = NULL, within.var = NULL,
                 sig.level = 0.05, power = NULL)
Exactly one of those options should be left at NULL, and it will be calculated from the others. However, you should be aware that "between.var=" and "within.var=" do not refer to variances, i.e., mean squares. The value of "between.var=" should be set to the anticipated variance of the group means. (MS-between would be n per group times the variance of the group means.) I'm not clear on how the "within.var=" option should be set (and don't bother going to the help page for help!), but I BELIEVE it is the anticipated MS-error, which is to say the per group standard deviation squared. It's not necessary actually to have values for these but only the anticipated ratio of them. For example, the following example occurs on the help page...
power.anova.test(groups=4, between.var=1, within.var=3,
                 power=.80)
# n = 11.92613
The following example occurs online at
http://www.stat.sfu.ca/~thompson/stat403-650/powersamplesize.html
and leads me to believe I am correct...
> attach(InsectSprays)
> meancounts = tapply(count, spray, mean)
> var(meancounts)
[1] 44.48056
> summary(aov(count ~ spray))
            Df  Sum Sq Mean Sq F value    Pr(>F)    
spray        5 2668.83  533.77  34.702 < 2.2e-16 ***
Residuals   66 1015.17   15.38                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
> power.anova.test(groups=6, between.var=44.48, within.var=15.38, power=.9)

     Balanced one-way analysis of variance power calculation 

         groups = 6
              n = 2.28168
    between.var = 44.48
     within.var = 15.38
      sig.level = 0.05
          power = 0.9

 NOTE: n is number in each group 

> detach(InsectSprays)
> rm(meancounts)
This is just one example where the R help pages really fall down on the job, but R is becoming popular enough that an answer can usually be found online, if you are willing to look.


The aov( ) Function

The advantage of using the oneway.test( ) function is obviously the Welch correction for nonhomogeneity. The disadvantages are it doesn't give you a lot of information in the output, and there is no post hoc test available. The standard R function for all kinds of ANOVA is aov( ). It's best to store the output of this function and then to extract the information you want by way of various extractor functions. This is the way we will be working with all future statistical procedures. Here is an example using the InsectSprays data...

> aov.out = aov(count ~ spray, data=InsectSprays)
> summary(aov.out)
            Df  Sum Sq Mean Sq F value    Pr(>F)    
spray        5 2668.83  533.77  34.702 < 2.2e-16 ***
Residuals   66 1015.17   15.38                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is not much more to say about aov( ) at the moment except that R calculates ANOVAs as special cases of linear models, so much of what will eventually be said about the lm( ) function applies. There is no Welch correction available here, but there is an offsetting advantage, which we will see in the next section.


Post Hoc Tests

The Tukey Honestly Significant Difference test has been implemented in the R base distribution as the default post hoc test for ANOVA. It's easily enough applied once the output of aov( ) has been stored in a data object...

> TukeyHSD(aov.out)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = count ~ spray, data = InsectSprays)

$spray
           diff        lwr       upr     p adj
B-A   0.8333333  -3.866075  5.532742 0.9951810
C-A -12.4166667 -17.116075 -7.717258 0.0000000
D-A  -9.5833333 -14.282742 -4.883925 0.0000014
E-A -11.0000000 -15.699409 -6.300591 0.0000000
F-A   2.1666667  -2.532742  6.866075 0.7542147
C-B -13.2500000 -17.949409 -8.550591 0.0000000
D-B -10.4166667 -15.116075 -5.717258 0.0000002
E-B -11.8333333 -16.532742 -7.133925 0.0000000
F-B   1.3333333  -3.366075  6.032742 0.9603075
D-C   2.8333333  -1.866075  7.532742 0.4920707
E-C   1.4166667  -3.282742  6.116075 0.9488669
F-C  14.5833333   9.883925 19.282742 0.0000000
E-D  -1.4166667  -6.116075  3.282742 0.9488669
F-D  11.7500000   7.050591 16.449409 0.0000000
F-E  13.1666667   8.467258 17.866075 0.0000000
The procedure prints out the difference between means for each pair of groups, the lower and upper limits of a 95% confidence interval for that difference (change this by setting the "conf.level=" option), and the p-value for the Tukey test. Of course, the validity of all of this depends upon how well we have satisfied the assumptions of the ANOVA test in the first place.


Bootstrap

A bootstrap solution to this problem suggests the F* critical value at alpha equals .05 should be set closer to 2.79 than to the theoretical value of 2.35 that comes from the F(5,66)-distribution. This is discussed in the tutorial on Resampling Techniques.

.

Treatment Contrasts

This may come as something of a surprise to those who, like me, had their last formal statistics course in the 1970s, but ANOVA is actually a special case of multiple linear regression, and under the hood R is doing your ANOVA calculations as if it were doing a regression problem. R offers two extractor functions to view the results of these calculations: summary.aov( ) and summary.lm( ). The first of these produces the standard ANOVA table and is essentially the same as using summary( ) on the model object. The second is more interesting.

When your design consists of a control condition and several treatments, or when there is a distinct baseline condition to which the other groups can be logically compared, then treatment constrasts offer a mechanism for seeing the treatment effects. We will need a different data set to demonstrate this, so if you haven't yet cleaned up your workspace and search path, do so now.

We will look at a data set that records the dry-weight production of biomass by plants as the result of three different treatments: a control, and two experimental treatments. (Do you get the distinct feeling these data sets were assembled by a group of agricultural scientists or plant ecologists?)

> data(PlantGrowth)
> summary(PlantGrowth)
     weight       group   
 Min.   :3.590   ctrl:10  
 1st Qu.:4.550   trt1:10  
 Median :5.155   trt2:10  
 Mean   :5.073            
 3rd Qu.:5.530            
 Max.   :6.310            
> with(PlantGrowth, tapply(weight, group, mean))
 ctrl  trt1  trt2 
5.032 4.661 5.526 
> with(PlantGrowth, tapply(weight, group, var))
     ctrl      trt1      trt2 
0.3399956 0.6299211 0.1958711 
> with(PlantGrowth, bartlett.test(weight ~ group))

        Bartlett test of homogeneity of variances

data:  weight by group 
Bartlett's K-squared = 2.8786, df = 2, p-value = 0.2371
The design is balanced, and although there are substantial differences in the group variances, Bartlett's test does not reveal a significant deviation from homogeneity of variance...
> options(show.signif.stars=F)         ### turn off those annoying significance stars!
> results = with(PlantGrowth, aov(weight ~ group))
> summary.aov(results)
            Df  Sum Sq Mean Sq F value  Pr(>F)
group        2  3.7663  1.8832  4.8461 0.01591
Residuals   27 10.4921  0.3886                
>
> TukeyHSD(results)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = weight ~ group)

$group
            diff        lwr       upr     p adj
trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
trt2-ctrl  0.494 -0.1972161 1.1852161 0.1979960
trt2-trt1  0.865  0.1737839 1.5562161 0.0120064
There is the traditional analysis, showing an overall main effect of treatment group, and a post hoc significant difference only between treatment1 and treatment2.

The same (ANOVA) result is produced by...

> anova(with(PlantGrowth, lm(weight ~ group)))
Analysis of Variance Table

Response: weight
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      2  3.7663  1.8832  4.8461 0.01591
Residuals 27 10.4921  0.3886
...just in case you wanted to see it! The function lm( ) does linear regression analysis.

To use treatment contrasts effectively, we have to be a bit more careful with our factors. Specifically, we have to be certain R is seeing our control or baseline condition as the first factor level. This will be the case here, because in the absence of instructions to do otherwise, R arranges our factor levels in alphabetical order...

> PlantGrowth$group
 [1] ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl trt1 trt1 trt1 trt1 trt1
[16] trt1 trt1 trt1 trt1 trt1 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2
Levels: ctrl trt1 trt2
Look at the last line of that output to see how R is arranging the factors. If necessary, it's easy enough to change this...
> PlantGrowth$group = factor(PlantGrowth$group, levels=c("trt2","trt1","ctrl"))
> PlantGrowth$group
 [1] ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl trt1 trt1 trt1 trt1 trt1
[16] trt1 trt1 trt1 trt1 trt1 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2
Levels: trt2 trt1 ctrl
We want the original order, so let's put that back the way it was...
> PlantGrowth$group = factor(PlantGrowth$group, levels=c("ctrl","trt1","trt2"))
> PlantGrowth$group
 [1] ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl trt1 trt1 trt1 trt1 trt1
[16] trt1 trt1 trt1 trt1 trt1 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2
Levels: ctrl trt1 trt2
If you've been fooling around in R for awhile and perhaps have reset the options for contrasts, you may want to check to make sure the contrasts option for unordered factors is still set to treatment contrasts...
> options("contrasts")
$contrasts
        unordered           ordered
"contr.treatment"      "contr.poly"
And now, just for the record, we will redo the ANOVA...
> results = with(PlantGrowth, aov(weight ~ group))
We've already seen the ANOVA table (above).

Analysis of treatment contrasts assumes a balanced design, homogeneity of variance, and additive effects (the effect of a treatment is to add (or subtract--same thing) a constant amount to each subject's score, plus or minus a bit of random error). The model formula looks like this (sans the random error bit)...

score = intercept + ax1 + bx2 + ...

For subjects in the control group, all the x's are set to zero. For subjects in treatment group one, x-one is set to 1 and all the other x's are set to zero. For subjects in treatment group 2, x-two is set to one and all the other x's are set to zero. And so on. So when the regression is calculated, well, let's just look at the result...

> summary.lm(results)

Call:
aov(formula = weight ~ group)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.0710 -0.4180 -0.0060  0.2627  1.3690 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0320     0.1971  25.527   <2e-16
grouptrt1    -0.3710     0.2788  -1.331   0.1944
grouptrt2     0.4940     0.2788   1.772   0.0877

Residual standard error: 0.6234 on 27 degrees of freedom
Multiple R-squared: 0.2641,     Adjusted R-squared: 0.2096 
F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591
All of the information in the ANOVA table is here. The F-statistic, degrees of freedom, and p-value are on the last line. The residual standard error is the square root of the Mean Sq Residuals (or error mean square), and the rest can be calculated from those (or seen by asking for summary.aov( )). In addition, there is an effect size measure--the multiple R-squared, which is the SS treatment divided by SS total.

The treatment contrasts are in the Coefficients table. The estimated coefficient for the Intercept is the mean of the baseline or control group. The standard error is the error variance (mean square error or pooled group variance) divided by n per group and then square rooted (if that's a word). The t-test (usually not interesting for the Intercept) is testing for a difference from zero.

The remaining lines in the table show the differences between a treatment group and the control group. Thus, the estimated coefficient of -0.3710 is the difference between the treatment group one ("trt1") mean and the control group mean. The standard error is the standard error of the difference between these means, calculated using mean square error and n per group. The t-test is a test for no difference between the means. In other words, this is essentially a Fisher's LSD test.

There are no comparisons of treatment (noncontrol) groups. You could redo the whole analysis with the factor levels regrouped to put a different level in the baseline condition. Or a better idea would be just to do a simple calculation at the command line...

> (-0.371 - 0.494) / .2788             # t.obs for trt1 vs trt2
[1] -3.102582
> 2 * pt(-3.1026, df=27)               # p-value with df-error degrees of freedom
[1] 0.004461307
But if you are going to end up doing all possible comparisons anyway, you are better off using Tukey's HSD test, which controls for experimentwise error rate. For a priori contrasts, I refer you to the discussion in Chapter 9 of Crawley (2007).


Return to the Table of Contents