
ONEWAY BETWEEN SUBJECTS ANOVA
Assumptions of Analysis of Variance
Traditional parametric analysis of variance makes the following assumptions:
- Random sampling, or at the very least, random assignment to groups.
- Independence of scores on the response variable -- i.e., what you get
from one subject should be in no way influenced by what you get from
any of the others.
- Sampling from normal populations within each cell of the design.
- Homogeneity of variance -- the populations within each cell of the
design should all have the same variance.
The first two of these are methodological issues and will not be discussed
further. The last two are assumptions we need to look at statistically.
The Oneway ANOVA
R supplies a variety of ways of doing analysis of variance, but among the
simpliest is to use the oneway.test( ) function for simple between subjects
designs. There are advantages and disadvantages to using this function, as will
be pointed out. First, let's get some data...
> data(InsectSprays)
> str(InsectSprays)
'data.frame': 72 obs. of 2 variables:
$ count: num 10 7 20 14 14 12 10 23 17 20 ...
$ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
The data are from an agricultural experiment in which six different insect
sprays were tested in many different fields, and the response variable (DV) was
a count of insects found in each field after spraying. Let's have a look...
> attach(InsectSprays)
> tapply(count, spray, mean)
A B C D E F
14.500000 15.333333 2.083333 4.916667 3.500000 16.666667
> tapply(count, spray, var)
A B C D E F
22.272727 18.242424 3.901515 6.265152 3.000000 38.606061
> tapply(count, spray, length)
A B C D E F
12 12 12 12 12 12
There are large differences in the means, but there are also large differences
in the variances (and worse yet, the means and variances appear to be related).
At least the design is balanced. Boxplots are a good way to present the data
graphically...
> boxplot(count ~ spray)
Wow! It hurts my eyes just to look at it! The distributions are skewed, there
are outliers, and homogeneity is out the window!
Nevertheless, we shall procede naively, just to demonstrate how the
oneway.test( ) function works. It's quite simple...
> oneway.test(count ~ spray)
One-way analysis of means (not assuming equal variances)
data: count and spray
F = 36.0654, num df = 5.000, denom df = 30.043, p-value = 8e-12
By default, the oneway.test( ) applies a Welch correction for
nonhomogeneity, so this is not the answer your intro stat students would have
gotten if they were doing the problem by hand. If you want to turn off this
behavior, set the "var.equal=" option to TRUE. Another caution: The function
demands a formula interface, so the data should be in a proper data frame and
not in vectors group by group. (But see a further comment on this below.) There
is a "data=" option if you don't care to attach the data frame. One more
thing...
> detach(InsectSprays)
> ifelse(InsectSprays$count==26, NA, InsectSprays$count)
[1] 10 7 20 14 14 12 10 23 17 20 14 13 11 17 21 11 16 14 17 17 19 21 7 13 0
[26] 1 7 2 3 1 2 1 3 0 1 4 3 5 12 6 4 3 5 5 5 5 2 4 3 5
[51] 3 5 3 6 1 1 3 2 6 4 11 9 15 22 15 16 13 10 NA NA 24 13
> ifelse(InsectSprays$count==26, NA, InsectSprays$count) -> new.data
> InsectSprays$spray -> spray
> oneway.test(new.data ~ spray)
One-way analysis of means (not assuming equal variances)
data: new.data and spray
F = 35.5121, num df = 5.000, denom df = 28.697, p-value = 1.926e-11
The oneway.test( ) function is well behaved when it comes to missing values
because the default is to omit them from the analysis. This means you should
check your data for missing values before hand, as this procedure will not tell
you about them if they exist.
You can change this behavior, of course...
> oneway.test(new.data ~ spray, na.action="na.fail")
Error in na.fail.default(list(new.data = c(10, 7, 20, 14, 14, 12, 10, :
missing values in object
In which case the procedure will fail if there are missing values.
Another thing you may have noticed--R does not require the vectors it is
working with in a formula to actually be in a data frame. They can be ordinary
vectors in the workspace, as long as one is a proper response variable vector,
and the other can be considered (coerced to) a factor. I.e., the procedure will
work as long as it has proper data-frame-like vectors to work with.
> ls()
[1] "InsectSprays" "new.data" "spray"
> rm(list=ls())
Don't forget to clean up your mess.
Testing For Homogeneity of Variance
R incorporates the Bartlett test to test the null hypothesis of equal group
variances...
> bartlett.test(count ~ spray, data=InsectSprays)
Bartlett test of homogeneity of variances
data: count by spray
Bartlett's K-squared = 25.9598, df = 5, p-value = 9.085e-05
Busted!
An Alternative to the Oneway Test
When assumptions fail, a nonparametric test is often our refuge. In this
case, the Kruskal-Wallis oneway ANOVA is often recommended...
> kruskal.test(count ~ spray, data=InsectSprays)
Kruskal-Wallis rank sum test
data: count by spray
Kruskal-Wallis chi-squared = 54.6913, df = 5, p-value = 1.511e-10
However, the Kruskal-Wallis test should not be regarded as a panacea. It is
primarily intended to cure problems with normality. It performs best when the
distributions all have the same shape, and when there is homogeneity of
variance.
Power
The syntax of the power.anova.test( ) function is...
power.anova.test(groups = NULL, n = NULL,
between.var = NULL, within.var = NULL,
sig.level = 0.05, power = NULL)
Exactly one of those options should be left at NULL, and it will be calculated
from the others. However, you should be aware that "between.var=" and
"within.var=" do not refer to variances, i.e., mean squares. The value of
"between.var=" should be set to the anticipated variance of the group means.
(MS-between would be n per group times the variance of the group means.) I'm
not clear on how the "within.var=" option should be set (and don't bother going
to the help page for help!), but I BELIEVE it is the anticipated MS-error,
which is to say the per group standard deviation squared. It's not necessary
actually to have values for these but only the anticipated ratio of them. For
example, the following example occurs on the help page...
power.anova.test(groups=4, between.var=1, within.var=3,
power=.80)
# n = 11.92613
The following example occurs online at
http://www.stat.sfu.ca/~thompson/stat403-650/powersamplesize.html
and leads me to believe I am correct...
> attach(InsectSprays)
> meancounts = tapply(count, spray, mean)
> var(meancounts)
[1] 44.48056
> summary(aov(count ~ spray))
Df Sum Sq Mean Sq F value Pr(>F)
spray 5 2668.83 533.77 34.702 < 2.2e-16 ***
Residuals 66 1015.17 15.38
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> power.anova.test(groups=6, between.var=44.48, within.var=15.38, power=.9)
Balanced one-way analysis of variance power calculation
groups = 6
n = 2.28168
between.var = 44.48
within.var = 15.38
sig.level = 0.05
power = 0.9
NOTE: n is number in each group
> detach(InsectSprays)
> rm(meancounts)
This is just one example where the R help pages really fall down on the job, but
R is becoming popular enough that an answer can usually be found online, if you
are willing to look.
The aov( ) Function
The advantage of using the oneway.test( ) function is obviously the
Welch correction for nonhomogeneity. The disadvantages are it doesn't give you
a lot of information in the output, and there is no post hoc test available.
The standard R function for all kinds of ANOVA is aov( ). It's best to
store the output of this function and then to extract the information you want
by way of various extractor functions. This is the way we will be working with
all future statistical procedures. Here is an example using the InsectSprays
data...
> aov.out = aov(count ~ spray, data=InsectSprays)
> summary(aov.out)
Df Sum Sq Mean Sq F value Pr(>F)
spray 5 2668.83 533.77 34.702 < 2.2e-16 ***
Residuals 66 1015.17 15.38
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is not much more to say about aov( ) at the moment except that R
calculates ANOVAs as special cases of linear models, so much of what will
eventually be said about the lm( ) function applies. There is no Welch
correction available here, but there is an offsetting advantage, which we will
see in the next section.
Post Hoc Tests
The Tukey Honestly Significant Difference test has been implemented in the
R base distribution as the default post hoc test for ANOVA. It's easily enough
applied once the output of aov( ) has been stored in a data object...
> TukeyHSD(aov.out)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = count ~ spray, data = InsectSprays)
$spray
diff lwr upr p adj
B-A 0.8333333 -3.866075 5.532742 0.9951810
C-A -12.4166667 -17.116075 -7.717258 0.0000000
D-A -9.5833333 -14.282742 -4.883925 0.0000014
E-A -11.0000000 -15.699409 -6.300591 0.0000000
F-A 2.1666667 -2.532742 6.866075 0.7542147
C-B -13.2500000 -17.949409 -8.550591 0.0000000
D-B -10.4166667 -15.116075 -5.717258 0.0000002
E-B -11.8333333 -16.532742 -7.133925 0.0000000
F-B 1.3333333 -3.366075 6.032742 0.9603075
D-C 2.8333333 -1.866075 7.532742 0.4920707
E-C 1.4166667 -3.282742 6.116075 0.9488669
F-C 14.5833333 9.883925 19.282742 0.0000000
E-D -1.4166667 -6.116075 3.282742 0.9488669
F-D 11.7500000 7.050591 16.449409 0.0000000
F-E 13.1666667 8.467258 17.866075 0.0000000
The procedure prints out the difference between means for each pair of groups,
the lower and upper limits of a 95% confidence interval for that difference
(change this by setting the "conf.level=" option), and the p-value for the
Tukey test. Of course, the validity of all of this depends upon how well we
have satisfied the assumptions of the ANOVA test in the first place.
Bootstrap
A bootstrap solution to this problem suggests the F* critical value at alpha
equals .05 should be set closer to 2.79 than to the theoretical value of 2.35
that comes from the F(5,66)-distribution. This is discussed in the tutorial on
Resampling Techniques. .
Treatment Contrasts
This may come as something of a surprise to those who, like me, had their
last formal statistics course in the 1970s, but ANOVA is actually a special
case of multiple linear regression, and under the hood R is doing your ANOVA
calculations as if it were doing a regression problem. R offers two extractor
functions to view the results of these calculations: summary.aov( ) and
summary.lm( ). The first of these produces the standard ANOVA table and is
essentially the same as using summary( ) on the model object. The second
is more interesting.
When your design consists of a control condition and several treatments, or
when there is a distinct baseline condition to which the other groups can be
logically compared, then treatment constrasts offer a mechanism for seeing the
treatment effects. We will need a different data set to demonstrate this, so
if you haven't yet cleaned up your workspace and search path, do so now.
We will look at a data set that records the dry-weight production of biomass
by plants as the result of three different treatments: a control, and two
experimental treatments. (Do you get the distinct feeling these data sets were
assembled by a group of agricultural scientists or plant ecologists?)
> data(PlantGrowth)
> summary(PlantGrowth)
weight group
Min. :3.590 ctrl:10
1st Qu.:4.550 trt1:10
Median :5.155 trt2:10
Mean :5.073
3rd Qu.:5.530
Max. :6.310
> with(PlantGrowth, tapply(weight, group, mean))
ctrl trt1 trt2
5.032 4.661 5.526
> with(PlantGrowth, tapply(weight, group, var))
ctrl trt1 trt2
0.3399956 0.6299211 0.1958711
> with(PlantGrowth, bartlett.test(weight ~ group))
Bartlett test of homogeneity of variances
data: weight by group
Bartlett's K-squared = 2.8786, df = 2, p-value = 0.2371
The design is balanced, and although there are substantial differences in the
group variances, Bartlett's test does not reveal a significant deviation from
homogeneity of variance...
> options(show.signif.stars=F) ### turn off those annoying significance stars!
> results = with(PlantGrowth, aov(weight ~ group))
> summary.aov(results)
Df Sum Sq Mean Sq F value Pr(>F)
group 2 3.7663 1.8832 4.8461 0.01591
Residuals 27 10.4921 0.3886
>
> TukeyHSD(results)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = weight ~ group)
$group
diff lwr upr p adj
trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
trt2-ctrl 0.494 -0.1972161 1.1852161 0.1979960
trt2-trt1 0.865 0.1737839 1.5562161 0.0120064
There is the traditional analysis, showing an overall main effect of treatment
group, and a post hoc significant difference only between treatment1 and
treatment2.
The same (ANOVA) result is produced by...
> anova(with(PlantGrowth, lm(weight ~ group)))
Analysis of Variance Table
Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
group 2 3.7663 1.8832 4.8461 0.01591
Residuals 27 10.4921 0.3886
...just in case you wanted to see it! The function lm( ) does linear
regression analysis.
To use treatment contrasts effectively, we have to be a bit more careful with
our factors. Specifically, we have to be certain R is seeing our control or
baseline condition as the first factor level. This will be the case here,
because in the absence of instructions to do otherwise, R arranges our factor
levels in alphabetical order...
> PlantGrowth$group
[1] ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl trt1 trt1 trt1 trt1 trt1
[16] trt1 trt1 trt1 trt1 trt1 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2
Levels: ctrl trt1 trt2
Look at the last line of that output to see how R is arranging the factors. If
necessary, it's easy enough to change this...
> PlantGrowth$group = factor(PlantGrowth$group, levels=c("trt2","trt1","ctrl"))
> PlantGrowth$group
[1] ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl trt1 trt1 trt1 trt1 trt1
[16] trt1 trt1 trt1 trt1 trt1 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2
Levels: trt2 trt1 ctrl
We want the original order, so let's put that back the way it was...
> PlantGrowth$group = factor(PlantGrowth$group, levels=c("ctrl","trt1","trt2"))
> PlantGrowth$group
[1] ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl trt1 trt1 trt1 trt1 trt1
[16] trt1 trt1 trt1 trt1 trt1 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2
Levels: ctrl trt1 trt2
If you've been fooling around in R for awhile and perhaps have reset the
options for contrasts, you may want to check to make sure the contrasts option
for unordered factors is still set to treatment contrasts...
> options("contrasts")
$contrasts
unordered ordered
"contr.treatment" "contr.poly"
And now, just for the record, we will redo the ANOVA...
> results = with(PlantGrowth, aov(weight ~ group))
We've already seen the ANOVA table (above).
Analysis of treatment contrasts assumes a balanced design, homogeneity of
variance, and additive effects (the effect of a treatment is to add (or
subtract--same thing) a constant amount to each subject's score, plus or minus
a bit of random error). The model formula looks like this (sans the random
error bit)...
score = intercept + ax1 + bx2 + ...
For subjects in the control group, all the x's are set to zero. For subjects in
treatment group one, x-one is set to 1 and all the other x's are set to zero.
For subjects in treatment group 2, x-two is set to one and all the other x's are
set to zero. And so on. So when the regression is calculated, well, let's just
look at the result...
> summary.lm(results)
Call:
aov(formula = weight ~ group)
Residuals:
Min 1Q Median 3Q Max
-1.0710 -0.4180 -0.0060 0.2627 1.3690
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.0320 0.1971 25.527 <2e-16
grouptrt1 -0.3710 0.2788 -1.331 0.1944
grouptrt2 0.4940 0.2788 1.772 0.0877
Residual standard error: 0.6234 on 27 degrees of freedom
Multiple R-squared: 0.2641, Adjusted R-squared: 0.2096
F-statistic: 4.846 on 2 and 27 DF, p-value: 0.01591
All of the information in the ANOVA table is here. The F-statistic, degrees of
freedom, and p-value are on the last line. The residual standard error is the
square root of the Mean Sq Residuals (or error mean square), and the rest can
be calculated from those (or seen by asking for summary.aov( )). In
addition, there is an effect size measure--the multiple R-squared, which is the
SS treatment divided by SS total.
The treatment contrasts are in the Coefficients table. The estimated
coefficient for the Intercept is the mean of the baseline or control group. The
standard error is the error variance (mean square error or pooled group
variance) divided by n per group and then square rooted (if that's a word). The
t-test (usually not interesting for the Intercept) is testing for a difference
from zero.
The remaining lines in the table show the differences between a treatment
group and the control group. Thus, the estimated coefficient of -0.3710 is the
difference between the treatment group one ("trt1") mean and the control group
mean. The standard error is the standard error of the difference between these
means, calculated using mean square error and n per group. The t-test is a test
for no difference between the means. In other words, this is essentially a
Fisher's LSD test.
There are no comparisons of treatment (noncontrol) groups. You could redo
the whole analysis with the factor levels regrouped to put a different level
in the baseline condition. Or a better idea would be just to do a simple
calculation at the command line...
> (-0.371 - 0.494) / .2788 # t.obs for trt1 vs trt2
[1] -3.102582
> 2 * pt(-3.1026, df=27) # p-value with df-error degrees of freedom
[1] 0.004461307
But if you are going to end up doing all possible comparisons anyway, you are
better off using Tukey's HSD test, which controls for experimentwise error
rate. For a priori contrasts, I refer you to the discussion in Chapter 9
of Crawley (2007).
Return to the Table of Contents
|