SINGLE FACTOR REPEATED MEASURES ANOVA
Preliminaries
Model Formulae
If you have not yet read the Model Formulae
tutorial, now would definitely be the time to do it.
Some Explanations
The naming of experimental designs has led to a great deal of confusion on
the part of students as well as tutorial writers. The design I will
discuss in this tutorial is the single factor within subjects design, also
called the single factor repeated measures design. This is a design with one
response variable, and each "subject" or "experimental unit" is measured
multiple times on that response variable. For example, pre-post designs, in
which a group of subjects is measured both before and after some treatment, fit
into this category. This design is also referred to in some sources as a
"treatment × subjects" design (read "treatment by subjects").
You can also think of the design as having treatments nested within subjects. Or
you could also think of it as a block design in which subjects correspond to
blocks. I've even heard it called split plot, in which subjects are plots.
You see my point.
There is some difference of opinion concerning the expressions "repeated
measures design" and "within subjects design." Some people reserve the use of
"repeated measures" for cases in which subjects ("units") are measured on the
same variable over time, such as in pre-post designs, and use "within subjects"
for cases in which subjects are measured in multiple experimental conditions,
e.g., reaction time to a red light, reaction time to a green light, reaction
time to a white light, etc. I have a tendency not to obey this particular
convention and use "repeated measures" and "within subjects" interchangeably.
In any event, the analysis is the same.
In my field (psychology), we are used to referring to our experimental units
as "subjects," because they are almost always either human beings or animals. I
will retain that terminology in this tutorial. This is why, in the first study
discussed below, grocery items are referred to as subjects--because the
groceries correspond to what would ordinarily be human subjects in a typical
psychology experiment. For a change, it is people in other fields who will have
to adjust to strange terminology!
Single Factor Designs With Repeated Measures
Several (seventeen!) years ago a friend and I decided to compare prices in
local grocery stores. We made a list of ten representative grocery items and
then went to four local stores and recorded the price of each item in each
store. The most convenient and logical way to tabulate the data--and the first way
that comes to mind--is as follows (numbers are prices in dollars).
subject storeA storeB storeC storeD
lettuce 1.17 1.78 1.29 1.29
potatoes 1.77 1.98 1.99 1.99
milk 1.49 1.69 1.79 1.59
eggs 0.65 0.99 0.69 1.09
bread 1.58 1.70 1.89 1.89
cereal 3.13 3.15 2.99 3.09
ground.beef 2.09 1.88 2.09 2.49
tomato.soup 0.62 0.65 0.65 0.69
laundry.detergent 5.89 5.99 5.99 6.99
aspirin 4.46 4.84 4.99 5.15
(IMPORTANT NOTES: For those of you who think this is a treatment-by-blocks
design, it is! When I get to the tutorial on blocking,
I will draw the parallel between the two designs. For now, just pretend
groceries are "subjects." Also, please excuse for the moment the absence of
niceties in this table, like stubs and
spanners, because shortly you will be happy I did the table in this
bare bones way!)
First, I remember when a can of tomato soup was 12¢, but that's another
story. We should also notice that laundry detergent costs a lot more than tomato
soup no matter which store sells it, and that difference is of no interest to
us. In other words, there is a great deal of variability--perhaps
99% or more of the total variability--attributable entirely to individual
differences between subjects, in which we are not the least bit interested. If
we treated this as a straightforward one-way randomized ANOVA, all of this
variability would go into the error term, and any variability due to "store" would
be totally swamped. The advantage of the repeated measures analysis is that it
allows us to parcel out variability due to subjects.
Second, we are now faced with getting these data into R. Try this.
> groceries = read.table(header=T, row.names=1, text="
+ subject storeA storeB storeC storeD ####
+ lettuce 1.17 1.78 1.29 1.29 #
+ potatoes 1.77 1.98 1.99 1.99 #
+ milk 1.49 1.69 1.79 1.59 #
+ eggs 0.65 0.99 0.69 1.09 #
+ bread 1.58 1.70 1.89 1.89 ### you can copy and paste this
+ cereal 3.13 3.15 2.99 3.09 # part from the table above
+ ground.beef 2.09 1.88 2.09 2.49 #
+ tomato.soup 0.62 0.65 0.65 0.69 #
+ laundry.detergent 5.89 5.99 5.99 6.99 #
+ aspirin 4.46 4.84 4.99 5.15 ####
+ ")
> ### apparently you are not allowed to insert comments in this fashion (so
> ### much for what comes after a # always being ignored!), so copy and paste
> ### the table part from above
>
> str(groceries)
'data.frame': 10 obs. of 4 variables:
$ storeA: num 1.17 1.77 1.49 0.65 1.58 3.13 2.09 0.62 5.89 4.46
$ storeB: num 1.78 1.98 1.69 0.99 1.7 3.15 1.88 0.65 5.99 4.84
$ storeC: num 1.29 1.99 1.79 0.69 1.89 2.99 2.09 0.65 5.99 4.99
$ storeD: num 1.29 1.99 1.59 1.09 1.89 3.09 2.49 0.69 6.99 5.15
>
> summary(groceries)
storeA storeB storeC storeD
Min. :0.620 Min. :0.650 Min. :0.650 Min. :0.690
1st Qu.:1.250 1st Qu.:1.692 1st Qu.:1.415 1st Qu.:1.365
Median :1.675 Median :1.830 Median :1.940 Median :1.940
Mean :2.285 Mean :2.465 Mean :2.436 Mean :2.626
3rd Qu.:2.870 3rd Qu.:2.857 3rd Qu.:2.765 3rd Qu.:2.940
Max. :5.890 Max. :5.990 Max. :5.990 Max. :6.990
If that didn't work, try copying and pasting this script into your R Console.
### start copying with this line
groceries = data.frame(
c(1.17,1.77,1.49,0.65,1.58,3.13,2.09,0.62,5.89,4.46),
c(1.78,1.98,1.69,0.99,1.70,3.15,1.88,0.65,5.99,4.84),
c(1.29,1.99,1.79,0.69,1.89,2.99,2.09,0.65,5.99,4.99),
c(1.29,1.99,1.59,1.09,1.89,3.09,2.49,0.69,6.99,5.15))
rownames(groceries) = c("lettuce","potatoes","milk","eggs","bread","cereal",
"ground.beef","tomato.soup","laundry.detergent","aspirin")
colnames(groceries) = c("storeA","storeB","storeC","storeD")
### end copying with this line and paste
And if that didn't do it, well, fire up scan()
and start typing!
This is a wide format (also called short format) data frame. It's not going to work for us. Why not?
Because our response variable--our
ONE response variable--is the price of the items, and this is listed in four
columns. We need each variable in ONE column of the data frame in order to
use the aov() function. This sort of problem occurs
so often, you'd think some nice R coder would have written a utility to
rearrange such tables.
> stack(groceries) # Funny! :)
I have not shown the output, but this is ALMOST what we want. It would have
worked beautifully if this were a between subjects design. Here, however, it is
only a start: we need a subject identifier. Fortunately, we put the grocery names into
the row names of the short data frame.
> gr2 = stack(groceries) # I'm tired of typing "groceries"!
> gr2$subject = rep(rownames(groceries), 4) # create the "subject" variable
> gr2$subject = factor(gr2$subject) # "I declare thee a factor."
> colnames(gr2) = c("price", "store", "subject") # rename the columns
> gr2 # take a look
price store subject
1 1.17 storeA lettuce
2 1.77 storeA potatoes
3 1.49 storeA milk
4 0.65 storeA eggs
5 1.58 storeA bread
6 3.13 storeA cereal
7 2.09 storeA ground.beef
8 0.62 storeA tomato.soup
9 5.89 storeA laundry.detergent
10 4.46 storeA aspirin
11 1.78 storeB lettuce
12 1.98 storeB potatoes
13 1.69 storeB milk
14 0.99 storeB eggs
15 1.70 storeB bread
16 3.15 storeB cereal
17 1.88 storeB ground.beef
18 0.65 storeB tomato.soup
19 5.99 storeB laundry.detergent
20 4.84 storeB aspirin
21 1.29 storeC lettuce
22 1.99 storeC potatoes
23 1.79 storeC milk
24 0.69 storeC eggs
25 1.89 storeC bread
26 2.99 storeC cereal
27 2.09 storeC ground.beef
28 0.65 storeC tomato.soup
29 5.99 storeC laundry.detergent
30 4.99 storeC aspirin
31 1.29 storeD lettuce
32 1.99 storeD potatoes
33 1.59 storeD milk
34 1.09 storeD eggs
35 1.89 storeD bread
36 3.09 storeD cereal
37 2.49 storeD ground.beef
38 0.69 storeD tomato.soup
39 6.99 storeD laundry.detergent
40 5.15 storeD aspirin
NOW we have a proper long format data frame for this analysis. This is what
we need for a repeated measures analysis using the
aov() function. We need a long format data frame WITH a subjects
identifier, which must be a factor. If you numbered your subjects, and your
subjects identifier is now a column of numbers, BE SURE to declare it to be
a factor. You don't want R seeing a numeric variable here!
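(By the way, the column stacking and renaming above can be done in one call with
base R's reshape() function. Here is a minimal sketch, output not shown; gr2.alt
is just a name I made up, and it should contain the same information as gr2,
though the row and column order may differ.)
### start copying with this line
gr2.alt = reshape(groceries,
                  direction = "long",
                  varying = list(names(groceries)),   # the four price columns
                  v.names = "price",
                  timevar = "store",
                  times = names(groceries),           # storeA, storeB, storeC, storeD
                  idvar = "subject",
                  ids = rownames(groceries))
gr2.alt$store = factor(gr2.alt$store)                 # declare the factors
gr2.alt$subject = factor(gr2.alt$subject)
str(gr2.alt)                                          # compare to str(gr2)
### end copying with this line and paste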
At which store should we shop?
> with(gr2, tapply(price, store, sum))
storeA storeB storeC storeD
22.85 24.65 24.36 26.26
For the items in the sample, it looks like storeA had the lowest prices. But as
this is only a random(ish) sample of items we might have wanted to buy, we have
to ask, will this difference hold up in general?
Once the data frame is set up correctly (all values of the response variable
in ONE column, and another column holding the subject variable--i.e., a variable
that identifies which values of the response go with which subjects), the trick
to doing a repeated measures ANOVA is in getting the model formula right. We
need to supply an Error term, which must reflect that we have
"treatments nested within subjects." That
is to say, in principle at least, we can see the "store" effect within each and
every "subject" (grocery item). The ANOVA is properly done this way.
> aov.out = aov(price ~ store + Error(subject/store), data=gr2)
> summary(aov.out)
Error: subject
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 9 115.193 12.799
Error: subject:store
Df Sum Sq Mean Sq F value Pr(>F)
store 3 0.58586 0.19529 4.3442 0.01273 *
Residuals 27 1.21374 0.04495
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
"Store" and "subject" are our sources of variability. The treatment we are
interested in is "store" (that's what we want to see the effect of), and this
treatment effect is visible within each subject (i.e., nested within each
subject). So the proper Error term is "subject/store", which is read as "store
within subject." Notice that once all that "subject" variability is parceled
out, we do have a significant "store" main effect. (Note:
Error(subject) will
work just as well for the error term. I'll discuss this further in a later
tutorial.)
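Two quick checks, offered as a sketch (output not shown). The first line uses the
simpler error term just mentioned and should give the same test for "store" with
these data. The second line ignores the repeated measures entirely and treats this
as a one-way between subjects ANOVA; run it and you'll see the "store" effect
swamped by the subject-to-subject variability, exactly as discussed above.
> summary(aov(price ~ store + Error(subject), data=gr2))   # same F for store
> summary(aov(price ~ store, data=gr2))                     # store effect swamped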
Post Hoc Tests
The Tukey HSD test will not run on this model object, so I propose pairwise
t-tests with adjusted p-values to see which stores are significantly
different. (Note: The output will be a table of p-values for each possible
pairwise comparison.)
> ### no p-adjustment
> with(gr2, pairwise.t.test(x=price, g=store, p.adjust.method="none", paired=T))
Pairwise comparisons using paired t tests
data: price and store
storeA storeB storeC
storeB 0.033 - -
storeC 0.035 0.695 -
storeD 0.012 0.245 0.110
P value adjustment method: none
> ### Bonferroni adjustment
> with(gr2, pairwise.t.test(x=price, g=store, p.adjust.method="bonf", paired=T))
Pairwise comparisons using paired t tests
data: price and store
storeA storeB storeC
storeB 0.20 - -
storeC 0.21 1.00 -
storeD 0.07 1.00 0.66
P value adjustment method: bonferroni
The syntax for this function is discussed in the
Multiple Comparisons tutorial. The new element here is the paired=T
option, which is set because these t-tests are on paired scores.
The method of contrasts described in the
Multiple Comparisons tutorial will also work. See that tutorial for
details.
> cons = cbind(c(-1,1/3,1/3,1/3), c(0,-1/2,-1/2,1), c(0,1,-1,0)) # define the contrasts
> t(cons) %*% cons # test for orthogonality
[,1] [,2] [,3]
[1,] 1.333333 0.0 0
[2,] 0.000000 1.5 0
[3,] 0.000000 0.0 2
> contrasts(gr2$store) = cons # assign contrasts to treatment
> aov.out = aov(price ~ store + Error(subject/store), data=gr2) # do the ANOVA
> summary(aov.out, split=list(
+ store=list("A vs BCD"=1,"BC vs D"=2,"B vs C"=3)
+ )) # visualize the results
Error: subject
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 9 115.2 12.8
Error: subject:store
Df Sum Sq Mean Sq F value Pr(>F)
store 3 0.5859 0.1953 4.344 0.01273 *
store: A vs BCD 1 0.3763 0.3763 8.371 0.00745 **
store: BC vs D 1 0.2053 0.2053 4.568 0.04179 *
store: B vs C 1 0.0042 0.0042 0.094 0.76207
Residuals 27 1.2137 0.0450
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that the result for B vs C is not quite the same as the t-test on
that comparison above with no p-value adjustment. That's because in the
paired pairwise t-tests, the overall pooled variance (MSE) is not used.
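If you'd like to see one of those paired t-tests by itself, you can run it
directly. A quick sketch (output not shown); the unadjusted p-value should match
the B vs C entry in the first pairwise table above (about .695).
> with(gr2, t.test(price[store=="storeB"], price[store=="storeC"], paired=T))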
Treatment by Subjects
I want to point out, in preparation for a later tutorial, that this can
also be treated as a treatment-by-subjects design. That is, it can essentially
be treated as a two-factor design without replication and without an
interaction.
> aov.tbys = aov(price ~ store + subject, data=gr2)
> summary(aov.tbys)
Df Sum Sq Mean Sq F value Pr(>F)
store 3 0.59 0.195 4.344 0.0127 *
subject 9 115.19 12.799 284.722 <2e-16 ***
Residuals 27 1.21 0.045
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The advantage of running the ANOVA this way is that the Tukey test will work,
AND it will give the right answers for repeated measures data.
> TukeyHSD(aov.tbys, which="store")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = price ~ store + subject, data = gr2)
$store
diff lwr upr p adj
storeB-storeA 0.180 -0.07947858 0.4394786 0.2524605
storeC-storeA 0.151 -0.10847858 0.4104786 0.3995675
storeD-storeA 0.341 0.08152142 0.6004786 0.0065761
storeC-storeB -0.029 -0.28847858 0.2304786 0.9898432
storeD-storeB 0.161 -0.09847858 0.4204786 0.3442902
storeD-storeC 0.190 -0.06947858 0.4494786 0.2115536
The Friedman Rank Sum Test
There is a nonparametric version of the one-way ANOVA with repeated measures.
It is executed this way.
> friedman.test(price ~ store | subject, data=gr2)
Friedman rank sum test
data: price and store and subject
Friedman chi-squared = 13.3723, df = 3, p-value = 0.003897
The formula reads "price as a function of store given subject". Here, "subject"
is being treated as a blocking variable. (No, aov()
will not take the formula in this format.)
An advantage of using this test is that it will work with the short format
data frame, provided we are careful to declare the relevant columns to be a
matrix. Here, all of the columns in the data frame contain relevant data, so
declaring the matrix is easy.
> friedman.test(as.matrix(groceries))
Friedman rank sum test
data: as.matrix(groceries)
Friedman chi-squared = 13.3723, df = 3, p-value = 0.003897
For another interesting example of the Friedman test, run
example(friedman.test). And note: In spite of what the output says,
it's not 18 players, it's 22 (n=22).
The Multivariate Approach to One-way Repeated Measures Designs
(Note: You might want to read about contrast matrices in the Multiple Comparisons tutorial before reading this section, if
you're unsure what a contrast matrix is. And while I'm at it, a bit of a rant:
Just try to find an easy, well explained example of this! Most people seem to
be a lot more interested in showing off how smart they are than in explaining
things clearly. The advantage you have with me is, I'm not that
smart to begin with!)
A few decades ago, there was a lot of talk of an alternative approach to
repeated measures analysis using multivariate ANOVA. The traditional repeated
measures ANOVA makes some pretty stiff assumptions and is not at all robust
to their violation. The multivariate approach relaxes those assumptions
somewhat, but also costs degrees of freedom (so you can't have everything).
You could go to alternative packages, such as nlme or lme4, to implement this
approach, but it's actually fairly easy to do using the base packages (for the
one-way design). AND, as an extra bonus, it allows you to keep your data in the
more sensible short format. In fact, it insists upon it!
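(If you are curious about the mixed model route mentioned above, here is a
minimal sketch using the nlme package, which ships with R. It uses the long
format data frame gr2 from earlier, and with a balanced design like this one the
F test for "store" should closely reproduce the F(3,27) = 4.344 we got from
aov(). I won't pursue that approach further here; output not shown.)
> library(nlme)
> lme.out = lme(price ~ store, random=~1|subject, data=gr2)
> anova(lme.out)                # F test for store on 3 and 27 df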
We need a response matrix, which we can get from the short format data frame.
> groceries
storeA storeB storeC storeD
lettuce 1.17 1.78 1.29 1.29
potatoes 1.77 1.98 1.99 1.99
milk 1.49 1.69 1.79 1.59
eggs 0.65 0.99 0.69 1.09
bread 1.58 1.70 1.89 1.89
cereal 3.13 3.15 2.99 3.09
ground.beef 2.09 1.88 2.09 2.49
tomato.soup 0.62 0.65 0.65 0.69
laundry.detergent 5.89 5.99 5.99 6.99
aspirin 4.46 4.84 4.99 5.15
> gromat = as.matrix(groceries[,1:4])
> gromat
storeA storeB storeC storeD
lettuce 1.17 1.78 1.29 1.29
potatoes 1.77 1.98 1.99 1.99
milk 1.49 1.69 1.79 1.59
eggs 0.65 0.99 0.69 1.09
bread 1.58 1.70 1.89 1.89
cereal 3.13 3.15 2.99 3.09
ground.beef 2.09 1.88 2.09 2.49
tomato.soup 0.62 0.65 0.65 0.69
laundry.detergent 5.89 5.99 5.99 6.99
aspirin 4.46 4.84 4.99 5.15
Notice that it looks exactly the same in this case, but that won't always be
true. We extracted all rows (the blank index) and columns 1 through 4 (which
just happens in this case to be all columns) of the groceries data frame and
declared the result as a matrix.
Allow me to begin demonstrating this method by doing a
Related Measures t-Test. We'll compare storeA to storeD. As the related
measures t-test is essentially a Single Sample
t-Test on difference scores, I'll begin by getting difference scores and
putting them into a single-column matrix called D. Then I'll run the command
that does the test and compare the result to the t-test.
> D = gromat[,4] - gromat[,1] # the difference scores
> D = matrix(D) # into a matrix
> D
[,1]
[1,] 0.12
[2,] 0.22
[3,] 0.10
[4,] 0.44
[5,] 0.31
[6,] -0.04
[7,] 0.40
[8,] 0.07
[9,] 1.10
[10,] 0.69
> result = lm(D ~ 1) # regress D against the intercept (D tilde one)
> summary(result)$coef # plain summary(result) also works; $coef just trims the output
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.341 0.1081301 3.153609 0.01166957
>
> t.test(D, mu=0) # for comparison
One Sample t-test
data: D
t = 3.1536, df = 9, p-value = 0.01167
... some output omitted ...
Now I point out that there are six such pairwise comparisons that are possible,
and we picked the most favorable one by looking at the data first. A fairer
result might be had if we applied a Bonferroni correction to the p-value.
> 0.01167 * 6
[1] 0.07002
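Incidentally, R's p.adjust() function will do that arithmetic for you (and also
offers less conservative adjustments, such as Holm's method).
> p.adjust(0.01167, method="bonferroni", n=6)
[1] 0.07002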
It remains to generalize this method to larger D matrices, i.e., D matrices
with more columns. For comparison, I'll recall the result of the one-way
ANOVA we did above: F(3,27) = 4.344, p = .0127. I'll also point out that we
have a moderate violation of the sphericity assumption in these data (the
effects are not quite additive), and so I calculated a Greenhouse-Geisser
correction to the p-value, resulting in p = .0309, still low enough to reject
the null at an alpha of .05, but getting close.
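(In case you're wondering how to get such a correction in R: one way, sketched
here on the assumption that you have the car package installed, is to fit a
multivariate linear model to the response matrix gromat and hand it to car's
Anova() function. The summary reports Mauchly's test of sphericity along with
Greenhouse-Geisser and Huynh-Feldt corrected p-values. Output not shown; I'm not
claiming this is exactly how I got the figure above.)
> library(car)                                        # assumes car is installed
> idata = data.frame(store=factor(colnames(gromat)))  # the within-subjects design
> mlm1 = lm(gromat ~ 1)                               # mlm1, idata: made-up names
> summary(Anova(mlm1, idata=idata, idesign=~store))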
What we need are k-1 columns of difference scores in our D matrix, where k
is the number of levels of the repeated measures factor, or 4-1=3 in this case.
R provides us with a convenient way of getting those using a function called
contr.sum(). This function creates a contrast matrix
of "sum to zero" contrasts, or "effects" contrasts. We have k=4 levels of the
repeated measures factor, so we need this matrix:
> contr.sum(4)
[,1] [,2] [,3]
1 1 0 0
2 0 1 0
3 0 0 1
4 -1 -1 -1
Using that contrast matrix, we will calculate the following differences:
A-D, B-D, and C-D. It's done in one step via matrix multiplication.
> D = gromat %*% contr.sum(4) # percent asterisk percent is matrix multiplication
> D
[,1] [,2] [,3]
lettuce -0.12 0.49 0.00
potatoes -0.22 -0.01 0.00
milk -0.10 0.10 0.20
eggs -0.44 -0.10 -0.40
bread -0.31 -0.19 0.00
cereal 0.04 0.06 -0.10
ground.beef -0.40 -0.61 -0.40
tomato.soup -0.07 -0.04 -0.04
laundry.detergent -1.10 -1.00 -1.00
aspirin -0.69 -0.31 -0.16
It remains to run the multivariate test.
> result = lm(D ~ 1)
> anova(result)
Analysis of Variance Table
Df Pillai approx F num Df den Df Pr(>F)
(Intercept) 1 0.64746 4.2853 3 7 0.05156 .
Residuals 9
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Our test statistic is called Pillai's trace. An approximate F has also been
calculated. Notice it is similar to the one from the repeated measures ANOVA but
is on 3 and 7 degrees of freedom. Hence, p > .05, and the null--all treatment
means equal, or more correctly, all levels of the treatment sampled from
populations with equal means--is not
rejected at an alpha level of .05. The most commonly used test statistic
here is probably Wilks' lambda, and you can have that, too, by setting an
option.
> anova(result, test="Wilks")
Analysis of Variance Table
Df Wilks approx F num Df den Df Pr(>F)
(Intercept) 1 0.35254 4.2853 3 7 0.05156 .
Residuals 9
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Other statistics available are Hotelling-Lawley trace (test="Hotelling") and
Roy's largest root (test="Roy"). People argue over which of these is best in
a multivariate ANOVA, but notice here it makes no difference. That p-value
just ain't gonna budge, no matter which one you use.
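Just as a quick sketch, you can check that for yourself (output not shown):
> anova(result, test="Hotelling-Lawley")
> anova(result, test="Roy")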
"Why would I want to do that?" you ask. After all, we lost our significant
effect. Notice that the Greenhouse-Geisser corrected p-value and the MANOVA p
are reasonably close but just happen to be on opposite sides of .05. It
sometimes happens in the MANOVA approach that the loss of degrees of freedom
reduces the power of the test enough to cost us a significant effect.
That's especially true when n (the number of subjects) is not much
greater than k (the number of levels of the treatment), as is the case here.
These issues were discussed by Vasey and Thayer (1987).
As a second example of the multivariate approach, I'd like to duplicate the
analysis in their paper (just about). In this example, we have sufficient
subjects to make the two approaches about equal in power. The data appear in
their paper in Table 1, p. 484. The data are "measures of mean electromyographic
(EMG) amplitude [in microvolts]
from the left brow region, collected over four 90-s trials from 22 subjects.
These data were drawn from a study of affective facial expressions conducted
by" Thayer (p. 484). Get ready to type. Or try a copy and paste.
### start copying here
EMG = read.table(header=T, text="
LB1 LB2 LB3 LB4
143 368 345 772
142 155 161 178
109 167 356 956
123 135 137 187
276 216 232 307
235 386 398 425
208 175 207 293
267 358 698 771
183 193 631 403
245 268 572 1383
324 507 556 504
148 378 342 796
130 142 150 173
119 171 333 1062
102 94 93 69
279 204 229 299
244 365 392 406
196 168 199 287
279 358 822 671
167 183 731 203
345 238 572 1652
524 507 520 504
")
EMG = as.matrix(EMG)
### end copying here and paste into the R Console
We can get their column means easily enough. The standard deviations are a
little harder (but not much).
> colMeans(EMG) # apply(EMG, 2, FUN=mean) is the equivalent
LB1 LB2 LB3 LB4
217.6364 260.7273 394.3636 559.1364
> apply(EMG, 2, FUN=sd)
LB1 LB2 LB3 LB4
99.6463 121.5241 213.5919 413.4777
To duplicate their specific comparisons, we need a slightly different contrast
matrix this time.
> cons = cbind(c(-1,1,0,0),c(-1,0,1,0),c(-1,0,0,1))
> cons
[,1] [,2] [,3]
[1,] -1 -1 -1
[2,] 1 0 0
[3,] 0 1 0
[4,] 0 0 1
This will result in the comparison of the first column to the second, the first
column to the third, and the first column to the fourth. (It's the same old
"sums" matrix but reversed and turned upside down.) Now we'll create the D
matrix and do the multivariate analysis.
> Demg = EMG %*% cons
> results = lm(Demg ~ 1)
> anova(results)
Analysis of Variance Table
Df Pillai approx F num Df den Df Pr(>F)
(Intercept) 1 0.54745 7.6614 3 19 0.001486 **
Residuals 21
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We have duplicated the F-value, degrees of freedom, and p-value in their
Table 2. Their multivariate statistic is something called TSQ (presumably
Hotelling's T-squared). We have a Pillai's trace. Their specific comparisons are in Table 3
and consist of F-tests. Ours will be t-tests, but note the p-values are the
same, and F = t² in all cases.
> summary(results)
Response Y1 :
Call:
lm(formula = Y1 ~ 1) ### test on column 1 of Demg
Residuals:
Min 1Q Median 3Q Max
-150.09 -57.84 -28.59 44.91 186.91
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.09 19.65 2.193 0.0397 *
---
Residual standard error: 92.18 on 21 degrees of freedom
Response Y2 :
Call:
lm(formula = Y2 ~ 1) ### test on column 2 of Demg
Residuals:
Min 1Q Median 3Q Max
-226.73 -170.98 1.77 66.52 387.27
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 176.73 40.66 4.346 0.000284 ***
---
Residual standard error: 190.7 on 21 degrees of freedom
Response Y3 :
Call:
lm(formula = Y3 ~ 1) ### test on column 3 of Demg
Residuals:
Min 1Q Median 3Q Max
-374.5 -303.8 -170.5 256.2 965.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 341.50 86.26 3.959 0.000717 ***
---
Residual standard error: 404.6 on 21 degrees of freedom
So the contrast matrix doesn't have to be exactly the one generated by
contr.sum(). In fact, it can be any contrast
matrix in which the contrast vectors are linearly independent. The columns of
D don't even have to be simple difference scores. R has a function
contr.helmert() that generates Helmert contrasts,
and those will work, as Helmert contrasts are linearly independent. ("Linearly
independent" means that any one contrast vector cannot be derivable by a
linear combination of the others. It's not the same thing as orthogonality.)
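For example, as a quick sketch (output not shown), Helmert contrasts on the
grocery data should reproduce exactly the same overall multivariate test we got
with contr.sum(), because the overall test doesn't care which full-rank set of
contrasts you use.
> D.h = gromat %*% contr.helmert(4)   # a different set of linearly independent contrasts
> anova(lm(D.h ~ 1))                  # Pillai and approx F should match the earlier result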
- Vasey, M. W. and Thayer, J. F. (1987). The continuing problem of false
positives in repeated measures ANOVA in psychophysiology: A multivariate
solution. Psychophysiology, 24(4), 479-486.
Unbalanced Designs
Don't do it! If, for example, storeC didn't have any lettuce and didn't have
a price on the shelf for it (if that value is missing), then all of the data for
that "subject" (lettuce) should be removed from the analysis. There are ways
around this, but not if you're going to use repeated measures ANOVA.
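If you do find yourself in that situation, dropping the incomplete subject is
easy enough. A sketch for the hypothetical missing-lettuce case (gr3 is just a
made-up name):
> gr3 = subset(gr2, subject != "lettuce")   # remove all four lettuce rows
> gr3$subject = droplevels(gr3$subject)     # drop the now-empty factor level
and then run the ANOVA on gr3 exactly as before.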
revised 2016 February 9