R Tutorials by William B. King, Ph.D.

SINGLE FACTOR REPEATED MEASURES ANOVA


Preliminaries

Model Formulae

If you have not yet read the Model Formulae tutorial, now would definitely be the time to do it.

Some Explanations

The naming of experimental designs has led to a great deal of confusion on the part of students as well as tutorial writers. The design I will discuss in this tutorial is the single factor within subjects design, also called the single factor repeated measures design. This is a design with one response variable, and each "subject" or "experimental unit" is measured multiple times on that response variable. For example, pre-post designs, in which a group of subjects is measured both before and after some treatment, fit into this category. This design is also referred to in some sources as a "treatment × subjects" design (read "treatment by subjects"). You can also think of the design as having treatments nested within subjects. Or you could also think of it as a block design in which subjects correspond to blocks. I've even heard it called split plot, in which subjects are plots. You see my point.

There is some difference of opinion concerning the expressions "repeated measures design" and "within subjects design." Some people reserve the use of "repeated measures" for cases in which subjects ("units") are measured on the same variable over time, such as in pre-post designs, and use "within subjects" for cases in which subjects are measured in multiple experimental conditions, e.g., reaction time to a red light, reaction time to a green light, reaction time to a white light, etc. I have a tendency not to obey this particular convention and use "repeated measures" and "within subjects" interchangeably. In any event, the analysis is the same.

In my field (psychology), we are used to referring to our experimental units as "subjects," because they are almost always either human beings or animals. I will retain that terminology in this tutorial. This is why, in the first study discussed below, grocery items are referred to as subjects--because the groceries correspond to what would ordinarily be human subjects in a typical psychology experiment. For a change, it is people in other fields who will have to adjust to strange terminology!


Single Factor Designs With Repeated Measures

Several (seventeen!) years ago a friend and I decided to compare prices in local grocery stores. We made a list of ten representative grocery items and then went to four local stores and recorded the price of each item in each store. The most convenient and logical way to tabulate the data--and the first way that comes to mind--is as follows (numbers are prices in dollars).

subject            storeA  storeB  storeC  storeD
lettuce              1.17    1.78    1.29    1.29
potatoes             1.77    1.98    1.99    1.99
milk                 1.49    1.69    1.79    1.59
eggs                 0.65    0.99    0.69    1.09
bread                1.58    1.70    1.89    1.89
cereal               3.13    3.15    2.99    3.09
ground.beef          2.09    1.88    2.09    2.49
tomato.soup          0.62    0.65    0.65    0.69
laundry.detergent    5.89    5.99    5.99    6.99
aspirin              4.46    4.84    4.99    5.15
(IMPORTANT NOTES: For those of you who think this is a treatment-by-blocks design, it is! When I get to the tutorial on blocking, I will draw the parallel between the two designs. For now, just pretend groceries are "subjects." Also, please excuse for the moment the absence of niceties in this table, like stubs and spanners, because shortly you will be happy I did the table in this bare bones way!)

First, I remember when a can of tomato soup was 12¢, but that's another story. We should notice that laundry detergent costs a lot more than soup, but that is not a comparison we are interested in. In other words, there is a great deal of variability--perhaps 99% or more of the total variability--attributable entirely to individual differences between subjects, in which we are not the least bit interested. (We'll put an exact number on this after the ANOVA below.) If we treated this as a straightforward one-way randomized ANOVA, all of this variability would go into the error term, and any variability due to "store" would be totally swamped. The advantage of the repeated measures analysis is that it allows us to parcel out the variability due to subjects.

Second, we are now faced with getting these data into R. Try this.

> groceries = read.table(header=T, row.names=1, text="
+ subject            storeA  storeB  storeC  storeD     ####
+ lettuce              1.17    1.78    1.29    1.29        #
+ potatoes             1.77    1.98    1.99    1.99        #
+ milk                 1.49    1.69    1.79    1.59        #
+ eggs                 0.65    0.99    0.69    1.09        #
+ bread                1.58    1.70    1.89    1.89        ### you can copy and paste this
+ cereal               3.13    3.15    2.99    3.09        #   part from the table above
+ ground.beef          2.09    1.88    2.09    2.49        #
+ tomato.soup          0.62    0.65    0.65    0.69        #
+ laundry.detergent    5.89    5.99    5.99    6.99        #
+ aspirin              4.46    4.84    4.99    5.15     ####
+ ")
> ### apparently you are not allowed to insert comments in this fashion (so
> ### much for what comes after a # always being ignored!), so copy and paste
> ### the table part from above
>
> str(groceries)
'data.frame':	10 obs. of  4 variables:
 $ storeA: num  1.17 1.77 1.49 0.65 1.58 3.13 2.09 0.62 5.89 4.46
 $ storeB: num  1.78 1.98 1.69 0.99 1.7 3.15 1.88 0.65 5.99 4.84
 $ storeC: num  1.29 1.99 1.79 0.69 1.89 2.99 2.09 0.65 5.99 4.99
 $ storeD: num  1.29 1.99 1.59 1.09 1.89 3.09 2.49 0.69 6.99 5.15
>
> summary(groceries)
     storeA          storeB          storeC          storeD     
 Min.   :0.620   Min.   :0.650   Min.   :0.650   Min.   :0.690  
 1st Qu.:1.250   1st Qu.:1.692   1st Qu.:1.415   1st Qu.:1.365  
 Median :1.675   Median :1.830   Median :1.940   Median :1.940  
 Mean   :2.285   Mean   :2.465   Mean   :2.436   Mean   :2.626  
 3rd Qu.:2.870   3rd Qu.:2.857   3rd Qu.:2.765   3rd Qu.:2.940  
 Max.   :5.890   Max.   :5.990   Max.   :5.990   Max.   :6.990
If that didn't work, try copying and pasting this script into your R Console.
### start copying with this line
groceries = data.frame(
     c(1.17,1.77,1.49,0.65,1.58,3.13,2.09,0.62,5.89,4.46),
     c(1.78,1.98,1.69,0.99,1.70,3.15,1.88,0.65,5.99,4.84),
     c(1.29,1.99,1.79,0.69,1.89,2.99,2.09,0.65,5.99,4.99),
     c(1.29,1.99,1.59,1.09,1.89,3.09,2.49,0.69,6.99,5.15))
rownames(groceries) = c("lettuce","potatoes","milk","eggs","bread","cereal",
     "ground.beef","tomato.soup","laundry.detergent","aspirin")
colnames(groceries) = c("storeA","storeB","storeC","storeD")
### end copying with this line and paste
And if that didn't do it, well, fire up scan() and start typing!

This is a wide format data frame. It's not going to work for us. Why not? Because our response variable--our ONE response variable--is the price of the items, and this is listed in four columns. We need each variable in ONE column of the data frame in order to use the aov() function. This sort of problem occurs so often, you'd think some nice R coder would have written a utility to rearrange such tables.

> stack(groceries)                          # Funny! :)
I have not shown the output, but this is ALMOST what we want. It would have worked beautifully if this were a between subjects design. Here, though, it is only a start. We need a subject identifier. Fortunately, we put the grocery names into the row names of the wide data frame.
> gr2 = stack(groceries)                              # I'm tired of typing "groceries"!
> gr2$subject = rep(rownames(groceries), 4)           # create the "subject" variable
> gr2$subject = factor(gr2$subject)                   # "I declare thee a factor."
> colnames(gr2) = c("price", "store", "subject")      # rename the columns
> gr2                                                 # take a look
   price  store           subject
1   1.17 storeA           lettuce
2   1.77 storeA          potatoes
3   1.49 storeA              milk
4   0.65 storeA              eggs
5   1.58 storeA             bread
6   3.13 storeA            cereal
7   2.09 storeA       ground.beef
8   0.62 storeA       tomato.soup
9   5.89 storeA laundry.detergent
10  4.46 storeA           aspirin
11  1.78 storeB           lettuce
12  1.98 storeB          potatoes
13  1.69 storeB              milk
14  0.99 storeB              eggs
15  1.70 storeB             bread
16  3.15 storeB            cereal
17  1.88 storeB       ground.beef
18  0.65 storeB       tomato.soup
19  5.99 storeB laundry.detergent
20  4.84 storeB           aspirin
21  1.29 storeC           lettuce
22  1.99 storeC          potatoes
23  1.79 storeC              milk
24  0.69 storeC              eggs
25  1.89 storeC             bread
26  2.99 storeC            cereal
27  2.09 storeC       ground.beef
28  0.65 storeC       tomato.soup
29  5.99 storeC laundry.detergent
30  4.99 storeC           aspirin
31  1.29 storeD           lettuce
32  1.99 storeD          potatoes
33  1.59 storeD              milk
34  1.09 storeD              eggs
35  1.89 storeD             bread
36  3.09 storeD            cereal
37  2.49 storeD       ground.beef
38  0.69 storeD       tomato.soup
39  6.99 storeD laundry.detergent
40  5.15 storeD           aspirin
NOW we have a proper long format data frame for this analysis. This is what we need for a repeated measures analysis using the aov() function. We need a long format data frame WITH a subjects identifier, which must be a factor. If you numbered your subjects, and your subjects identifier is now a column of numbers, BE SURE to declare it to be a factor. You don't want R seeing a numeric variable here!
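
(By the way, base R does have a one-step tool for this: reshape(). The following sketch should build an equivalent long format data frame; I show it only as an alternative, and the rest of this tutorial uses gr2. Note that the store and subject columns still have to be declared factors afterwards.)
> gr2b = reshape(groceries, direction="long", varying=list(1:4),
+      v.names="price", timevar="store", times=names(groceries),
+      idvar="subject", ids=rownames(groceries))
> gr2b$store = factor(gr2b$store)                # reshape() leaves these as
> gr2b$subject = factor(gr2b$subject)            #   character vectors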

At which store should we shop?

> with(gr2, tapply(price, store, sum))
storeA storeB storeC storeD 
 22.85  24.65  24.36  26.26
For the items in the sample, it looks like storeA had the lowest prices. But as this is only a random(ish) sample of items we might have wanted to buy, we have to ask, will this difference hold up in general?

Once the data frame is set up correctly (all values of the response variable in ONE column, and another column holding the subject variable--i.e., a variable that identifies which values of the response go with which subjects), the trick to doing a repeated measures ANOVA is in getting the model formula right. We need to supply an Error term, which must reflect that we have "treatments nested within subjects." That is to say, in principle at least, we can see the "store" effect within each and every "subject" (grocery item). The ANOVA is properly done this way.

> aov.out = aov(price ~ store + Error(subject/store), data=gr2)
> summary(aov.out)

Error: subject
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  9 115.193  12.799

Error: subject:store
          Df  Sum Sq Mean Sq F value  Pr(>F)  
store      3 0.58586 0.19529  4.3442 0.01273 *
Residuals 27 1.21374 0.04495                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
"Store" and "subject" are our sources of variability. The treatment we are interested in is "store" (that's what we want to see the effect of), and this treatment effect is visible within each subject (i.e., nested within each subject). So the proper Error term is "subject/store", which is read as "store within subject." Notice that once all that "subject" variability is parceled out, we do have a significant "store" main effect. (Note: Error(subject) will work just as well for the error term. I'll discuss this further in a later tutorial.)


Post Hoc Tests

The Tukey HSD test will not run on this model object, so I propose pairwise t-tests with adjusted p-values to see which stores are significantly different. (Note: The output will be a table of p-values for each possible pairwise comparison.)

> ### no p-adjustment
> with(gr2, pairwise.t.test(x=price, g=store, p.adjust.method="none", paired=T))

	Pairwise comparisons using paired t tests 

data:  price and store 

       storeA storeB storeC
storeB 0.033  -      -     
storeC 0.035  0.695  -     
storeD 0.012  0.245  0.110 

P value adjustment method: none

> ### Bonferroni adjustment
> with(gr2, pairwise.t.test(x=price, g=store, p.adjust.method="bonf", paired=T))

	Pairwise comparisons using paired t tests 

data:  price and store 

       storeA storeB storeC
storeB 0.20   -      -     
storeC 0.21   1.00   -     
storeD 0.07   1.00   0.66  

P value adjustment method: bonferroni
The syntax for this function is discussed in the Multiple Comparisons tutorial. The new element here is the paired=T option, which is set because these t-tests are on paired scores.
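
Any cell in those tables can be checked by hand by running t.test() on the relevant columns of the wide format data frame. For example, here is the unadjusted storeA vs. storeB comparison, rounded to match the table.
> round(t.test(groceries$storeB, groceries$storeA, paired=TRUE)$p.value, 3)
[1] 0.033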

The method of contrasts described in the Multiple Comparisons tutorial will also work. See that tutorial for details.

> cons = cbind(c(-1,1/3,1/3,1/3), c(0,-1/2,-1/2,1), c(0,1,-1,0))     # define the contrasts
> t(cons) %*% cons                                                   # test for orthogonality
         [,1] [,2] [,3]
[1,] 1.333333  0.0    0
[2,] 0.000000  1.5    0
[3,] 0.000000  0.0    2
> contrasts(gr2$store) = cons                                        # assign contrasts to treatment
> aov.out = aov(price ~ store + Error(subject/store), data=gr2)      # do the ANOVA
> summary(aov.out, split=list(
+    store=list("A vs BCD"=1,"BC vs D"=2,"B vs C"=3)
+ ))                                                                 # visualize the results

Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9  115.2    12.8               

Error: subject:store
                  Df Sum Sq Mean Sq F value  Pr(>F)   
store              3 0.5859  0.1953   4.344 0.01273 * 
  store: A vs BCD  1 0.3763  0.3763   8.371 0.00745 **
  store: BC vs D   1 0.2053  0.2053   4.568 0.04179 * 
  store: B vs C    1 0.0042  0.0042   0.094 0.76207   
Residuals         27 1.2137  0.0450                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that the result for B vs C is not quite the same as the t-test on that comparison above with no p-value adjustment. That's because each paired t-test uses only the variance of its own difference scores, while the contrasts use the overall pooled error variance (MSE) from the ANOVA.
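
You can see that by running the B vs. C comparison directly. The paired t-test below uses only the ten B-minus-C difference scores (9 df), not the pooled MSE (27 df), and it reproduces the unadjusted table value.
> round(t.test(groceries$storeB, groceries$storeC, paired=TRUE)$p.value, 3)
[1] 0.695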


Treatment by Subjects

I want to point out, in preparation for a later tutorial, that this can also be treated as a treatment-by-subjects design. That is, it can essentially be treated as a two-factor design without replication and without an interaction.

> aov.tbys = aov(price ~ store + subject, data=gr2)
> summary(aov.tbys)
            Df Sum Sq Mean Sq F value Pr(>F)    
store        3   0.59   0.195   4.344 0.0127 *  
subject      9 115.19  12.799 284.722 <2e-16 ***
Residuals   27   1.21   0.045                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The advantage of running the ANOVA this way is that the Tukey test will work, AND it will give the right answers for repeated measures data.
> TukeyHSD(aov.tbys, which="store")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = price ~ store + subject, data = gr2)

$store
                diff         lwr       upr     p adj
storeB-storeA  0.180 -0.07947858 0.4394786 0.2524605
storeC-storeA  0.151 -0.10847858 0.4104786 0.3995675
storeD-storeA  0.341  0.08152142 0.6004786 0.0065761
storeC-storeB -0.029 -0.28847858 0.2304786 0.9898432
storeD-storeB  0.161 -0.09847858 0.4204786 0.3442902
storeD-storeC  0.190 -0.06947858 0.4494786 0.2115536


The Friedman Rank Sum Test

There is a nonparametric version of the one-way ANOVA with repeated measures. It is executed this way.

> friedman.test(price ~ store | subject, data=gr2)

        Friedman rank sum test

data:  price and store and subject 
Friedman chi-squared = 13.3723, df = 3, p-value = 0.003897
The formula reads "price as a function of store given subject". Here, "subject" is being treated as a blocking variable. (No, aov() will not take the formula in this format.)

An advantage of using this test is that it will work with the wide format data frame, provided we are careful to declare the relevant columns to be a matrix. Here, all of the columns in the data frame contain relevant data, so declaring the matrix is easy.

> friedman.test(as.matrix(groceries))

	Friedman rank sum test

data:  as.matrix(groceries)
Friedman chi-squared = 13.3723, df = 3, p-value = 0.003897
For another interesting example of the Friedman test, run example(friedman.test). And note: In spite of what the output says, it's not 18 players, it's 22 (n=22).


The Multivariate Approach to One-way Repeated Measures Designs

(Note: You might want to read about contrast matrices in the Multiple Comparisons tutorial before reading this section, if you're unsure what a contrast matrix is. And while I'm at it, a bit of a rant: Just try to find an easy, well explained example of this! Most people seem to be a lot more interested in showing off how smart they are than they are concerned with explaining things clearly. The advantage you have with me is, I'm not that smart to begin with!)

A few decades ago, there was a lot of talk of an alternative approach to repeated measures analysis using multivariate ANOVA. The traditional repeated measures ANOVA makes some pretty stiff assumptions and is not at all robust to their violation. The multivariate approach relaxes those assumptions somewhat, but also costs degrees of freedom (so you can't have everything). You could go to alternative packages, such as nlme or lme4, to implement this approach, but it's actually fairly easy to do using the base packages (for the one-way design). AND, extra added bonus, it allows you to keep your data in the more sensible wide format. In fact, it insists upon it!

We need a response matrix, which we can get from the wide format data frame.

> groceries
                  storeA storeB storeC storeD
lettuce             1.17   1.78   1.29   1.29
potatoes            1.77   1.98   1.99   1.99
milk                1.49   1.69   1.79   1.59
eggs                0.65   0.99   0.69   1.09
bread               1.58   1.70   1.89   1.89
cereal              3.13   3.15   2.99   3.09
ground.beef         2.09   1.88   2.09   2.49
tomato.soup         0.62   0.65   0.65   0.69
laundry.detergent   5.89   5.99   5.99   6.99
aspirin             4.46   4.84   4.99   5.15
> gromat = as.matrix(groceries[,1:4])
> gromat
                  storeA storeB storeC storeD
lettuce             1.17   1.78   1.29   1.29
potatoes            1.77   1.98   1.99   1.99
milk                1.49   1.69   1.79   1.59
eggs                0.65   0.99   0.69   1.09
bread               1.58   1.70   1.89   1.89
cereal              3.13   3.15   2.99   3.09
ground.beef         2.09   1.88   2.09   2.49
tomato.soup         0.62   0.65   0.65   0.69
laundry.detergent   5.89   5.99   5.99   6.99
aspirin             4.46   4.84   4.99   5.15
Notice that it looks exactly the same in this case, but that won't always be true. We extracted all rows (the blank index) and columns 1 through 4 (which just happens in this case to be all columns) of the groceries data frame and declared the result as a matrix.

Allow me to begin by demonstrating this method with a Related Measures t-Test. We'll compare storeA to storeD. As the related measures t-test is essentially a Single Sample t-Test on difference scores, I'll begin by getting difference scores and putting them into a single-column matrix called D. Then I'll run the command that does the test and compare the result to the t-test.

> D = gromat[,4] - gromat[,1]          # the difference scores
> D = matrix(D)                        # into a matrix
> D
       [,1]
 [1,]  0.12
 [2,]  0.22
 [3,]  0.10
 [4,]  0.44
 [5,]  0.31
 [6,] -0.04
 [7,]  0.40
 [8,]  0.07
 [9,]  1.10
[10,]  0.69
> result = lm(D ~ 1)                   # regress D against the intercept (D tilde one)
> summary(result)$coef                 # summary(result) is sufficient
            Estimate Std. Error  t value   Pr(>|t|)
(Intercept)    0.341  0.1081301 3.153609 0.01166957
>
> t.test(D, mu=0)                      # for comparison

	One Sample t-test

data:  D
t = 3.1536, df = 9, p-value = 0.01167
... some output omitted ...
Now I point out that there are six such pairwise comparisons that are possible, and we picked the most favorable one by looking at the data first. A fairer result might be had if we applied a Bonferroni correction to the p-value.
> 0.01167 * 6
[1] 0.07002
It remains to generalize this method to larger D matrices, i.e., D matrices with more columns. For comparison, I'll recall the result of the one-way ANOVA we did above: F(3,27) = 4.344, p = .0127. I'll also point out that we have a moderate violation of the sphericity assumption in these data (the effects are not quite additive), and so I calculated a Greenhouse-Geisser correction to the p-value, resulting in p = .0309, still low enough to reject the null at an alpha of .05, but getting close.
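
(I didn't show that calculation, so here is a sketch of one way to get the Greenhouse-Geisser epsilon in base R, using the gromat matrix created above. Epsilon comes from the double-centered covariance matrix of the four store columns, and it multiplies both degrees of freedom.)
> S = cov(gromat)                                # covariances of the four stores
> k = ncol(gromat)                               # k = 4 levels
> Sc = S - matrix(rowMeans(S),k,k) -
+      matrix(colMeans(S),k,k,byrow=TRUE) + mean(S)    # double centering
> eps = sum(diag(Sc))^2 / ((k-1) * sum(Sc^2))    # Greenhouse-Geisser epsilon
> pf(4.3442, eps*3, eps*27, lower.tail=FALSE)    # should be close to the .0309 quoted above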

What we need are k-1 columns of difference scores in our D matrix, where k is the number of levels of the repeated measures factor, or 4-1=3 in this case. R provides us with a convenient way of getting those using a function called contr.sum(). This function creates a contrast matrix of "sum to zero" contrasts, or "effects" contrasts. We have k=4 levels of the repeated measures factor, so we need this matrix:

> contr.sum(4)
  [,1] [,2] [,3]
1    1    0    0
2    0    1    0
3    0    0    1
4   -1   -1   -1
Using that contrast matrix, we will calculate the following differences: A-D, B-D, and C-D. It's done in one step via matrix multiplication.
> D = gromat %*% contr.sum(4)          # percent asterisk percent is matrix multiplication
> D
                   [,1]  [,2]  [,3]
lettuce           -0.12  0.49  0.00
potatoes          -0.22 -0.01  0.00
milk              -0.10  0.10  0.20
eggs              -0.44 -0.10 -0.40
bread             -0.31 -0.19  0.00
cereal             0.04  0.06 -0.10
ground.beef       -0.40 -0.61 -0.40
tomato.soup       -0.07 -0.04 -0.04
laundry.detergent -1.10 -1.00 -1.00
aspirin           -0.69 -0.31 -0.16
It remains to run the multivariate test.
> result = lm(D ~ 1)
> anova(result)
Analysis of Variance Table

            Df  Pillai approx F num Df den Df  Pr(>F)  
(Intercept)  1 0.64746   4.2853      3      7 0.05156 .
Residuals    9                                         
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Our test statistic is called Pillai's trace. An approximate F has also been calculated. Notice it is similar to the one from the repeated measures ANOVA but is on 3 and 7 degrees of freedom. Hence, p > .05, and the null--all treatment means equal, or more correctly, all levels of the treatment sampled from populations with equal means--is not rejected at an alpha level of .05. The most commonly used test statistic here is probably Wilks' lambda, and you can have that, too, by setting an option.
> anova(result, test="Wilks")
Analysis of Variance Table

            Df   Wilks approx F num Df den Df  Pr(>F)  
(Intercept)  1 0.35254   4.2853      3      7 0.05156 .
Residuals    9                                         
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Other statistics available are Hotelling-Lawley trace (test="Hotelling") and Roy's largest root (test="Roy"). People argue over which of these is best in a multivariate ANOVA, but notice here it makes no difference. That p-value just ain't gonna budge, no matter which one you use.
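
If you don't believe me, try it. (Output not shown; the approximate F of 4.2853 and p of .05156 come out the same.)
> anova(result, test="Hotelling")      # Hotelling-Lawley trace
> anova(result, test="Roy")            # Roy's largest root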

"Why would I want to do that?" you ask. After all, we lost our significant effect. Notice that the Greenhouse-Geisser corrected p-value and the MANOVA p are reasonably close but just happen to be on opposite sides of .05. It sometimes happens in the MANOVA approach that the degrees of freedom are reduced sufficiently to cost us a significant effect due to reduction in the power of the test. That's especially true when n (the number of subjects) is not much greater than k (the number of levels of the treatment), as is the case here. These issues were discussed by Vasey and Thayer (1987).

As a second example of the multivariate approach, I'd like to duplicate the analysis in their paper (just about). In this example, we have sufficient subjects to make the two approaches about equal in power. The data appear in their paper in Table 1, p. 484. The data are "measures of mean electromyographic (EMG) amplitude [in microvolts] from the left brow region, collected over four 90-s trials from 22 subjects. These data were drawn from a study of affective facial expressions conducted by" Thayer (p. 484). Get ready to type. Or try a copy and paste.

### start copying here
EMG = read.table(header=T, text="
LB1 LB2 LB3  LB4
143 368 345  772
142 155 161  178
109 167 356  956
123 135 137  187
276 216 232  307
235 386 398  425
208 175 207  293
267 358 698  771
183 193 631  403
245 268 572 1383
324 507 556  504
148 378 342  796
130 142 150  173
119 171 333 1062
102  94  93   69
279 204 229  299
244 365 392  406
196 168 199  287
279 358 822  671
167 183 731  203
345 238 572 1652
524 507 520  504
")
EMG = as.matrix(EMG)
### end copying here and paste into the R Console
We can get their column means easily enough. The standard deviations are a little harder (but not much).
> colMeans(EMG)                        # apply(EMG, 2, FUN=mean) is the equivalent
     LB1      LB2      LB3      LB4 
217.6364 260.7273 394.3636 559.1364 
> apply(EMG, 2, FUN=sd)
     LB1      LB2      LB3      LB4 
 99.6463 121.5241 213.5919 413.4777
To duplicate their specific comparisons, we need a slightly different contrast matrix this time.
> cons = cbind(c(-1,1,0,0),c(-1,0,1,0),c(-1,0,0,1))
> cons
     [,1] [,2] [,3]
[1,]   -1   -1   -1
[2,]    1    0    0
[3,]    0    1    0
[4,]    0    0    1
This will result in the comparison of the first column to the second, the first column to the third, and the first column to the fourth. (It's the same old "sums" matrix but reversed and turned upside down.) Now we'll create the D matrix and do the multivariate analysis.
> Demg = EMG %*% cons
> results = lm(Demg ~ 1)
> anova(results)
Analysis of Variance Table

            Df  Pillai approx F num Df den Df   Pr(>F)   
(Intercept)  1 0.54745   7.6614      3     19 0.001486 **
Residuals   21                                           
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We have duplicated the F-value, degrees of freedom, and p-value in their Table 2. Their multivariate statistic is something called TSQ, which is presumably Hotelling's T-squared; what we have is a Pillai's trace. Their specific comparisons are in Table 3 and consist of F-tests. Ours will be t-tests, but note the p-values are the same, and F = t² in all cases.
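
Before we look at those, a quick check on TSQ. IF it really is Hotelling's T-squared (my assumption), then for a one-sample MANOVA, F = T-squared * (n-p) / ((n-1)*p), where n = 22 subjects and p = 3 contrast columns. So we can back out the implied value and compare it to their Table 2.
> 7.6614 * (21*3)/19          # T-squared implied by the approx F (my assumption)
[1] 25.40359
Now the specific comparisons.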
> summary(results)
Response Y1 :

Call:
lm(formula = Y1 ~ 1)                   ### test on column 1 of Demg

Residuals:
    Min      1Q  Median      3Q     Max 
-150.09  -57.84  -28.59   44.91  186.91 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    43.09      19.65   2.193   0.0397 *
---

Residual standard error: 92.18 on 21 degrees of freedom


Response Y2 :

Call:
lm(formula = Y2 ~ 1)                   ### test on column 2 of Demg

Residuals:
    Min      1Q  Median      3Q     Max 
-226.73 -170.98    1.77   66.52  387.27 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   176.73      40.66   4.346 0.000284 ***
---

Residual standard error: 190.7 on 21 degrees of freedom


Response Y3 :

Call:
lm(formula = Y3 ~ 1)                   ### test on column 3 of Demg

Residuals:
   Min     1Q Median     3Q    Max 
-374.5 -303.8 -170.5  256.2  965.5 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   341.50      86.26   3.959 0.000717 ***
---

Residual standard error: 404.6 on 21 degrees of freedom
So the contrast matrix doesn't have to be exactly the one generated by contr.sum(). In fact, it can be any contrast matrix in which the contrast vectors are linearly independent. The columns of D don't even have to be simple difference scores. R has a function contr.helmert() that generates Helmert contrasts, and those will work, as Helmert contrasts are linearly independent. ("Linearly independent" means that no one contrast vector can be derived as a linear combination of the others. It's not the same thing as orthogonality.)
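
For instance, here is a quick check of that claim using the EMG data. Helmert contrasts reproduce the overall multivariate test exactly.
> Dhelm = EMG %*% contr.helmert(4)     # a different set of difference scores
> anova(lm(Dhelm ~ 1))                 # output not shown: same Pillai (0.54745)
>                                      #   and same approx F (7.6614)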

  • Vasey, M. W. and Thayer, J. F. (1987). The continuing problem of false positives in repeated measures ANOVA in psychophysiology: A multivariate solution. Psychophysiology, 24(4), 479-486.

Unbalanced Designs

Don't do it! If, for example, storeC didn't have any lettuce and didn't have a price on the shelf for it (i.e., if that value is missing), then all of the data for that "subject" (lettuce) should be removed from the analysis. There are ways around this, but not if you're going to use repeated measures ANOVA.
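
In base R, the easiest place to do that removal is the wide format data frame, before reshaping to long format. A hypothetical sketch:
> g2 = groceries                       # work on a copy
> g2["lettuce", "storeC"] = NA         # pretend storeC had no lettuce
> g2 = na.omit(g2)                     # drops the entire lettuce row
> nrow(g2)                             # nine complete subjects remain
[1] 9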


revised 2016 February 9