R Tutorials by William B. King, Ph.D.
| Table of Contents | Function Reference | Function Finder | R Project |


Block Designs

Recall the Related Measures t-Test tutorial, in which we looked at this example.

schizophrenia = read.table(header=T, text="
pair   affected   unaffected
1      1.27       1.94
2      1.63       1.44
3      1.47       1.56
4      1.39       1.58
5      1.93       2.06
6      1.26       1.66
7      1.71       1.75
8      1.67       1.77
9      1.28       1.78
10     1.85       1.92
11     1.02       1.25
12     1.34       1.93
13     2.02       2.04
14     1.59       1.62
15     1.97       2.08
The data are from monozygotic twins discordant for schizophrenia, and the response measure is left hippocampal size in cubic centimeters. The "pairs" are the twin pairs. Here we considered monozygosity to be a matching variable. We could just as easily, and correctly, have called it a blocking variable.

Recall the Single Factor Repeated Measures tutorial, in which we looked at this example.

groceries = read.table(header=T, row.names=1, text="
subject            storeA  storeB  storeC  storeD
lettuce              1.17    1.78    1.29    1.29
potatoes             1.77    1.98    1.99    1.99
milk                 1.49    1.69    1.79    1.59
eggs                 0.65    0.99    0.69    1.09
bread                1.58    1.70    1.89    1.89
cereal               3.13    3.15    2.99    3.09
ground.beef          2.09    1.88    2.09    2.49
tomato.soup          0.62    0.65    0.65    0.69
laundry.detergent    5.89    5.99    5.99    6.99
aspirin              4.46    4.84    4.99    5.15
The data are from four grocery stores near my home, and the response is price of a grocery item in dollars. I called this a repeated measures design in that tutorial, but it might more correctly be called a matched groups design. I called the grocery items subjects, but they might (would!) more appropriately be called blocks. The analysis would be the same no matter how we fiddle with the vocabulary.

To refresh your memory, the analysis was done this way.

> groc = stack(groceries)              # we need a data frame in long format
> head(groc)
  values    ind
1   1.17 storeA
2   1.77 storeA
3   1.49 storeA
4   0.65 storeA
5   1.58 storeA
6   3.13 storeA
> colnames(groc) = c("price","store")
> groc$block = rep(rownames(groceries), times=4)      # we need a blocks identifier...
> groc$block = factor(groc$block)                     # which is declared a factor
> head(groc)
  price  store    block
1  1.17 storeA  lettuce
2  1.77 storeA potatoes
3  1.49 storeA     milk
4  0.65 storeA     eggs
5  1.58 storeA    bread
6  3.13 storeA   cereal
> ### question: why can't we use grocery items as row names anymore? ###
> aov.out = aov(price ~ store + Error(block/store), data=groc)  # store within blocks
> summary(aov.out)

Error: block
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9  115.2    12.8               

Error: block:store
          Df Sum Sq Mean Sq F value Pr(>F)  
store      3 0.5859 0.19529   4.344 0.0127 *
Residuals 27 1.2137 0.04495                 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Similarly, we could do it this way.
> aov.out = aov(price ~ block + store, data=groc)    # treatment by blocks approach
> summary(aov.out)
            Df Sum Sq Mean Sq F value Pr(>F)    
block        9 115.19  12.799 284.722 <2e-16 ***
store        3   0.59   0.195   4.344 0.0127 *  
Residuals   27   1.21   0.045                   
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the same tutorial, we considered the following example, which is a true repeated measures design.

EMG = read.table(header=T, text="
subj LB1 LB2 LB3  LB4
S1   143 368 345  772
S2   142 155 161  178
S3   109 167 356  956
S4   123 135 137  187
S5   276 216 232  307
S6   235 386 398  425
S7   208 175 207  293
S8   267 358 698  771
S9   183 193 631  403
S10  245 268 572 1383
S11  324 507 556  504
S12  148 378 342  796
S13  130 142 150  173
S14  119 171 333 1062
S15  102  94  93   69
S16  279 204 229  299
S17  244 365 392  406
S18  196 168 199  287
S19  279 358 822  671
S20  167 183 731  203
S21  345 238 572 1652
S22  524 507 520  504
The data are EMG amplitudes from the left forehead of 22 subjects under different conditions of emotional arousal. Our subj(ects) column could just as easily be called blocks, and the analysis would once again be the same. In a repeated measures (or within subjects) design, the subjects are the blocking variable.

In a treatment by blocks design we have subjects (experimental units) matched on some blocking variable that makes them somehow similar to one another in a way that should influence their response to the experimental treatments. In other words, a treatment by blocks design is what we would call a matched groups design when there are two levels of treatment and we are contemplating a t-test, except there are more than two levels of the treatment, and we are therefore contemplating an ANOVA.

Once the subjects are matched (blocked), if they are then randomly assigned to treatment conditions within blocks--which is not true in any of the above examples--then we have a randomized blocks design. If every block has a subject in every condition, then it's a randomized complete blocks design. If that's not the case, then it's a randomized incomplete blocks design.

An important assumption of treatment by blocks designs is that there is no treatment-by-blocks interaction in the population. Any interaction-like variability in the sample is assumed to be random error and constitutes the error term for the ANOVA. Unless...


What if there is replication (more than one subject) in each of the treatment-by-blocks design cells? This wouldn't happen in a repeated measures design where subjects ARE the blocks, but it could easily happen in a treatment by blocks design where the blocks are something other than subjects. Myers (1972) gives the following exercise in his textbook.

block 11, 216, 103, 1
block 24, 313, 123, 3
block 36, 87, 109, 5
block 410, 87, 75, 12

Data entry is straightforward. (I've always wanted to say something like that!)

### in a script window I typed...
Myers = read.table(header=T, text="
block  treat  score
B1     A1     1
B1     A1     2
B1     A2     16
B1     A2     10
B1     A3     3
B1     A3     1
B2     A1     4
B2     A1     3
B2     A2     13
B2     A2     12
B2     A3     3
B2     A3     3
B3     A1     6
B3     A1     8
B3     A2     7
B3     A2     10
B3     A3     9
B3     A3     5
B4     A1     10
B4     A1     8
B4     A2     7
B4     A2     7
B4     A3     5
B4     A3     12
### if you copy to here, you can paste it into the R Console

 block  treat      score       
 B1:6   A1:8   Min.   : 1.000  
 B2:6   A2:8   1st Qu.: 3.000  
 B3:6   A3:8   Median : 7.000  
 B4:6          Mean   : 6.875  
               3rd Qu.:10.000  
               Max.   :16.000
The trick to getting the analysis right is having R recognize the correct error term. Myers shows that the correct error term is subject variability within block-by-treatment cells. But hold on just a second! Isn't that the same error term we would use if we merely analyzed this as a two-factor factorial design? Yes, it is. Which raises an interesting question. (Okay, it's interesting to me!)

Suppose we have a two-factor design in which one of our factors is a true experimental variable (i.e., we can randomly assign subjects to its levels), but the other factor is quasi-experimental (consists of intact groups). Let's say the quasi-experimental factor is gender. Then don't we have a restriction on randomization? Subject Sally cannot be assigned to the level of gender we would call males, and subject Fred can only be assigned to that level. Doesn't that make gender a blocking variable? Yes it does, but it doesn't matter whether we consider it a blocking variable or a second IV. When we have replication in the cells, the analysis is the same. I suppose we'd be inclined to see it more as a blocking variable if we were not interested in the gender main effect, but only interested in partialling out the variability due to gender to get a more sensitive test on our variable of interest. If we're interested in both the gender main effect and the main effect of the experimental factor, then gender is a second IV.

So the correct analysis is:

> aov.out = aov(score ~ block * treat, data=Myers)
> summary(aov.out)
            Df Sum Sq Mean Sq F value   Pr(>F)    
block        3  25.46    8.49   1.629 0.234657    
treat        2 136.75   68.37  13.128 0.000953 ***
block:treat  6 153.92   25.65   4.925 0.009218 ** 
Residuals   12  62.50    5.21                     
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I am gratified to see that this is the answer "in the back of the book!" Myers further points out that the larger the mean square is for block and block:treat, the more efficient the block design should be over the corresponding single factor design in which only treat is tested. In other words, if gender is unrelated to variability in the DV, then DON'T include it as a blocking variable. It will just end up costing you degrees of freedom in the error term. (For the record, the one-way ANOVA result is F(2,21) = 5.936, p = .009.)

  • Myers, J. L. (1972). Fundamentals of Experimental Design. Boston: Allyn and Bacon.

Split Plot Designs

A split plot design is a block design. Suppose we wish to test five varieties of seed to see how successfully we can grow corn from them. We have a large field in which to do the tests. The field is somewhat heterogeneous in its characteristics, however, so we decide to divide the field into plots with more uniform characteristics. Then each plot is split into five gardens, and one variety of seed is assigned at random to each garden within each plot. This is a randomized complete blocks design. It is analyzed just as any other treatment by blocks design would be (above).

Split plot designs get interesting when we further subdivide our plots. Suppose we have several fields in which to do the tests. We divide each field into homogeneous plots, and each plot into five gardens, etc. Now we have gardens within plots within fields.

A source that is rife with such agricultural examples is Crawley (2005). We have to go to his website to download his data sets. They will download as a file called zipped.zip. Unzip it, rename the folder "Crawley", and drop it into your normal working directory (Rspace). Then retrieve a data file called "splityield" as follows.

> yields = read.table(file="Crawley/splityield.csv", header=T, sep=",")
> summary(yields)
     yield        block      irrigation   density   fertilizer
 Min.   : 60.00   A:18   control  :36   high  :24   N :24     
 1st Qu.: 86.00   B:18   irrigated:36   low   :24   NP:24     
 Median : 95.00   C:18                  medium:24   P :24     
 Mean   : 99.72   D:18                                        
 3rd Qu.:114.00                                               
 Max.   :136.00
The experiment was this. Crop yield was measured in response to three factors, which were irrigation (control, irrigated), sowing density (high, low, medium; which could be releveled if we were of a mind to), and fertilizer application (N, NP, and P). Four fields were used, each of which was split in half and a level of irrigation applied at random to each half. Each half field was split into three parts and a level of density randomly assigned to each one. Finally, each of these subsubfields was split again into three parts and a level of fertilizer assigned at random to each one. Thus, we have fertilizer within density within irrigation, and the whole thing within fields or blocks. Crawley suggests the following analysis.
> aov.out = aov(yield ~ irrigation * density * fertilizer + 
+               Error(block/irrigation/density/fertilizer), data=yields)
> summary(aov.out)

Error: block
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  3  194.4   64.81               

Error: block:irrigation
           Df Sum Sq Mean Sq F value Pr(>F)  
irrigation  1   8278    8278   17.59 0.0247 *
Residuals   3   1412     471                 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: block:irrigation:density
                   Df Sum Sq Mean Sq F value Pr(>F)  
density             2   1758   879.2   3.784 0.0532 .
irrigation:density  2   2747  1373.5   5.912 0.0163 *
Residuals          12   2788   232.3                 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: block:irrigation:density:fertilizer
                              Df Sum Sq Mean Sq F value   Pr(>F)    
fertilizer                     2 1977.4   988.7  11.449 0.000142 ***
irrigation:fertilizer          2  953.4   476.7   5.520 0.008108 ** 
density:fertilizer             4  304.9    76.2   0.883 0.484053    
irrigation:density:fertilizer  4  234.7    58.7   0.680 0.610667    
Residuals                     36 3108.8    86.4                     
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Subsequent analysis is suggested by Crawley in his textbook. (Note: The smallest level of subdivision does not need to be included in the Error term, so in his textbook Crawley gives Error(block/irrigation/density), which is equivalent. Similarly, in the groceries example of the last section, we could have given Error(blocks) as the error term.)

Latin Square Designs

A Latin square design is a block design. Suppose you have a treatment of interest that has three levels. You decide to divide your subjects into three blocks to partial out variability due to a certain nuisance variable. Then you think of a second nuisance variable, and you decide to block on that variable as well (also three blocks). Now you've got a 3x3x3 factorial design, for a total of 27 cells in the design table. Problem is, you don't have that many subjects (experimental units) available.

Latin square is a possible solution. Let's draw the design table for the two blocking factors, which we'll call M and N.
Your treatment has three levels, which we'll call A, B, and C. The trick is to assort those treatment levels to the cells in such a way that no treatment level occurs more than once in each row and each column. That's a Latin square.
There are several possible rearrangements of this particular square, which can be obtained by permuting the rows and columns. One of those should be chosen at random.

We no longer have a full factorial design with 3x3x3=27 design cells. We now have a 3x3 Latin square, and at one subject per cell, we need only 9 total subjects (experimental units). We will not be able to analyze interactions in such a design (see below), so we have to assume they don't exist. Any interaction-like variability will be assumed to be nothing more than random noise and will constitute the error term.

Latin Square
An agricultural experiment with a Latin square structure.

The built-in data set OrchardSprays illustrates an 8x8 Latin square design. Here is the description from the help page.

     An experiment was conducted to assess the potency of various constituents
     of orchard sprays in repelling honeybees, using a Latin square design.
     Individual cells of dry comb were filled with measured amounts of lime
     sulphur emulsion in sucrose solution. Seven different concentrations of
     lime sulphur ranging from a concentration of 1/100 to 1/1,562,500 in
     successive factors of 1/5 were used as well as a solution containing no
     lime sulphur.

     The responses for the different solutions were obtained by releasing 100
     bees into the chamber for two hours, and then measuring the decrease in
     volume of the solutions in the various cells.

     An 8 x 8 Latin square design was used and the treatments were coded as follows:
     A  highest level of lime sulphur
     B  next highest level of lime sulphur
     G  lowest level of lime sulphur
     H  no lime sulphur
I'm not sure exactly why you'd want to repel honeybees from your orchard, but then I'm not an agronomist, so let's just look at the data.
> data(OrchardSprays)
> OS = OrchardSprays              # I'm not typing OrchardSprays any more than necessary!
> summary(OS)
    decrease          rowpos         colpos       treatment 
 Min.   :  2.00   Min.   :1.00   Min.   :1.00   A      : 8  
 1st Qu.: 12.75   1st Qu.:2.75   1st Qu.:2.75   B      : 8  
 Median : 41.00   Median :4.50   Median :4.50   C      : 8  
 Mean   : 45.42   Mean   :4.50   Mean   :4.50   D      : 8  
 3rd Qu.: 72.00   3rd Qu.:6.25   3rd Qu.:6.25   E      : 8  
 Max.   :130.00   Max.   :8.00   Max.   :8.00   F      : 8  
> OS$rowpos = factor(OS$rowpos)   # the row ...
> OS$colpos = factor(OS$colpos)   # and column blocking factors must be seen by R as factors!
> with(OS, table(rowpos,colpos))  # an unreplicated Latin square (n=1 per cell)
rowpos 1 2 3 4 5 6 7 8
     1 1 1 1 1 1 1 1 1
     2 1 1 1 1 1 1 1 1
     3 1 1 1 1 1 1 1 1
     4 1 1 1 1 1 1 1 1
     5 1 1 1 1 1 1 1 1
     6 1 1 1 1 1 1 1 1
     7 1 1 1 1 1 1 1 1
     8 1 1 1 1 1 1 1 1
> matrix(OS$treatment, nrow=8)    # and here is the Latin square
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "D"  "C"  "F"  "H"  "E"  "A"  "B"  "G" 
[2,] "E"  "B"  "H"  "A"  "D"  "C"  "G"  "F" 
[3,] "B"  "H"  "A"  "E"  "G"  "F"  "C"  "D" 
[4,] "H"  "D"  "E"  "C"  "A"  "G"  "F"  "B" 
[5,] "G"  "E"  "D"  "F"  "C"  "B"  "A"  "H" 
[6,] "F"  "A"  "C"  "G"  "B"  "D"  "H"  "E" 
[7,] "C"  "F"  "G"  "B"  "H"  "E"  "D"  "A" 
[8,] "A"  "G"  "B"  "D"  "F"  "H"  "E"  "C"
That last command worked only because the data frame was constructed in such an orderly fashion.
> par(mfrow=c(2,2))
> plot(decrease ~ rowpos + colpos + treatment, data=OS)
Hit <Return> to see next plot:    ### return focus to the Console and press Enter
Orchard Sprays
The ANOVA is a straightforward additive model.
> aov.out = aov(decrease ~ rowpos + colpos + treatment, data=OS)
> summary(aov.out)
            Df Sum Sq Mean Sq F value   Pr(>F)    
rowpos       7   4767     681   1.788    0.115    
colpos       7   2807     401   1.053    0.410    
treatment    7  56160    8023  21.067 7.45e-12 ***
Residuals   42  15995     381                     
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a signicant effect of treatment. A post hoc test could be done with TukeyHSD(aov.out, which= "treatment"). In this case, however, a trend analysis might be particularly appropriate. Notice from the boxplot that there is reason to believe a substantial problem with heterogeniety of variance exists.

Replicated Latin Square Designs

Now things get messy. There seems to be a difference of opinion among authorities as to how these designs should be done as well as how they should be analyzed. Replicated Latin square means you have more than one subject in each of your design cells. But back the bus up here, because we've already overlooked our first complication.

Suppose you have four treatments and two four-level blocking factors. That would require 16 subjects (4 treatment positions times 4 orders). Suppose you have 32 subjects available. What are you going to do with the second 16 of them? One possibility would be to put two subjects in each of your design cells. Thus, you may have choosen this Latin square:
And in each of those little cells you would have two subjects. That's what I'm calling "simple replication." The other possibility would be to choose, from your handbook of experimental designs, a second Latin square and put the second sixteen subjects in that. Thus, you may have THESE TWO Latin squares, which I'll call "replicates":

And in each of those little cells you would have one subject. Another complication here is just how the replicates are produced. If the experiment is done today with 16 subjects, and then tomorrow it's done again, and the next day it's done again, and so on, with the same Latin square, then you've essentially added a third blocking factor, replicates, and the data should be analyzed accordingly. However, it's also possible to have different Latin squares in your replicates, as was the case immediately above. It doesn't seem to matter. In either case the analysis would be:

score ~ row + column + replicate + treatment

It can get more complex, however. (Of course it can!) The following examples were taken from Dr. Iago Hale's site at the University of New Hampshire. We are testing gasoline additive in tractors but, not only is it possible for the Latin square to change from replicate to replicate, it's also possible that we might be using a new set of tractors and a new bunch of drivers.

LS Replicates

(Note: Dr. Hale uses the lm() function and anova() extractor function to get the ANOVA instead of aov() and summary(). The two methods are equivalent.)

I, on the other hand, have not driven a tractor since I was a kid and am more likely to have a bunch of squirming rats, and I'm just going to stick all of my cell(3,2) subjects into cell(3,2) and let them squirm around in there. Thus, I have simple replication but no replicates as such.

Putting the bus in forward gear again, the second issue is whether or not we should look for interactions in the replicated Latin square. One source I consulted said, emphatically, NO, that all interactions in Latin square designs are assumed to be error. Other sources SEEMED to disagree.

In a simple, unreplicated, Latin square, we must ASSUME there are no interactions, because any variability we see interaction-wise is going to be treated as noise and dumped into our error term. In fact, we wouldn't have an error term otherwise. There are not enough degrees of freedom to test an interaction. But what if those interactions are real, i.e., what if they exist in nature? In a Latin square design, the treatment effects are partially confounded with the interactions. Myers (1972) discusses this in some detail and possible remedies. Bottom line is, the existence of interactions between the treatment factors and blocking factors is a serious problem in Latin square designs. An interaction between the blocking factors (rows x columns) is especially serious, because that interaction is confounded with the treatment factor.

Latin square designs are often used to control for various kinds of order effects when there are repeated measures on the treatment of interest. It's often suggested that this "incomplete counterbalancing" is a good way to control for effects caused by order of presentation of the treatment levels. Sometimes it is, sometimes it's not. (I'll cover the analysis of repeated measures Latin squares in a separate section.)

A number of years ago, a student of ours, Paula Pyle, did her senior research on aggressive behavior in children resulting from watching violent cartoons. (The aggressive behaviors, not the children, resulted from watching cartoons.) She had each child watch two cartoons, a Power Rangers cartoon (violent) and a Barney cartoon (nonviolent). After each cartoon, the children engaged in a play session during which Paula counted the number of aggressive behaviors displayed by the child. Half of the children saw the Power Rangers cartoon first, and half saw the Barney cartoon first.

Overall, the children displayed more aggressive behaviors after watching the Power Rangers cartoon than they did after watching the Barney cartoon. However, that wasn't the end of the story by a long shot! The children who saw the Power Rangers cartoon first were more aggressive after seeing the Barney cartoon than were the children who saw the Barney cartoon first, a clear carryover effect.

When your subjects are inanimate objects, as they often are in industrial experiments for example, there are some effects that you probably are not going to see that may nevertheless occur quite frequently when your subjects are human beings (or any other sort of experimental unit with a memory). Some of those "human" effects are going to be seen in interactions.

Example: Suppose seeing the Power Rangers cartoon first causes a carryover effect resulting in increased aggression after seeing the Barney cartoon, but seeing the Barney cartoon first does not carry over to the Power Rangers cartoon. I.e., kids are equally aggressive after the Power Rangers cartoon regardless of whether they saw it first or second. That's an interaction between your rows and your columns right there! On the other hand, suppose seeing Power Rangers first makes kids more aggressive after Barney, and seeing Barney first makes kids less aggressive after Power Rangers. In fact, suppose all kids show equal aggression in the two treatments but more aggression after seeing Power Rangers first than after seeing Barney first. Your overall treatment effect has just averaged to zero over rows of your design table. Nevertheless, there is clearly an effect of watching Power Rangers vs. Barney.

Things get complicated when your experimental units have a memory! For example, a column effect in your Latin square design might be a perfectly legitimate practice effect, or transfer effect, or fatigue effect. A row effect might be a very real carryover effect. And if the effect of A carries over, but the effect of B does not, then you've got a rows by columns interaction. A Latin square is not always the best design choice.

I suppose the moral of this story should be: THINK FIRST, design later! Then THINK AGAIN before you analyze!

The problem with the obsessive blocking that occurs in Latin square designs is that it reduces the degrees of freedom available in the error term. The following diagram (which I found at Dr. Hale's site) illustrates this point clearly as you go from completely randomized (CRD) to randomized complete blocks (RCBD) to Latin square (LS) designs (assuming 5 treatments and 25 subjects).

Losing DFs

Replication is one way to get more degrees of freedom in your error term. So back to the case of my squirming rats stuffed into the design cells of my ONE Latin square, without true time-to-time replicates. This is probably the case that has occurred most often in psychology. Myers gives an analysis of such a design when the treatment factor is tested between subjects. Such a design is not as desirable as the design with true replicates because it is more prone to interactions (says Myers). The real utility of this design is as a counterbalancing control for order effects in a repeated measures design, to which we now move on.

  • Myers, J. L. (1972). Fundamentals of Experimental Design. Boston: Allyn and Bacon.

Latin Square and Repeated Measures: Counterbalancing

A number of years ago, Wes Rowe did, as his senior research project in our department, an experiment in which he studied the effect of background music on a card matching task. The subject was seated at a table on which he or she found an array of cards. The subject's task was to turn over the cards two at a time. If the cards matched, they were removed from the array. If they did not match, they were turned face down again, and the subject continued to turn the cards two at a time. The objective was to match all the cards and remove them from the array as quickly as possible. Each subject did the task three times, once with each of three kinds of background music: no music, classical music, and instrumental rap music.

Thus, we have a repeated measures design. The DV was time to complete the task (all cards matched and removed from the array), in seconds. I should note that Wes used complete counterbalancing to control for order effects, which involved using six orders of treatment, with 8 subjects assigned to each order. (In effect, that means he had two Latin squares, which could be analyzed as replicates, with replication within each replicate. We're going to keep things a little simpler and look at only half of his data.)

I'll give you two options for getting the data. Here is the first. Type the following with excruciating care!

> file = "http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/rowe.txt"
> match = read.csv(file)
If that didn't work, then you can resort to copying and pasting the following script.
### begin copying here
match = read.table(header=T, text="
Order Music Time Row Column Subject
CRN    NM  206  R2     C3     S10
CRN    NM  178  R2     C3     S11
CRN    NM  204  R2     C3     S12
CRN    NM  169  R2     C3     S13
CRN    NM  176  R2     C3     S15
CRN    NM  176  R2     C3     S16
CRN    NM  194  R2     C3     S17
CRN    NM  206  R2     C3     S18
NCR    NM  211  R1     C1     S19
NCR    NM  181  R1     C1     S21
NCR    NM  224  R1     C1     S22
NCR    NM  177  R1     C1     S23
NCR    NM  180  R1     C1     S24
NCR    NM  218  R1     C1     S25
NCR    NM  229  R1     C1     S26
NCR    NM  176  R1     C1     S27
RNC    NM  190  R3     C2     S45
RNC    NM  181  R3     C2     S46
RNC    NM  186  R3     C2     S47
RNC    NM  170  R3     C2     S48
RNC    NM  178  R3     C2     S49
RNC    NM  168  R3     C2     S50
RNC    NM  223  R3     C2     S51
RNC    NM  199  R3     C2     S52
CRN    CM  193  R2     C1     S10
CRN    CM  168  R2     C1     S11
CRN    CM  209  R2     C1     S12
CRN    CM  178  R2     C1     S13
CRN    CM  171  R2     C1     S15
CRN    CM  171  R2     C1     S16
CRN    CM  197  R2     C1     S17
CRN    CM  193  R2     C1     S18
NCR    CM  201  R1     C2     S19
NCR    CM  188  R1     C2     S21
NCR    CM  206  R1     C2     S22
NCR    CM  176  R1     C2     S23
NCR    CM  192  R1     C2     S24
NCR    CM  203  R1     C2     S25
NCR    CM  221  R1     C2     S26
NCR    CM  192  R1     C2     S27
RNC    CM  182  R3     C3     S45
RNC    CM  186  R3     C3     S46
RNC    CM  199  R3     C3     S47
RNC    CM  179  R3     C3     S48
RNC    CM  192  R3     C3     S49
RNC    CM  177  R3     C3     S50
RNC    CM  232  R3     C3     S51
RNC    CM  192  R3     C3     S52
CRN   IRM  208  R2     C2     S10
CRN   IRM  163  R2     C2     S11
CRN   IRM  212  R2     C2     S12
CRN   IRM  151  R2     C2     S13
CRN   IRM  168  R2     C2     S15
CRN   IRM  167  R2     C2     S16
CRN   IRM  188  R2     C2     S17
CRN   IRM  185  R2     C2     S18
NCR   IRM  202  R1     C3     S19
NCR   IRM  184  R1     C3     S21
NCR   IRM  200  R1     C3     S22
NCR   IRM  168  R1     C3     S23
NCR   IRM  180  R1     C3     S24
NCR   IRM  161  R1     C3     S25
NCR   IRM  196  R1     C3     S26
NCR   IRM  181  R1     C3     S27
RNC   IRM  166  R3     C1     S45
RNC   IRM  188  R3     C1     S46
RNC   IRM  204  R3     C1     S47
RNC   IRM  177  R3     C1     S48
RNC   IRM  174  R3     C1     S49
RNC   IRM  165  R3     C1     S50
RNC   IRM  226  R3     C1     S51
RNC   IRM  191  R3     C1     S52
### end copying here and paste into the R Console
The first thing we should notice is that the Order variable and the Row variable contain duplicate information, so we can use either one (but not both) in the analysis. From a summary, then, we can see what Latin square was used.
> summary(match)
 Order    Music         Time       Row     Column     Subject  
 CRN:24   CM :24   Min.   :151.0   R1:24   C1:24   S10    : 3  
 NCR:24   IRM:24   1st Qu.:176.0   R2:24   C2:24   S11    : 3  
 RNC:24   NM :24   Median :187.0   R3:24   C3:24   S12    : 3  
                   Mean   :188.9                   S13    : 3  
                   3rd Qu.:201.2                   S15    : 3  
                   Max.   :232.0                   S16    : 3  
Under the summary for Order we see the Latin square. It says each order was used 24 times, but that's because we have three lines per subject in the table (a long format data frame), one for each kind of Music. In fact, there are only eight Subjects per Order, each subject being used at each of three levels of Music.
> with(match, table(Order,Music))
  CRN  8   8  8
  NCR  8   8  8
  RNC  8   8  8
A quick graphical display of the effects can be had as follows.
> par(mfrow=c(2,2))                                   # partition the graphics device
> plot(Time ~ Order + Column + Music, data=match)     # not boxplot(); I make that mistake often!
Hit <Return> to see next plot: 
> with(match, interaction.plot(Column, Order, Time))  # row by column interaction if any
I find that interaction plot disturbing, and disturbing in a disturbingly meaningful way at that. The longest solving time was in the no music condition when that condition was done first. If that same effect shows up in the second Latin square (the advantage of having a replicate!), I'd be quite distressed. Nevertheless, let's proceed with the analysis to see how it is done (on this square).
> aov.out = aov(Time ~ Order + Column + Music + Error(Subject/Music), data=match)
> summary(aov.out)

Error: Subject
          Df Sum Sq Mean Sq F value Pr(>F)
Order      2    977   488.7   0.595  0.561
Residuals 21  17252   821.5               

Error: Subject:Music
          Df Sum Sq Mean Sq F value  Pr(>F)   
Column     2     67    33.5   0.390 0.67935   
Music      2   1046   522.8   6.084 0.00465 **
Residuals 44   3781    85.9                   
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We find a significant effect of Music type. Notice that R realized the Column effect was also within subjects and tested it accordingly. It's educational to compare this result to the single factor repeated measures ANOVA and to the mixed factorial ANOVA with Order as a between subjects factor. I'll leave that to the interested reader.

Here's another interesting (but incorrect) analysis.

> aov.out2 = aov(Time ~ Order * Column + Music + Error(Subject/Music), data=match)
> summary(aov.out2)

Error: Subject
          Df Sum Sq Mean Sq F value Pr(>F)
Order      2    977   488.7   0.595  0.561
Residuals 21  17252   821.5               

Error: Subject:Music
             Df Sum Sq Mean Sq F value  Pr(>F)   
Column        2     67    33.5   0.426 0.65589   
Music         2   1046   522.8   6.645 0.00311 **
Order:Column  2    477   238.4   3.030 0.05897 . 
Residuals    42   3304    78.7                   
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The test on Music is the wrong test, as the Order:Column and Residuals (within) should be combined to do that test. (See the previous table.) However, Myers pointed out that the test on the Order:Column interaction is interesting as an indication of whether the row-by-column interaction is large enough to be distressing, as that is the interaction which is confounded with the treatment effect. Ideally, that F-ratio should be 1.

created 2016 February 11
| Table of Contents | Function Reference | Function Finder | R Project |