MULTIPLE COMPARISONS, CONTRASTS, AND POST HOC TESTS Don't those three things kind of overlap with one another? I mean, aren't post hoc tests also multiple comparisons? Sure they are. I just wanted to make sure I was catching everyone who is interested in one or another of these topics, regardless of what they decide to call it. When it comes to multiple comparisons, R is both rich and poor. The base packages are quite limited (but still offer some powerful choices). However, there are optional packages that can be downloaded from CRAN that will do just about anything any sane person might want. One that I can recommend is a package called "multcomp". In this tutorial, I'm going to stick to what's available in the base packages. Furthermore, in this tutorial I'm going to stick to looking at one main effect, as might result from a one-way ANOVA, and main effects from between groups designs at that. I think it's necessary to cover this material first, before more complex situations can be tackled. We will tackle those in the relevant tutorials on factorial ANOVA and repeated measures designs. A note: For some of the things in this tutorial to work, it is necessary
to have your contrasts for unordered factors set to "treatment contrasts",
which is the R default. You can check that as follows:
> options("contrasts") $contrasts unordered ordered "contr.treatment" "contr.poly" > ### if you don't see "contr.treatment" under unordered, do this > options(contrasts = c("contr.treatment","contr.poly")) Post Hoc Tests Post hoc tests are done after an omnibus ANOVA has been completed, when there has been a significant effect, and now it is desired to see exactly which groups are significantly different. Statisticians thrive on post hoc tests. There are more of these things than you can shake a stick at! Or would even care to shake a stick at!! I'm going to discuss four: the Tukey HSD test, the Fisher LSD test (with a note on Dunnett's test), the Bonferroni test, and the Holm-Bonferroni test. Tukey HSD test The Tukey Honestly Significant Difference test was illustrated in the
Oneway Between Subjects ANOVA tutorial. I'll repeat
that example here. Briefly, we are testing six insect sprays, A through F, and
after we spray garden plots with them, we are counting how many bugs are left.
> aov.out = aov(count ~ spray, data=InsectSprays) > TukeyHSD(aov.out) # just run TukeyHSD() on the model object Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = count ~ spray, data = InsectSprays) $spray diff lwr upr p adj B-A 0.8333333 -3.866075 5.532742 0.9951810 C-A -12.4166667 -17.116075 -7.717258 0.0000000 D-A -9.5833333 -14.282742 -4.883925 0.0000014 E-A -11.0000000 -15.699409 -6.300591 0.0000000 F-A 2.1666667 -2.532742 6.866075 0.7542147 C-B -13.2500000 -17.949409 -8.550591 0.0000000 D-B -10.4166667 -15.116075 -5.717258 0.0000002 E-B -11.8333333 -16.532742 -7.133925 0.0000000 F-B 1.3333333 -3.366075 6.032742 0.9603075 D-C 2.8333333 -1.866075 7.532742 0.4920707 E-C 1.4166667 -3.282742 6.116075 0.9488669 F-C 14.5833333 9.883925 19.282742 0.0000000 E-D -1.4166667 -6.116075 3.282742 0.9488669 F-D 11.7500000 7.050591 16.449409 0.0000000 F-E 13.1666667 8.467258 17.866075 0.0000000The test is of pairwise comparisons, B with A, C with A, etc. For each comparison, the test reports the diff(erence) between the group means, a 95% confidence interval thereupon, and a p-value that has been adjusted by the Tukey method for the number of comparisons being made. The Tukey test is fairly conservative in controlling familywise error rate, and for that reason it does not have to be protected. (I.e., you can do the Tukey HSD test even without a signicant effect in the omnibus ANOVA, although there would rarely be any point. There might be more of a point in doing the Tukey HSD test instead of the omnibus ANOVA.) What you don't get in this output is the actual HSD value. You don't really
need it, unless your homework requires that you calculate it! Which would not
be difficult. It's half the width of any of those CIs. (Notice they are all
the same width.)
> (5.532742-(-3.866075))/2 # HSD value from the B-A result [1] 4.699408If you want a visual representation of the tests, that is easily enough obtained as well. > plot(TukeyHSD(aov.out), las=1) # las=1 just turns the axis labels Also, the Tukey HSD test was originally intended for balanced designs only. The extention to unbalanced designs was developed by Kramer and has been incorporated into the R function. In the case of unbalanced designs, the proper name of the test is the Tukey-Kramer procedure. Fisher LSD test When I start talking about this test, my students immediately perk up, until I tell them that LSD stands for Least Significant Difference. The LSD test uses unadjusted p-values, and therefore it makes no attempt to control familywise error rate. If you use it as a protected test (i.e., only after seeing a significant effect in the omnibus ANOVA), it's good for about three or four comparisons without inflating the error rate. Beyond that, the inflation grows rapidly with increasing numbers of comparisons, if you're concerned about such a thing. It is among the most powerful of post hoc tests, however, and hey! Type II errors are errors too! Fisher LSD p-values for all possible pairwise comparisons can be obtained
this way.
> with(InsectSprays, pairwise.t.test(x=count, g=spray, p.adjust="none")) # x=DV, g=IV Pairwise comparisons using t tests with pooled SD data: count and spray A B C D E B 0.604 - - - - C 7.3e-11 8.5e-12 - - - D 9.8e-08 1.2e-08 0.081 - - E 2.8e-09 3.3e-10 0.379 0.379 - F 0.181 0.408 2.8e-13 4.0e-10 1.1e-11 P value adjustment method: noneThat's all you get though is just the p-values. You can get SOME of the tests this way. > summary.lm(aov.out)$coef # summary.lm(aov.out) is sufficient, but there is more output Estimate Std. Error t value Pr(>|t|) (Intercept) 14.5000000 1.132156 12.8074279 1.470512e-19 sprayB 0.8333333 1.601110 0.5204724 6.044761e-01 sprayC -12.4166667 1.601110 -7.7550382 7.266893e-11 sprayD -9.5833333 1.601110 -5.9854322 9.816910e-08 sprayE -11.0000000 1.601110 -6.8702352 2.753922e-09 sprayF 2.1666667 1.601110 1.3532281 1.805998e-01Those are tests against the "missing level" of the factor. I.e., spray A doesn't appear in this table, so these tests are A-B, A-C, A-D, A-E, and A-F. (This assumes you have your contrasts set to "treatment," which is the R default. Check with options("contrasts") and look at what it says for unordered factors.) Compare the p-values to those above in the first column of the pairwise t-tests. How would we get the other comparisons? One way would be to calculate them by hand, which is not difficult using the information above. The LSD test is done as a t-test, with t equal to the difference in the means of the two groups that are being compared, divided by the standard error of that difference. The standard error is defined this way: Where n1 and n2 are the two group sizes, both 12 in this case, and s is the pooled standard deviation of the treatment groups, which can be obtained as follows. > aov.out Call: aov(formula = count ~ spray, data = InsectSprays) Terms: spray Residuals Sum of Squares 2668.833 1015.167 Deg. of Freedom 5 66 Residual standard error: 3.921902 # this is sqrt(1015.167/66) or sqrt of the MSE term Estimated effects may be unbalanced > 3.921902/sqrt(6) # calculation of se; /sqrt(6) being from sqrt(1/12 + 1/12) [1] 1.60111Notice the result matches that in the summary.lm table above. With a balanced design, n1 and n2 are the same for all comparisons, and s is the same for all comparisons, so the standard error will always be the same. Furthermore, the difference in means between any two groups can be obtained from the "Estimates" in the above table. So let's say we wanted the comparison between spray D and spray E. We would do it this way. > (-9.5833333 - (-11.0)) / 1.601110 # t [1] 0.8848029 > pt(0.8848029, df=66, lower=F) * 2 # two-tailed p-value (drop any sign on t) [1] 0.3794751Compare this to the p-value in the pairwise t-test table. When the design is unbalanced, the same standard error will not be used for
all tests in the summary.lm output. Neither will it be in the pairwise t-tests.
This is because n1 and n2 may vary between comparisons. Let's unbalance the
InsectSprays data, and see what effect that has. We'll
drop 5 cases selected at random.
> InSpr = InsectSprays[-sample(1:72,5),] > summary(InSpr) # your result will be different count spray Min. : 0.000 A:12 1st Qu.: 3.000 B:11 Median : 7.000 C:12 Mean : 9.582 D:12 3rd Qu.:14.500 E: 9 Max. :26.000 F:11 > with(InSpr, tapply(count,spray,mean)) A B C D E F 14.500000 14.818182 2.083333 4.916667 3.555556 17.181818 > aov.unb = aov(count ~ spray, data=InSpr) > summary.lm(aov.unb)$coef Estimate Std. Error t value Pr(>|t|) (Intercept) 14.5000000 1.128571 12.848106 5.262536e-19 sprayB 0.3181818 1.631911 0.194975 8.460605e-01 sprayC -12.4166667 1.596040 -7.779669 1.053451e-10 sprayD -9.5833333 1.596040 -6.004443 1.148894e-07 sprayE -10.9444444 1.723921 -6.348578 3.010487e-08 sprayF 2.6818182 1.631911 1.643361 1.054535e-01 > with(InSpr, pairwise.t.test(x=count, g=spray, p.adjust="none")) Pairwise comparisons using t tests with pooled SD data: count and spray A B C D E B 0.846 - - - - C 1.1e-10 9.6e-11 - - - D 1.1e-07 9.0e-08 0.081 - - E 3.0e-08 2.4e-08 0.396 0.433 - F 0.105 0.161 3.2e-13 3.0e-10 1.2e-10 P value adjustment method: noneSo with an unbalanced design, a little more calculation would be involved to get non-A comparison t-values. Is there a way to get those other, non-A, comparisons? Other than doing them by hand, that is? Because hey! What do I own a computer for? The pairwise.t.test() function gives the p-values for them, but if you insist on having all the gory details, that's another matter. They can be had though if you're willing to do a little work. Let's say we
want all the B-comparisons. The reasons we got A-comparisons above is because
A is the first level of the spray factor.
> levels(InsectSprays$spray) [1] "A" "B" "C" "D" "E" "F"If we could make B the first level, then we could get the B-comparisons. Easy peasy! > Bspray = relevel(InsectSprays$spray, ref="B") # "reference group" is now B > levels(Bspray) [1] "B" "A" "C" "D" "E" "F" > summary.lm(aov(InsectSprays$count ~ Bspray))$coef # the ANOVA must be recalculated Estimate Std. Error t value Pr(>|t|) (Intercept) 15.3333333 1.132156 13.5434869 1.001994e-20 BsprayA -0.8333333 1.601110 -0.5204724 6.044761e-01 BsprayC -13.2500000 1.601110 -8.2755106 8.509776e-12 BsprayD -10.4166667 1.601110 -6.5059045 1.212803e-08 BsprayE -11.8333333 1.601110 -7.3907075 3.257986e-10 BsprayF 1.3333333 1.601110 0.8327558 4.079858e-01And so on. The LSD (least significant difference) can be calculated this way.
> alpha = .05 > s = 3.921902 # from the aov output, this is sqrt of MSE > df = 66 # also from the aov output, df error > tcrit = qt(p=1-alpha/2, df=df) > LSD = tcrit * s * sqrt(1/12+1/12) # reciprocals of the group sizes, each group is n=12 > LSD [1] 3.196719Since all the group sizes are 12 (in the balanced data), this would be the LSD for all comparisons. If the data are unbalanced, then there would be several LSDs, depending upon which groups were being compared. What about confidence intervals? If you have LSD, a confidence interval is
just one more step. Let's do the 95%-CI for the D-E mean difference.
> summary.lm(aov.out)$coef # because I don't like to scroll Estimate Std. Error t value Pr(>|t|) (Intercept) 14.5000000 1.132156 12.8074279 1.470512e-19 sprayB 0.8333333 1.601110 0.5204724 6.044761e-01 sprayC -12.4166667 1.601110 -7.7550382 7.266893e-11 sprayD -9.5833333 1.601110 -5.9854322 9.816910e-08 sprayE -11.0000000 1.601110 -6.8702352 2.753922e-09 sprayF 2.1666667 1.601110 1.3532281 1.805998e-01 > (-9.5833333 - (-11.0)) - LSD # lower bound [1] -1.780052 > (-9.5833333 - (-11.0)) + LSD # upper bound [1] 4.613386 A note on Dunnett's test Notice above when you did summary.lm(aov.out), you got comparisons of group A with groups B, C, D, E, and F in turn. If group A is your control group, then these are Dunnett-like tests of all treatment groups with the control. The tests themselves are wrong, however, because Dunnett's adjustment to the p-value was not done. So far as I'm aware, there is no way to do it in the R base packages. There are optional packages that can be downloaded from CRAN that will do it, one of which is "multcomp". There are also tables of Dunnett's critical values online, for example, here. Bonferroni test The Bonferroni test is done exactly like the Fisher LSD test, but once the
p-values are obtained, they are multiplied by the number of comparisons being
made. In the case of InsectSprays, that would be 15 comparisons. The p-values
for all the comparisons can be obtained like this.
> with(InsectSprays, pairwise.t.test(x=count, g=spray, p.adjust="bonferroni")) Pairwise comparisons using t tests with pooled SD data: count and spray A B C D E B 1 - - - - C 1.1e-09 1.3e-10 - - - D 1.5e-06 1.8e-07 1 - - E 4.1e-08 4.9e-09 1 1 - F 1 1 4.2e-12 6.1e-09 1.6e-10 P value adjustment method: bonferroniNote: Any p-values that come out greater than 1 due to the multiplication are set at 1. You can't have a p-value greater than one (unless you're one of my students). The Bonferroni test is very conservative and does not require protection. Holm-Bonferroni test The Holm-Bonferroni test is a very clever modification of the Bonferroni
correction. For details you'll have to consult a textbook (or Wikipedia), but
here is how you get the p-values.
> with(InsectSprays, pairwise.t.test(x=count, g=spray, p.adjust="holm")) Pairwise comparisons using t tests with pooled SD data: count and spray A B C D E B 1.00 - - - - C 8.7e-10 1.2e-10 - - - D 6.9e-07 9.7e-08 0.49 - - E 2.5e-08 3.6e-09 1.00 1.00 - F 0.90 1.00 4.2e-12 4.0e-09 1.4e-10 P value adjustment method: holmHolm is actually the default for the p-adjustment method in this function, so it would not need to be specified explicitly. The Holm test is much less conservative than the Bonferroni test but not so much that it requires protection. It does not. Orthogonal Contrasts Clean up your workspace and search path. We're moving on to a new problem. We will look at a data set that records the dry-weight production of biomass
by plants as the result of three different treatments: a control, and two
experimental treatments.
> data(PlantGrowth) > summary(PlantGrowth) weight group Min. :3.590 ctrl:10 1st Qu.:4.550 trt1:10 Median :5.155 trt2:10 Mean :5.073 3rd Qu.:5.530 Max. :6.310 > with(PlantGrowth, tapply(weight, group, mean)) ctrl trt1 trt2 5.032 4.661 5.526 > with(PlantGrowth, tapply(weight, group, var)) ctrl trt1 trt2 0.3399956 0.6299211 0.1958711 > with(PlantGrowth, bartlett.test(weight ~ group)) Bartlett test of homogeneity of variances data: weight by group Bartlett's K-squared = 2.8786, df = 2, p-value = 0.2371The design is balanced, a necessary prerequisite for orthogonal contrasts, and although there are substantial differences in the group variances, Bartlett's test does not reveal a significant deviation from homogeneity of variance. Okay, stop! Don't go rushing headlong into the ANOVA without stopping and thinking first. What kind of hypotheses might we be interested in here? Well, any differences among the means. That's the traditional alternative for the omnibus ANOVA, but it's really kind of like beating the problem with a club! If we think about it, we might come up with hypotheses like these: (i) the treated groups will be different from the control group, and (ii) the two treated groups will be different from each other. (NOTE: Given that I don't know what the treatments are, I don't know if these are sensible hypotheses, but they COULD be. Use your imagination.) These are hypotheses which, like all hypotheses, we would want to express in advance. And now we want to come up with a method of data analysis that would allow us to test them. Enter what are called planned comparisons. The name makes perfect sense, since we're planning these comparisons in advance. We'll use contrasts to do the analysis. Contrasts do not have to be "orthogonal." Think of orthogonal as meaning something like "independent" or "unconfounded." I'll discuss nonorthogonal contrasts below. Right now, let's keep our contrasts orthogonal. There are several requirements for doing so. One has already been mentioned: a balanced design, which means orthogonal contrasts are done most often on experimental data (as opposed to quasi-experiments). Another one is that we can come up with a set of k-1 sensible orthogonal contrasts, where k is the number of levels of the IV we are testing. In this case, our IV has three levels, so we can have two contrasts, each corresponding to a sensible hypothesis. I think we have that. It now remains to code the contrasts. Each contrast is going to be coded with a contrast vector. Within the vector, each level of the IV is going to be given a code. Some of the codes will be positive, some will be negative. All of the positive codes in the vector should be the same, and they should add to +1. All of the negative codes in the vector should be the same, and they should add to -1. Thus, the codes in the vector should, overall, add to 0. The positively coded groups will be combined into one big "megagroup." The negatively coded groups will be combined into another big "megagroup." Then these two megagroups will be tested against each other, using what essentially amounts to a t-test. This is called a complex contrast, by the way, when you have groups combined into megagroups. A contrast can also just test one group against another, which is called a simple contrast. Fisher LSD tests are simple contrasts. Our first hypothesis suggests we want to lump the treatment groups into a megagroup, and test that megagroup against the control group. The contrast vector would be: These codes have to be in the SAME ORDER that R sees the levels of the variable, so we better check that. > levels(PlantGrowth$group) [1] "ctrl" "trt1" "trt2"Good! Our contrast will test "ctrl" (+1) against the megagroup "trt1+trt2" (each coded -1/2). "That's not going to give us anything," you're saying. "I've seen the means, and that's just not going to pan out!" Stop looking at the means! These are PLANNED in advance comparisons. You haven't got any means yet! For our second hypothesis, we want to compare "trt1" to "trt2". The secret here is to code any level not included in the contrast as 0. So the vector is: That will compare the second level of the IV to the third level, a simple contrast and, FYI, equivalent to a Fisher LSD test. By the way, it's completely arbitrary which levels are coded positive and which negative, but I'll give you a hint below as to what is a convenient way to choose. > vec1 = c(1,-1/2,-1/2) > vec2 = c(0,1,-1) > ### now cbind the contrast vectors into the columns of matrix > cons = cbind(vec1, vec2) > cons vec1 vec2 [1,] 1.0 0 [2,] -0.5 1 [3,] -0.5 -1 > ### it's helpful (but not really necessary) to name the rows of the matrix > rownames(cons) = c("ctrl", "trt1", "trt2") > cons vec1 vec2 ctrl 1.0 0 trt1 -0.5 1 trt2 -0.5 -1 > ### to be orthogonal, all pairs of contrast vectors must have a dot product of 0 > ### here's a convenient way to test that for a contrast matrix of any size > t(cons) %*% cons vec1 vec2 vec1 1.5 0 vec2 0.0 2 > ### if this matrix multiplication results in a matrix that is nonzero ONLY on > ### the main diagonal, then the contrasts are orthogonalAnd we are coded! Now we assign that contrast matrix to the IV and run the ANOVA. We're not really interested in the ANOVA itself, but this is how we're going to get the contrasts. > contrasts(PlantGrowth$group) = cons > aov.out = aov(weight ~ group, data=PlantGrowth)Now here's the tricky part. I'll do it, and then I'll explained what happened. > summary(aov.out, + split=list( + group=list("ctrl v trts"=1, "trt1 v trt2"=2) + )) Df Sum Sq Mean Sq F value Pr(>F) group 2 3.766 1.883 4.846 0.01591 * group: ctrl v trts 1 0.025 0.025 0.065 0.80086 group: trt1 v trt2 1 3.741 3.741 9.627 0.00446 ** Residuals 27 10.492 0.389 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1I get that wrong EVERY time! I always have to look at an example. We used the usual extractor function, summary(aov.out), but this time we used it with an option called split=. The technical statistical jargon for what split does is "partition." Our contrasts partitioned the treatment sum of squares. Because we had a full set of orthogonal contrasts, the partitioning was spot on. I.e., the contrast SSs add up to the treatment SS. What's up with that syntax?! For once, it's actually quite sensible. Inside split we have a list that lists ALL of the factors we're doing contrasts on. Here we only have one, so our list exhausted itself at "group." Inside each of those factors we have another list. In that list, we give our contrasts sensible names, and we set those names equal to one of the columns in the contrast matrix (by number of the column). And that's pretty much it. "I thought you said they were like t-tests. That don't look like no t-test
to me!"
> summary.lm(aov.out)$coef Estimate Std. Error t value Pr(>|t|) (Intercept) 5.0730 0.1138121 44.5734621 8.020008e-27 groupvec1 -0.0410 0.1609546 -0.2547302 8.008617e-01 groupvec2 -0.4325 0.1393908 -3.1027872 4.459236e-03The "estimates" are meaningless. They do have something to do with the differences in the means, but it's best not to go there! The t-values are the square roots of the corresponding F-values in the previous table (ignoring signs), and the p-values are the same. The degrees of freedom for the t-tests is just the error df from the onmibus ANOVA (27 in this case). The intercept is the overall (grand) mean, and it is significantly different from 0, but who cares? Here is another brief example. The data are based on an experiment by Eysenck (1974) on incidental learning, which is learning that takes place when one is not really attempting to learn but just through exposure to the material. Subjects were given a list of 30 words and one of five different instructions: count the letters in each word, think of a word that rhymes with each word, think of an adjective that could be used to modify the word in a sentence, form a mental image of the object described by the word, or a control condition in which the subjects were told to memorize the list. These are not the original data but were created to match the cell means and SDs. The data were taken from Howell's textbook (7th ed., 2010, Table 11.1, p. 319). The logic of the following contrasts is that only the control group was told
they'd be asked to recall the words, and that count and rhyme instructions
represent a shallow level of mental processing, while adjective and image
instructions represent deep mental processing. (Note: In the full experiment,
there were two age groups. These data are for the older subjects.)
> scores = scan() 1: 9 8 6 8 10 4 6 5 7 7 # count instructions 11: 7 9 6 6 6 11 6 3 8 7 # rhyme instructions 21: 11 13 8 6 14 11 13 13 10 11 # adjective instructions 31: 12 11 16 11 9 23 12 10 19 11 # image instructions 41: 10 19 14 5 10 11 14 15 11 11 # control instructions 51: Read 50 items > ### btw, trying to put comments into a scan operation like that will fail! > groups = gl(n=5, # generate levels, n=no. of levels + k=10, # k=replications per group + labels=c("count","rhyme","adjec","image","contr")) > levels(groups) [1] "count" "rhyme" "adjec" "image" "contr" > c1=c(-1/4, -1/4, -1/4, -1/4, +1) # control vs. treatments > c2=c(-1/2, -1/2, +1/2, +1/2, 0) # shallow vs. deep processing > c3=c( 0, 0, -1, +1, 0) # are the deep processing conditions different > c4=c( -1, +1, 0, 0, 0) # are the shallow processing conditions different > cons = cbind(c1, c2, c3, c4) > rownames(cons) = levels(groups) > cons c1 c2 c3 c4 count -0.25 -0.5 0 -1 rhyme -0.25 -0.5 0 1 adjec -0.25 0.5 -1 0 image -0.25 0.5 1 0 contr 1.00 0.0 0 0 > contrasts(groups) = cons > aov.out = aov(scores ~ groups) # do this AFTER contrasts are assigned > summary(aov.out, split=list( + groups=list("cont-treat"=1, "shallow-deep"=2, + "deep-deep"=3, "shallow-shallow"=4) + )) Df Sum Sq Mean Sq F value Pr(>F) groups 4 351.5 87.88 9.085 1.82e-05 *** groups: cont-treat 1 47.0 47.05 4.863 0.0326 * groups: shallow-deep 1 275.6 275.63 28.493 2.96e-06 *** groups: deep-deep 1 28.8 28.80 2.977 0.0913 . groups: shallow-shallow 1 0.0 0.05 0.005 0.9430 Residuals 45 435.3 9.67 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > summary.lm(aov.out) # forgot $coef so we'll get full output Call: aov(formula = scores ~ groups) Residuals: Min 1Q Median 3Q Max -7.00 -1.85 -0.45 2.00 9.60 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 10.0600 0.4399 22.872 < 2e-16 *** groupsc1 1.9400 0.8797 2.205 0.0326 * groupsc2 5.2500 0.9835 5.338 2.96e-06 *** groupsc3 1.2000 0.6955 1.725 0.0913 . groupsc4 -0.0500 0.6955 -0.072 0.9430 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 3.11 on 45 degrees of freedom Multiple R-squared: 0.4468, Adjusted R-squared: 0.3976 F-statistic: 9.085 on 4 and 45 DF, p-value: 1.815e-05One new function here, gl(), which means generate levels for a balanced design. The syntax is kind of goofy in that n stands for the number of levels of the IV, while k stands for the number of replicates (subjects per group). Much more common statistical notation would have that the other way around, so don't get confused.
Nonorthogonal Contrasts Those are not the contrasts I would have selected for the Eysenck study, and luckily for me, the statistics police will not come and put me in statistics prison if my contrasts are not orthogonal. (I'll probably get off with a warning.) There is no rule that says contrasts have to be orthogonal, even when the
design is balanced. Here are the contrasts I would test. (Note: Contrasts are
very personal things, kind of like underwear. People have different tastes.
As long as those tastes correspond to sensible hypotheses, the rule is live
and let live!)
> levels(groups) [1] "count" "rhyme" "adjec" "image" "contr" > c1=c(-1/2, -1/2, +1/2, +1/2, 0) # test shallow against deep processing > c2=c(-1/2, -1/2, 0, 0, 1) # test shallow processing against control > c3=c( 0, 0, -1/2, -1/2, 1) # test deep processing against control > cons = cbind(c1, c2, c3) > rownames(cons) = levels(groups) > cons c1 c2 c3 count -0.5 -0.5 0.0 rhyme -0.5 -0.5 0.0 adjec 0.5 0.0 -0.5 image 0.5 0.0 -0.5 contr 0.0 1.0 1.0 > t(cons) %*% cons # not orthogonal c1 c2 c3 c1 1.0 0.5 -0.5 c2 0.5 1.5 1.0 c3 -0.5 1.0 1.5 > contrasts(groups) = cons Error in `contrasts<-`(`*tmp*`, value = c(-0.5, -0.5, 0.5, 0.5, 0, -0.5, : singular contrast matrixOkay, I didn't get away with a warning. I got an error. So the bus stops here! (Or doesn't stop here, depending upon how you think about it!) Yeah, I know, it annoys me, too. So I've written a function to do it (see
Writing Your Own Functions and Scripts if you don't
know what I'm talking about). Open a script window (File > New Script in
Windows or File > New Document on a Mac), and copy and paste these lines into it.
### begin copying here SScontrasts = function (means, contr.matrix, n, mse, dfe) { diffs = (as.vector(means) %*% contr.matrix) tmp1 = diffs^2/apply(contr.matrix^2/as.vector(n), 2, sum) tmp2 = tmp1/mse tmp3 = 1 - pf(tmp2, 1, dfe) output = rbind(tmp1, tmp2, tmp3, diffs) rownames(output) = c("SS", "F", "p-value", "diffs") output } ### end copying here and pasteThen save the script as "SScontrasts.R" to whatever your usual working directory is (Rspace). Then, any time you want it, you just read it in like this. > source("SScontrasts.R") # this will put it in your workspaceYou use it like this. > means = tapply(scores,groups,mean) # put the group means in a vector > means count rhyme adjec image contr 7.0 6.9 11.0 13.4 12.0 > aov.out # you'll need some of this information, too Call: aov(formula = scores ~ groups) Terms: groups Residuals Sum of Squares 351.52 435.30 Deg. of Freedom 4 45 Residual standard error: 3.110198 Estimated effects may be unbalanced > args(SScontrasts) # check to see what you need to feed the function function (means, contr.matrix, n, mse, dfe) > ### means is the vector of means > ### contr.matrix is the contrast matrix you've prepared (cons) > ### n is a vector of group sizes; if the design is balanced, a single number will do > ### mse is the mean square error term from the ANOVA table > ### dfe is the error degrees of freedom from the ANOVA table > > SScontrasts(means, cons, n=10, mse=435.3/45, dfe=45) c1 c2 c3 SS 2.756250e+02 1.700167e+02 0.2666667 F 2.849328e+01 1.757581e+01 0.0275672 p-value 2.963530e-06 1.275381e-04 0.8688742 diffs 5.250000e+00 5.050000e+00 -0.2000000The function reports a matrix of results. For each contrast (in columns) you'll get a sum of squares (these are not orthogonal contrasts, so don't expect the SS treatment to be correctly partitioned), an F value for the test (which is on 1 and dfe degrees of freedom), a p-value, and the difference in the means of the two (mega)groups being contrasted. If you'd rather report your results as t-tests, then t = sqrt(F) on dfe degrees of freedom, with a p-value as reported in the above table. These are two-tailed p-values when it comes to the t-test. For one-tailed, divide them in half, PROVIDED the effect went in the predicted direction. If you code the group(s) that you expect to have the larger mean with positive-valued contrast codes, then "diffs" will be positive if the effect went in the direction you expected. A Final Note Do planned contrasts require a p-adjustment for the number of comparisons? Oh no! You're not dragging me into that argument! Just try to hold it down to a reasonable number. created 2016 February 4 |