Today we will do a balanced factorial ANOVA from beginning to end. I got the data here:
"https://psych.hanover.edu/classes/ResearchMethods/jamovi/2way/2way-1.html"
You cannot load the data file directly from this URL. (It is a web page; the data file is posted there as a CSV file.) You can load the data from my website instead, as follows:
file = "http://ww2.coastal.edu/kingw/psyc480/data/JR89.txt" JR89 = read.table(file = file, header=T, stringsAsFactors=T)
You will not be required to use R to complete this exercise, although I very strongly recommend it, because there will come a time when you'll have to know how to do this. Today, I will do the R and you answer the questions as you go through the exercise. If you can't answer most of these questions, you need to review.
Once you load the data, you should have two items in your workspace. (You did remember to clear your workspace first, right?)
> ls()
[1] "file" "JR89"
JR89 is the data frame.
> summary(JR89)
        Attractiveness    Commitment      rating       
 HighAttractive:100   HighCommit:100   Min.   : 0.1611  
 LowAttractive :100   LowCommit :100   1st Qu.: 3.0348  
                                       Median : 4.8020  
                                       Mean   : 4.9976  
                                       3rd Qu.: 6.6347  
                                       Max.   :12.1973  
These data are simulated and are loosely based on a study by Johnson and Rusbult (Johnson, D. J. & Rusbult, C. E. 1989. Resisting Temptation: Devaluation of Alternative Partners as a Means of Maintaining Commitment in Close Relationships. Journal of Personality and Social Psychology, 57(6), 967-980.) The article describes three studies by these investigators. These data simulate the result of the second one (however, the actual study is much more complex).
The basic phenomenon is that, once you have made a choice and are committed to it, alternatives are devalued. If you choose college A over college B, college B will be devalued, or evaluated as worse than you might have seen it before making the choice. In this study, the choice is of partners in a romantic relationship.
N=200 college students participated in the study as part of a research participation requirement for their general psychology course. All subjects were involved in a heterosexual romantic relationship, to which their commitment was either high or low (the Commitment variable). Each subject was exposed to an alternative partner via a supposed online dating app. The attractiveness of the alternative partner was either high or low (the Attractiveness variable). The subjects' task was to evaluate the desirability of the alternative partner on a number of scales. These evaluations were combined into the DV, which is in the rating variable.
Here is the authors' abstract.
"This work tested the hypothesis that persons who are more committed to their relationships devalue potential alternative partners, especially attractive and threatening alternatives. In Study 1, a longitudinal study, perceived quality of alternatives decreased over time among stayers but increased for leavers. In Study 2, a computer dating service paradigm, more committed persons exhibited greatest devaluation of alternatives under conditions of high threat when personally evaluating extremely attractive alternative partners. In Study 3, a simulation experiment, the tendency to reject and devalue alternatives was greater under conditions of high commitment. In all three studies, tendencies to devalue were more strongly linked to commitment than to satisfaction."
Question 1) What do we do first?
> with(JR89, table(Attractiveness, Commitment))
                Commitment
Attractiveness   HighCommit LowCommit
  HighAttractive         50        50
  LowAttractive          50        50
What critical thing have we learned from this? How many subjects are in each of the groups?
Question 2) How would you name this design?
> with(JR89, tapply(rating, list(Attractiveness, Commitment), mean))
               HighCommit LowCommit
HighAttractive   3.945609  6.921761
LowAttractive    4.727338  4.395778
Question 3) What means are these?
Question 4) Are these means weighted or unweighted? (Hint: if you don't
remember Psyc 225 very well, you may need to look at the practice problems
before you can answer questions about weighted vs. unweighted means. In
brief, weighted vs. unweighted means is an issue when you are calculating
means of means, i.e., when you are calculating marginal means. If the design
is balanced, it doesn't matter. Weighted and unweighted marginal means will
be the same.)
> with(JR89, tapply(rating, list(Attractiveness), mean))
HighAttractive  LowAttractive 
      5.433685       4.561558 
> with(JR89, tapply(rating, list(Commitment), mean))
HighCommit  LowCommit 
  4.336473   5.658769 
Question 5) What means are these?
Question 6) Are these weighted or unweighted means?
Question 7) What effect (if any) is shown in the second pair of means?
> with(JR89, tapply(rating, list(Attractiveness, Commitment), var))
               HighCommit LowCommit
HighAttractive   5.675195  8.162407
LowAttractive    6.907246  5.772754
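An aside before the next question: one quick rule-of-thumb check (my aside, not an official part of this exercise) is to compare the largest cell variance to the smallest. A ratio under about 4 is usually considered unremarkable in a balanced design.

> 8.162407 / 5.675195          # largest cell variance over smallest
[1] 1.43826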
Question 8) The variances are not all the same. Does this mean that the
homogeneity of variance assumption has been violated?
> with(JR89, interaction.plot(Commitment, Attractiveness, rating, type="b", pch=c(2,16)))
> # Note: the type="b" option says plot both lines and points, and the pch= option tells
> # which point characters to use (2 is an open triangle, 16 is a filled circle)
> # do this: plot(1:20, 1:20, pch=1:20) to see all the point characters available
Question 9) What is this graph called (two names)?
Question 10) What effect is most clearly shown on this graph?
> aov.out = aov(rating ~ Commitment * Attractiveness, data=JR89)
> summary(aov.out)
                           Df Sum Sq Mean Sq F value   Pr(>F)    
Commitment                  1   87.4   87.42  13.187  0.00036 ***
Attractiveness              1   38.0   38.03   5.737  0.01756 *  
Commitment:Attractiveness   1  136.8  136.76  20.630 9.72e-06 ***
Residuals                 196 1299.4    6.63                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Preliminary question: Why did that first command line produce no output?
Question 11) Is the main effect of Commitment statistically significant?
Summarize the results of the ANOVA.
Question 12) Is the main effect of Attractiveness statistically significant?
Summarize the results of the ANOVA.
Question 13) Is the Commitment-by-Attractiveness interaction statistically
significant? Summarize the results of the ANOVA.
Question 14) Which of these effects is most interesting? Why?
Here's something new! Once you have the aov.out object in your workspace (the result of the ANOVA, whatever you called it), there is an interesting way to check for violations of assumptions graphically. We are going to draw what are called diagnostic plots.
> par(mfrow=c(2,2))            # sets the graphics window for four graphs
> plot(aov.out, 1:4)
Question 15) What are these graphs called? (Hint: you may not know this yet.
Click the check button to find out. You'll be seeing a lot of these in the
future, so read carefully!)
The graph in the upper left corner is called a residuals plot. It is not terribly interesting to us right now. The graph in the upper right corner is called a normal probability plot or a normal Q-Q (quantile-quantile) plot. It is a rough test of the normal distribution assumption. If all of the plotted points on the graph are lined up along the dotted line from lower left to upper right, then normality has been met. Of course, the result is not going to be perfect, because all we have are samples, and samples are what? Noisy! So we can expect some deviations from the dotted line, especially near the ends. What we DON'T want to see are the plotted points forming a pronounced bow (which would indicate skew) or snaking around the dotted line in an S-shaped pattern (to be discussed at a later date). We also don't want to see the plotted points suddenly rising rapidly above the dotted line at the top end or suddenly falling below the dotted line at the lower end (which would indicate outliers).
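By the way, if you ever want to draw just the normal Q-Q plot by itself, here is a minimal sketch (not something you need for this exercise); the residuals() function pulls the residuals out of the aov object.

> qqnorm(residuals(aov.out))   # sample quantiles vs. normal quantiles
> qqline(residuals(aov.out))   # add the reference line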
Question 16) Would you say the normal distribution assumption has been met
by these data? Why?
The graph in the lower left corner is called a scale-location plot. It can be used to check the homogeneity of variance assumption. You'll notice there is a column of plotted points above each of the group means on the horizontal axis. These columns of points should all be roughly the same height. If one of them is dramatically taller or shorter than the others, then homogeneity has been violated. Another way to make the evaluation is to look at the red line. It should be flat. Once again, the result is not going to be perfect because all we have are sample data. We are looking for something reasonably close to what we would expect when variances are homogeneous.
Question 17) Would you say the homogeneity of variance assumption has been met
by these data? Why?
The graph in the lower right corner is called a Cook's distance plot. Cook's distance is calculated for each data point and is plotted as a vertical line on this graph. Cook's distance (or Cook's D) is a measure of how much a data value constitutes an outlier (or "influential point" as we will eventually call it). An outlier is a data value that doesn't seem to go with the rest of the data values in its group, either because it is too large or too small. Such values can have an undue influence on the outcome of the analysis, especially if the groups are small. What we want to see, ideally, in the Cook's distance plot is a freshly mown lawn. (That's a joke!) In other words, we want all the vertical lines to be the same height. Ain't gonna happen. Why not? Because these are sample data, and samples are noisy. So we don't want to see any of the vertical lines being pronouncedly higher than the others. Here, R has flagged three of the data values as potential outliers, cases 8, 98, and 130. Now we have to satisfy ourselves that these data values are not causing a problem, and to do that, we look at the Cook's distances for these points on the vertical axis.
Unfortunately (or fortunately, depending upon how you look at it), there is no universally accepted criterion for what constitutes a large Cook's distance. I can tell you this, however, without fear of being contradicted. A Cook's D of 1.0 or more is definitely a problem. A Cook's D of 0.5 or more is something that needs to be investigated. Here, our Cook's Ds are much tinier than that.
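If you'd rather look at the numbers than squint at the graph, R will print the Cook's distances themselves. A minimal sketch (run it yourself; the three flagged cases should show up at the top):

> cd = cooks.distance(aov.out)       # one Cook's D per data value
> sort(cd, decreasing=TRUE)[1:5]     # the five largest Cook's Ds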
Question 18) Would you say we have any problematic outliers in these data? Why?
It looks like we've passed the assumptions game, so on to something else. We have a significant interaction. To complete our evaluation of this interaction we need to evaluate the simple effects post hoc. (If there were not a significant interaction but there were significant main effects instead, we would now be considering post hoc tests on the main effects. Here that would be silly. Why? Because each IV contains only two groups, and the ANOVA already tells us whether or not their means are significantly different. We only need post hoc tests on main effects when the interaction is not significant AND the main effect consists of more than two means.) A reminder:
> with(JR89, tapply(rating, list(Attractiveness, Commitment), mean))
               HighCommit LowCommit
HighAttractive   3.945609  6.921761
LowAttractive    4.727338  4.395778

Questions 19-22) What are the simple effects?
In each case, the simple effect is the effect of one factor at one level of the other. To see an effect of a factor, we have to have at least two means from that factor. So, 3.946 vs. 6.922 are two means from the Commitment factor, which makes this an effect of Commitment. But they are the means from only one level of Attractiveness, High. So this is the simple effect of Commitment at High Attractiveness. And so on.
How do we test these effects? Right now you should be thinking, "Hey! Two means! Sounds like a t-test to me!" And you'd be right. Because we only have two levels of each factor and, therefore, only two cell means in each simple effect, we can test all the simple effects with Fisher LSD tests. Since there are four tests to be done, we might consider a Bonferroni correction, but I'm going to skip that, and by the time we're done here, you'll know why.
Another reminder. You may remember this formula from a past slide in the ANOVA lecture (the Fisher LSD t-test):

     t = (mean1 - mean2) / (sd.pooled * sqrt(1/n1 + 1/n2))

The bottom (denominator) of this formula is the standard error of the mean difference, and it is going to be the same for all four tests because we are assuming homogeneity of variance and all the group sizes are the same. We can get the pooled standard deviation from the ANOVA summary table. How?
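If the answer isn't coming to you, here's a hint, using the rounded error mean square (6.63) from the summary table: the pooled standard deviation is the square root of MS.error.

> sqrt(6.63)                   # sd.pooled from the (rounded) error mean square
[1] 2.574879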
But R has shortchanged us on significant figures here. It has only given us three, and it would be good to have more precision than that. Here's how we can get a more precise value for sd.pooled. It is in aov.out.
> aov.out                      # no summary() here
Call:
   aov(formula = rating ~ Commitment * Attractiveness, data = JR89)

Terms:
                Commitment Attractiveness Commitment:Attractiveness Residuals
Sum of Squares     87.4234        38.0303                  136.7620 1299.3625
Deg. of Freedom          1              1                         1       196

Residual standard error: 2.574762
Estimated effects may be unbalanced
Remember what I told you once before about residual standard error. Residual standard error is the pooled standard deviation when the data are grouped. Simple as that.
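You can verify that for yourself: the residual standard error is just the square root of MS.error calculated at full precision from the residual line.

> sqrt(1299.3625 / 196)        # sqrt(SS.residual / df.residual)
[1] 2.574762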
Question 23) What is the value of sd.pooled (do not round)?
Question 24) What is the value of se.meandiff for all four tests (6 decimal
places please)?
Now complete the following table of tests. (Round as little as possible so as to have at least three accurate decimal places in the t-value. Drop any negative signs as these are two-tailed tests. I have calculated the p-values for you in R, as we will see them again below.)
Tests of Simple Effects

                     Commitment at        Commitment at       Attractiveness at   Attractiveness at
                     High Attractiveness  Low Attractiveness  High Commitment     Low Commitment
mean difference      ________             ________            ________            ________
se.meandiff          ________             ________            ________            ________
t-value              ________             ________            ________            ________
degrees of freedom   ________             ________            ________            ________
p-value              p < .001             p = 0.520           p = 0.131           p < .001
This is how you calculate the p-values, just for the hopelessly curious. The first value inside pt() is the t.calc. Always make sure this is positive for a two-tailed test. The second value is df. The option lower.tail=FALSE is a little trick that makes this work: it gives the probability above t.calc rather than below it. It's multiplied by 2 because this is a two-tailed test.
> pt(5.779475, df=196, lower.tail=F) * 2
[1] 2.911625e-08
> pt(0.6438658, df=196, lower.tail=F) * 2
[1] 0.5204152
> pt(1.5180618, df=196, lower.tail=F) * 2
[1] 0.1306101
> pt(4.9052785, df=196, lower.tail=F) * 2
[1] 1.9555e-06
The problem with testing the simple effects this way is that it's only good for two means. If there are more than two levels of the IV, this test kind of craps out! So here's another way to do it that will work no matter how many means we have to compare. It's going to be an ANOVA (hope you guessed that), but before we do it, I want to reproduce the error line of the ANOVA we've already done. We're going to need it. I'm also going to bump up the precision of these values a bit for the sake of demonstrating something.
            Df        SS       MS
Residuals  196 1299.3625 6.629401
Now we're ready. Watch closely. This is tricky.
> aov.simple1 = aov(rating ~ Commitment, data=JR89, subset=Attractiveness=="HighAttractive")
> print(summary(aov.simple1), digits=8)
            Df    Sum Sq     Mean Sq  F value     Pr(>F)    
Commitment   1 221.43705 221.437049 32.00512 1.5247e-07 ***
Residuals   98 678.04252   6.918801                        
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA was for rating by Commitment, but only for the subset of ratings that are for High Attractiveness. Therefore, this is (almost) the test of the simple effect of Commitment at High Attractiveness. It is not quite the test, however. Because we are assuming homogeneity of variance, we get to use the error line from the full factorial ANOVA. So the actual test is:
            Df    Sum Sq     Mean Sq F value
Commitment   1 221.43705 221.437049        
Residuals  196 1299.3625   6.629401          ## substituted from overall ANOVA
I've left the F-value blank. You calculate it!
Question 25) What is the F-value for the test of the simple effect of
Commitment at High Attractiveness?
I got a little carried away with decimal places here because I want to demonstrate something. What is the square root of the F-value? It is 5.77947. And where have we seen that value before? I'll wait. Go look for it! Correct! It is the t-value we got from the Fisher LSD test for this same effect. The p-value is also the same (with a little difference in rounding).
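You don't have to take my word for it (the F-value appears in the pf() command below anyway):

> sqrt(33.402269)              # square root of the F for this simple effect
[1] 5.77947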
> pf(33.402269, df1=1, df2=196, lower.tail=F)
[1] 2.911705e-08
Question 26) Is the simple effect of Commitment at High Attractiveness
statistically significant?
This test is exactly equivalent to the Fisher LSD test when there are only two means. But it also works when there are more than two means.
Now we'll do the rest of them.
> aov.simple2 = aov(rating ~ Commitment, data=JR89, subset=Attractiveness=="LowAttractive")
> print(summary(aov.simple2), digits=8)
            Df    Sum Sq   Mean Sq F value
Commitment   1    2.7483 2.7482985        
Residuals  196 1299.3625 6.629401            ## substituted from overall ANOVA
Question 27) What is the F-value for the test of the simple effect of
Commitment at Low Attractiveness?
> pf(0.414562, df1=1, df2=196, lower.tail=F)
[1] 0.5204158
Question 28) Is the simple effect of Commitment at Low Attractiveness
statistically significant?
> aov.simple3 = aov(rating ~ Attractiveness, data=JR89, subset=Commitment=="HighCommit")
> print(summary(aov.simple3), digits=8)
                Df    Sum Sq    Mean Sq F value
Attractiveness   1  15.27751 15.2775082        
Residuals      196 1299.3625   6.629401        ## substituted from overall ANOVA
> pf(2.304508, df1=1, df2=196, lower.tail=F)
[1] 0.1306104
> aov.simple4 = aov(rating ~ Attractiveness, data=JR89, subset=Commitment=="LowCommit")
> print(summary(aov.simple4), digits=8)
                Df    Sum Sq   Mean Sq F value
Attractiveness   1 159.51475 159.51475        
Residuals      196 1299.3625  6.629401         ## substituted from overall ANOVA
> pf(24.061713, df1=1, df2=196, lower.tail=F)
[1] 1.95554e-06
Question 29) What is the F-value for the test of the simple effect of
Attractiveness at High Commitment?
Question 30) Is the simple effect of Attractiveness at High Commitment
statistically significant?
Question 31) What is the F-value for the test of the simple effect of
Attractiveness at Low Commitment?
Question 32) Is the simple effect of Attractiveness at Low Commitment
statistically significant?
When all comparisons involve just two means, it's also possible to use the Fisher LSD formula to calculate a least significant difference, or LSD. This is done by finding a t.critical value for the degrees of freedom given and then multiplying that by the se.meandiff (denominator of the t-test formula). In this case, the LSD would be:
> qt(p=0.975, df=196)          # t.critical with 0.025 in the upper tail
[1] 1.972141
> 1.972141 * 0.514952          # t.crit * se.meandiff
[1] 1.015558
Any difference between two means that is larger than this is statistically significant at the alpha=.05 level. If we were worried about inflation of the Type I error rate due to the number of comparisons we are doing, we could decrease the alpha level accordingly, which amounts to doing a Bonferroni correction.
> .05 / 4                      # Bonferroni corrected alpha for 4 comparisons
[1] 0.0125
> qt(p=0.99375, df=196)        # t.critical with 0.00625 (one-half alpha) in the upper tail
[1] 2.520969
> 2.520969 * 0.514952          # Bonferroni adjusted LSD
[1] 1.298178
Remember how to do this! Someone may ask you. If you don't understand this, ASK, ASK, ASK!
We have not calculated effect sizes yet. We will use eta-squared as our effect size statistic. For that, we will need to get SS.total.
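If you'd like R to do the arithmetic, here is a minimal sketch using the sums of squares from aov.out above. I've left off the output, because the results are your answers to Questions 33-36.

> SS = c(Commitment=87.4234, Attractiveness=38.0303,
+        Interaction=136.7620, Residuals=1299.3625)
> sum(SS)                      # SS.total
> SS / sum(SS)                 # eta-squared for each effect (last value is the error proportion)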
Question 33) What is SS.total?
Questions 34-36) What are the following eta-squared values?
Commitment: eta-sq =
Attractiveness: eta-sq =
Commitment-by-Attractiveness: eta-sq =
Question 37) How would you evaluate the size of these effects?
Question 38) Well in that case, how is it that some of these effects have such
small p-values?
Question 39) The hypothesis that people in committed relationships devalue
attractive or threatening alternative partners is:
Question 40) Did the researchers find evidence that people in committed
relationships devalue potential alternative partners even if they are not
attractive?
Question 41) Both main effects were statistically significant. How would the
main effect of Commitment be described?
Question 42) Was what you said in question 41 true for both high-attractive
and low-attractive alternative partners?
The End