PSYC 480 -- Spring 2021 -- Dr. King

Factorial ANOVA Practice Problems (mostly balanced)

1) butterfat

file = "http://ww2.coastal.edu/kingw/psyc480/data/butterfat.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)

Butterfat in the milk of diary cows by breed (5 breeds) and age, 2-year-old
cows and mature (5-year-old or older) cows. How do these factors relate to
butterfat content of milk? Is there an interaction? If so, describe it. (These
data are from Hand et al. (1994). A Handbook of Small Data Sets, and were
taken from Sokal and Rohlf (1981) Biometry (2nd ed.). They come originally
from Canadian records of pure-bred dairy cattle. Butterfat content is in
percent butterfat. FYI, 4% butterfat is considered whole milk. Be sure to
check the cell variances! (It's not psychology, but I like this problem,
having spent a fair amount of my youth around dairy cows, and an ANOVA is an
ANOVA whether the subjects are people or cows!)

> aov.butterfat = aov(butterfat ~ age * breed, data=X)
> summary(aov.butterfat)
            Df Sum Sq Mean Sq F value Pr(>F)    
age          1   0.21   0.207   1.172  0.282    
breed        4  34.32   8.580  48.595 <2e-16 ***
age:breed    4   0.27   0.067   0.381  0.821    
Residuals   90  15.89   0.177                   
---

> with(X, tapply(butterfat, breed, mean))
Ayrshire Canadian Guernsey Holstein   Jersey 
  4.0600   4.4385   4.9500   3.6695   5.2925 
> TukeyHSD(aov.out, which="breed")   # be sure to specify which effect you want
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = butterfat ~ breed * age, data = X)

$breed
                     diff         lwr         upr     p adj
Canadian-Ayrshire  0.3785  0.00858404  0.74841596 0.0422752
Guernsey-Ayrshire  0.8900  0.52008404  1.25991596 0.0000000
Holstein-Ayrshire -0.3905 -0.76041596 -0.02058404 0.0332224
Jersey-Ayrshire    1.2325  0.86258404  1.60241596 0.0000000
Guernsey-Canadian  0.5115  0.14158404  0.88141596 0.0020194
Holstein-Canadian -0.7690 -1.13891596 -0.39908404 0.0000010
Jersey-Canadian    0.8540  0.48408404  1.22391596 0.0000001
Holstein-Guernsey -1.2805 -1.65041596 -0.91058404 0.0000000
Jersey-Guernsey    0.3425 -0.02741596  0.71241596 0.0832286
Jersey-Holstein    1.6230  1.25308404  1.99291596 0.0000000

2) writing

file = "http://ww2.coastal.edu/kingw/psyc480/data/writing.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)

For her senior research project (Psyc 497, Sp 2002), Susan McClean presented
180 subjects with a short technical article on linquistics and asked them to
rate the competency of the writing on a 1-to-5 Likert-type rating scale, with
a rating of 1=highly unfavorable, and a rating of 5=highly favorable. Half
the raters were female and half were male. All were freshman at CCU. In each
gender group, one-third of the subjects were told the author of the article
was female, one-third were told the author was male, and one-third were
given no information about the author. I do not have the original data, but
these data were reconstructed to match Susan's cell and marginal means and SDs
as closely as possible, and thus to give the same results in the ANOVA table.
Variables:
    rating - the student's (subject's) rating of the writing competency
    gender - the student's (subject's) gender
    author - what the student/subject was told about the article's author
How does the subject's gender and what the subject was told about the gender of
the author impact the subject's rating?

> aov.writing = aov(rating ~ gender * author, data=X)
> summary(aov.writing)
               Df Sum Sq Mean Sq F value Pr(>F)
gender          1   0.67  0.6722   1.155  0.284
author          2   0.48  0.2389   0.411  0.664
gender:author   2   2.34  1.1722   2.015  0.136
Residuals     174 101.23  0.5818

Question: If we graph the cell means in an interaction plot, it looks like
          there is an interaction, but we didn't find one. What's up with that?
          
Answer:   Expand the limits on the vertical axis to the full range of the DV
          and see what happens.

Note: Balanced designs are hard to come by in Psyc 497 projects. I've been
          collecting 497 data for 25 years, and this is one of the very few
          I have. That's why we're going to have to talk about unbalanced
          designs, even though that might be a little painful. Unbalanced
          factorial designs are common in social science research.

3) audit

AUDIT stands for Alcohol Use Disorders Identification Test. Scores on this
test greater than 8 supposedly indicate the possible existence of "problem
drinking." Kellie Dunlap used the AUDIT as the dependent measure in her Psyc 497
project (Fall 2008). 70 CCU students were scored on the test. Other information
she collected from her subjects was gender and home state. Here are the scores
sorted by gender. Does there appear to be a gender difference in AUDIT scores?
If so, is it a big difference? (You've already done this problem. Check your
notes.)

females: 17  3 19  9 14  9  9  5 16 10  4  3 10  3  2 10 20  5  2  6  4  8  4
          4  1  2  1 21 14  1  4 11 10  8  7  0 18  1  9  6  0  9  8  4 14  4
         11 22  0 17
males:   11  7  9 10  6 13  1  8 13 11 26 13 19  6  1 20 17 12 15 24

Here are the scores sorted by whether the student's home state was north or
south of the Mason-Dixon line. Does there appear to be a difference in AUDIT
scores between students from northern states and students from southern states?
If so, is it a big difference? (You've done this one, too.)

north: 17 11 19 14  6  9 10 20  2  4  8  4  1  8 21 14  4 26 11  0 19  1  4 11
       22 17 24
south:  3  7  9  9 10 13  9  5 16  4  3 10  3  2 10  5  6  4  1  2  1  1 13 11
       13 10  8  7 18  6  1  9  6  0  9  8  4 14 20 17 12  0 15

These are t-tests.

Now we will cross-classify these two variables. From the cell means, can you
tell what's going on here? Is there a Gender x Region interaction? Why can't we
analyze these data the way we have been doing? MSW = 36.677. Can that be used
to help our cause?

                      Region               marginal means
              north          south         wgted.  unwgted.
              ---------------------------
              M = 10.600     M = 6.233      7.98     8.42
    females   SD = 7.126     SD = 4.621
              n = 20         n = 30
Gender        ---------------------------
              M = 13.571     M = 11.308    12.10    12.44
    males     SD = 9.537     SD = 4.922
              n = 7          n = 13
              ---------------------------
        wgted.    11.37           7.77
      unwgted.    12.09           8.77

Because the design is unbalanced, there are several possible ANOVAs here. Let
me explain why. The problem is with the main effects. All generally accepted
methods of doing this will give the same result for the interaction. (This is
true ONLY for two factor ANOVAs.) In this case, the interaction is not signifi-
cant. That means we move on and look at the main effects.

The main effects are seen in the marginal means. So how do we get those? Aha!
There are two methods for calculating the marginal means, and therein lies the
problem. If the design is balanced, they both give the same answer. If the
design is unbalanced, they do not.

Weighted Marginal Means

Look at the first row in the table above (females). The weighted marginal mean
is what you would get if you ignored region and just dumped all 50 of those
subjects into one big hat and calculated the mean. It's also the mean you
would get in R if you used tapply(AUDIT, Gender, mean). It can also be obtained
as follows from the information in the table.

10.600 * 20 + 6.233 * 30 =
212 + 187 =
399

Do you see what we've done here? When I teach this in Psyc 225, I tell the
students that we are "going back to the sums." If the mean is sum(X)/n, then
the sum is the mean times n. So the sum of the scores in the first row of the
table (females) is 399. Then to get the mean, we divide by n in the row.

399 / (20 + 30) = 7.98

This is called a weighted mean, or weighted marginal mean, because the means
in the individual cells are not just added together and divided by the number
of means. (That would work if the design were balanced!) Rather, the means in
the individual cells are multiplied by a weight, then added, and then the sum
is divided by the sum of the weights. (Yes, you were taught that in your first
stat course!) In this case, the weights are n in the cells.

Now you do the males row and the two columns. The answers are (don't look
until you've done the calculations!) 12.100, 11.370, and 7.767. Hint: the sums
in the cells have to be whole numbers. Why is that?

Unweighted Marginal Means

The unweighted marginal mean is the mean of the means. That is, we add the cell
means across a row (or down a column) and we divide by the number of means. For
the first row (females), the unweighted marginal mean is...

(10.600 + 6.233) / 2 = 8.417

Now you do the rest of them. The answers are 12.440, 12.086, and 8.771.

Do you see the problem? With an unbalanced design, the marginal means are
different depending upon how they are calculated. So which ones should we use
to calculate the main effects? You ask hard questions! That depends upon how
you want to calculate the ANOVA, and that depends upon what questions you are
asking of the data. These are the kinds of problems we are going to face when
it comes to unbalanced designs.

4) BONUS - Here is an alternate analysis of the audit results. Can you explain
what I'm doing? Send me an email with your answer. Generous bonus points for
anyone who gets it right!

          north          south         marginal mean
          ---------------------------
          M = 10.600     M = 6.233        7.980
females   SD = 7.126     SD = 4.621
          n = 20         n = 30           n = 50
          ---------------------------
          M = 13.571     M = 11.308      12.100
males     SD = 9.537     SD = 4.922
          n = 7          n = 13           n = 20
          ---------------------------
marginal    11.370         7.767
mean
            n = 27        n = 43

SS.error = 7.126^2 * 19 + 4.621^2 * 29 + 9.537^2 * 6 + 4.922^2 * 12 = 2420.512
MS.error = 2420.512 / (19 + 29 + 6 + 12) = 36.67442
sd.pooled = sqrt(36.67442) = 6.056

test on Gender:
t = (12.1 - 7.98) / (6.056 * sqrt(1/50 + 1/20)) = 2.571357, df = 66
p.value (from R) = 0.0124
adjusted p.value = 0.0124 * 2 = 0.0248 (reject the null)

test on Region:
t = (11.37 - 7.767) / (6.056 * sqrt(1/27 + 1/43)) = 2.422956, df = 66
p.value (from R) = 0.0181
adjusted p.value = 0.0181 * 2 = 0.0362 (reject the null)

What assumption have we made in doing these tests?

5) Here are the cell means from problem 2 above. There were n=30 subjects per
cell. Calculate the weighted and unweighted marginal means and confirm that
they are the same. Not just similar, EXACTLY the same!

> with(X, tapply(rating,list(gender,author),mean))
         female     male     none
female 3.700000 3.733333 3.366667
male   3.433333 3.433333 3.566667

The marginal means are:
> with(writing, tapply(rating,list(gender),mean))
  female     male 
3.600000 3.477778 
> with(writing, tapply(rating,list(author),mean))
  female     male     none 
3.566667 3.583333 3.466667 

6) meningitis

file = "http://ww2.coastal.edu/kingw/psyc480/data/meningitis.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)

These data are assessments of recovery from brain damage. The DV (scores) is
an assessment from a battery of tests designed to detect abnormal brain function.
The IVs are type of illness that caused the brain damage (illness) and time
post-recovery from the illness (time). If there is an interaction, graph it and
describe it in words. Sorry, but I don't have a reference for the source of
these data. I think I took them out of an old stat book.

> aov.meningitis = aov(scores ~ illness * time, data=X)
> summary(aov.meningitis)
             Df Sum Sq Mean Sq F value   Pr(>F)    
illness       2 2413.6  1206.8  14.224 6.04e-05 ***
time          2  536.2   268.1   3.160 0.058460 .  
illness:time  4 2194.4   548.6   6.466 0.000874 ***
Residuals    27 2290.8    84.8                     
---

The interaction plot can be drawn this way:
> with(X, interaction.plot(time, illness, scores, type="b", pch=c(1,2,19)))
Scores rise over time postoperatively but fall after meningitis or encephalitis.

7) odd monkeys

file = "http://ww2.coastal.edu/kingw/psyc480/data/oddmonkeys.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)

Data are from monkeys performing an oddity task in which they are
exposed to three objects, two of which are the same. Their task is
to learn that a reward is hidden under the odd object. Reward consists
of one grape (low), three grapes (moderate), or five grapes (high).
Monkeys were randomly assigned to these conditions. In addition, the
motivation of the monkeys to learn the task was manipulated by food
depriving them for either one hour before doing the task (weak) or
24 hrs (strong). Monkeys were also randomly assigned to these conditions.
A total of 30 learning trials were administered. The DV, successes, is
the number of times out of the 30 that a monkey made the correct choice.
Source: W.L. Hays (1963). Statistics for Psychologists.

> aov.oddmonkeys = aov(successes ~ reward * motivation, data=X)
> summary(aov.oddmonkeys)
                  Df Sum Sq Mean Sq F value Pr(>F)  
reward             2    112   56.00   3.055 0.0721 .
motivation         1     24   24.00   1.309 0.2675  
reward:motivation  2    144   72.00   3.927 0.0384 *
Residuals         18    330   18.33                 
---

The interaction is described in the interactions lecture.

8) sex

file = "http://ww2.coastal.edu/kingw/psyc480/data/sexbalanced.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)

I thought that might get your attention! Data are from Lindsey Chopyk's Psyc 497
project (Fall 2009). Lindsey administered a survey asking participants about
their sexual history, practices, and opinions. Part of the survey involved a
fictional person who the participant was asked for opinions about. Lindsey's
design was mildly unbalanced. I have made it balanced by randomly throwing out
a few subjects in some of the cells. This did not change the effects that are
seen in the ANOVA. Notice that this is a three-factor design (2x2x2 factorial).
Can you name the effects that will be evaluated before R gives you the answer.
The variables are:
    Partic.Gend: gender of the participant in this study
    Story.Gend: gender of fictional person participant was asked for opinion about
    Age.Grp: age group of participant (young vs. old)
    Slut: how many sex partners may the fictional person have before being
          considered to be a slut? (Her term, not mine!)
The survey was administered in a local mall near CCU. The participants were not
CCU students. However, the young age group consisted of people of roughly college
age, while the old age group consisted of people in their 50s or older. The
relationhship between Age.Grp, Partic.Gend, Story.Gend, and Slut is fascinating!
Can you summarize that relationship in words?

> aov.sex = aov(Slut ~ Partic.Gend * Story.Gend * Age.Grp, data=X)
> summary(aov.sex)
                               Df Sum Sq Mean Sq F value   Pr(>F)    
Partic.Gend                     1   49.8    49.8   2.738  0.10125    
Story.Gend                      1  193.9   193.9  10.650  0.00153 ** 
Age.Grp                         1  620.3   620.3  34.075 7.19e-08 ***
Partic.Gend:Story.Gend          1  193.9   193.9  10.650  0.00153 ** 
Partic.Gend:Age.Grp             1   41.9    41.9   2.301  0.13260    
Story.Gend:Age.Grp              1  129.4   129.4   7.107  0.00901 ** 
Partic.Gend:Story.Gend:Age.Grp  1  311.5   311.5  17.113 7.55e-05 ***
Residuals                      96 1747.7    18.2                     
---

The three-way interaction is the highest order interaction that is significant
and so is the effect of interest. Graph it this way. Begin by getting an idea
of what the means are like.
> with(X,tapply(Slut,list(Partic.Gend,Story.Gend,Age.Grp),mean))
, , old

            men    women
Female 4.384615 3.153846
Male   3.769231 4.000000

, , young

             men    women
Female  6.769231 8.000000
Male   15.615385 4.461538

Then partition the graphics window to hold two graphs.

> par(mfrow=c(1,2))

Then draw two graphs with the same range on the y-axis. I've chosen to draw a
graph for each age group, but that is arbitrary. I think age group is the
choice that tells the best story, however. (You can also draw these by hand
from the means above and get a pretty good idea what's going on.)

> with(X[X$Age.Grp=="old",],interaction.plot(Partic.Gend,Story.Gend,Slut,type="b",pch=c(2,16),ylim=c(2,16)))
> title(main="Older")
> with(X[X$Age.Grp=="young",],interaction.plot(Partic.Gend,Story.Gend,Slut,type="b",pch=c(2,16),ylim=c(2,16)))
> title(main="Younger")

The legends need some work, but you can get the idea. The line for the younger
males tell the story.