Unbalanced Practice

Psyc 480 -- Dr. King

Unbalanced Factorial ANOVA Practice Problems

INSTRUCTIONS. Check for balance, construct a table of means, and calculate both the weighted and unweighted marginal means. From the marginal means, try to determine what main effects are present. (Of course, you'll be looking at the cell means to see any interaction effect.) Check the variances. Then do the appropriate ANOVA (justify your choice) and interpret the results. Calculate effect sizes.
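
If you'd like a reminder of what those first few steps look like in R, here is a minimal sketch. The data frame toy, its factors A and B, and its scores y are all made up for illustration and have nothing to do with any of the problems below. The point is the difference between weighted marginal means (every score counts equally) and unweighted marginal means (every cell counts equally).

```r
# Hypothetical toy data: a 2x2 design with unequal cell sizes (2, 4, 3, 3).
toy <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), times = c(2, 4, 3, 3))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = c(2, 4, 3, 3))),
  y = c(3, 5,  6, 7, 8, 7,  4, 5, 6,  9, 10, 11)
)

# Check for balance: unequal counts in this table mean the design is unbalanced.
n.cells <- with(toy, table(A, B))
print(n.cells)

# Table of cell means, and the cell variances for the homogeneity check.
cell.means <- with(toy, tapply(y, list(A, B), mean))
cell.vars  <- with(toy, tapply(y, list(A, B), var))
print(cell.means)
print(cell.vars)

# Weighted marginal means for A: the plain mean of all scores at each level of A.
weighted.A <- with(toy, tapply(y, A, mean))

# Unweighted marginal means for A: the mean of the cell means in each row.
unweighted.A <- rowMeans(cell.means)

print(weighted.A)
print(unweighted.A)
```

When the design is balanced, the two kinds of marginal means are identical. The more they disagree, the more the choice of sums of squares is going to matter.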

If you decide you need the aovIII() function, here is how you get it.

source("http://ww2.coastal.edu/kingw/psyc480/functions/aovIII.R")

Problem 1) These data are from Angelia Hackett's Psyc 497 project, Spring 1997. The variables are:

gender: male, female
age: older (legal drinking age), younger (<21)
drink: score on the Student Alcohol Questionnaire (scale: 1-6)

Higher scores on drink indicate more drinking. 1 = nondrinker or rarely drinks, 6 = frequent binge drinker. Subjects were students from CCU. Angelia proposed a relationship between drinking and self-esteem, which she did not find, but it's interesting to look at drink~gender*age.

file = "http://ww2.coastal.edu/kingw/psyc480/data/SAQ.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)

You will learn something new when you look at the answer to this problem, so don't miss that!

Problem 2) Tom Prin's firemen data again. Look at the full factorial structure this time.

file = "http://ww2.coastal.edu/kingw/psyc480/data/firemen.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)

Data from Tom Prin (Psyc 497, Spring 2005). Tom tested firemen from three areas of the country, Horry County, SC, Charleston, SC, and New York City, recording their score on the Rotter Internal/External inventory (low scores indicate more internal), and the firemen's risk rating: A=risk taker, B=gets the job done w/o taking too many risks, and C=generally unwilling to take risks. The risk rating can also be expressed in terms of willingness to engage in "meritorious behavior." Tom's hypothesis was that risk A firemen were most willing to take risks because of an external locus of control ("When my time is up, my time is up."). Why is this an inappropriate hypothesis if an ANOVA is the intended analysis?

Problem 3) Back to Kellie Dunlap's Psyc 497 data once more, now that we have the tools to handle it properly.

file = "http://ww2.coastal.edu/kingw/psyc480/data/audit.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)

# AUDIT stands for Alcohol Use Disorders Identification Test. Scores on this
# test greater than 8 supposedly indicate the possible existence of "problem
# drinking." Kellie Dunlap used the AUDIT as the dependent measure in her Psyc
# 497 project (Fall 2008). 70 CCU students were scored on the test. Other
# information she collected from her subjects was gender and home state.
# variables:
#    sex - gender of subject
#    south - whether or not the subject was from a state south of the
#            Mason-Dixon line
#    AUDIT - AUDIT score

We are interested in how sex and south relate to AUDIT.

Problem 4) These data are from Elizabeth Ostop, Psyc 497, Spring 2010. Variables are:

SAD: score on Social Avoidance and Distress Scale (high=more distress)
SHY: based on score on Cheek and Buss Shyness Scale (high=more shyness), subject was classified as SHY=yes or SHY=no
LOC: based on score on Rotter's Locus of Control Scale (high=external), subject was classified as LOC=internal or LOC=external

My theory is that social avoidance and distress are caused by both shyness and external locus of control, but that the effect of locus of control is indirect. When LOC is external, it makes one feel as if the outcome of a social interaction is beyond one's control and, hence, shyness develops. Shyness, in turn, causes social avoidance and distress. Test my theory using ANOVA.

file = "http://ww2.coastal.edu/kingw/psyc480/data/SADcat.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)

Problem 5) Wages from the 1985 Current Population Survey. We will be looking for a relationship between Gender, Occupation, and Wages.

file = "http://ww2.coastal.edu/kingw/psyc480/data/wages4.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)

Problem 6) Unbalanced monkeys. It's difficult to come up with a true experiment that is unbalanced, so we are going to unbalance one. We're going to fetch the oddmonkeys data and then toss out one of the monkeys at random. Who knows what happened to this monkey? Maybe it was sick. Maybe it escaped (unlikely). Maybe the equipment broke down while it was being tested. Maybe it was being tested by a first-year grad student who didn't follow the testing protocol correctly. The rule is, stuff happens! Sometimes, even when we make our best effort to keep our experimental design balanced, it becomes unbalanced by accident and at random.

Let me talk a minute about what sort of thing would not be "at random." When I was a grad student working on my master's thesis research, I was having a problem that I was discussing with my thesis adviser one morning in the elevator. I said, "The problem is the subjects in one of my treatment groups keep dying." At that point, another professor standing behind us said, "I sure hope you are working with rats!" I was altering the brain chemistry of rats in two different ways and then testing their ability to acquire a taste aversion. One of the treatments was apparently a little too much for a couple of the rats in that group.

That is NOT "at random." If subjects are leaving your experiment because the task you've given them is too hard, too unpleasant, too harsh, or too fatal, that's not "at random." Equipment breaking down, on the other hand, which results in a subject's data having to be discarded, can be considered a random loss of data.

If you have a factorial design that begins balanced, but then you lose subjects "by accident and at random" so that the design becomes mildly unbalanced, that's when you use Type III sums of squares.
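
For the curious, here is one common way to get Type III tests using nothing but base R. This is a sketch only, and it is NOT the aovIII() function, whose insides are not shown here. It assumes sum-to-zero contrasts on both factors and then uses drop1() to test each term after all the others. The data frame toy is hypothetical.

```r
# Hypothetical toy data: an unbalanced 2x2 design (cell sizes 2, 4, 3, 3).
toy <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), times = c(2, 4, 3, 3))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = c(2, 4, 3, 3))),
  y = c(3, 5,  6, 7, 8, 7,  4, 5, 6,  9, 10, 11)
)

# Fit the full factorial model with sum-to-zero contrasts. Setting them inside
# the call (rather than with options()) leaves the global settings alone.
fit <- lm(y ~ A * B, data = toy,
          contrasts = list(A = "contr.sum", B = "contr.sum"))

# Type III style tests: each term is dropped from the full model in turn,
# so each effect is evaluated after all of the others, interaction included.
typeIII <- drop1(fit, scope = ~ ., test = "F")
print(typeIII)

# Type I (sequential) tests for comparison. Here A is entered first and is
# not adjusted for anything.
typeI <- anova(fit)
print(typeI)
```

The interaction line should agree in the two tables, because the interaction is entered last either way. The main effects generally will not agree when the design is unbalanced.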

file = "http://ww2.coastal.edu/kingw/psyc480/data/oddmonkeys.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)

Recall that this is a learning experiment in which the monkeys must learn to solve an oddity task for a reward of grapes. Two variables are being manipulated (IVs): amount of reward and strength of motivation (length of food deprivation). If you table reward and motivation, you'll discover there are n=4 monkeys per group. Now we're going to lose one at random.

lost.monkey = sample(1:24, 1)   # choose one number at random from 1 to 24
X = X[-lost.monkey,]   # delete that monkey from the data frame
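
If you want to lose the same monkey every time you rerun your script, you can set the random seed first. Here is a sketch using a made-up stand-in for the monkey data. The factor levels, the scores, and the seed value are all assumptions for illustration, not the contents of oddmonkeys.txt.

```r
# Hypothetical stand-in data: 6 cells x 4 monkeys = 24 rows.
# Level names and scores are invented; only the 24-row, n=4 layout
# matches the description in the text.
set.seed(480)   # arbitrary seed, so the same row is dropped on every run
monkeys <- expand.grid(rep = 1:4,
                       reward = c("low", "high"),
                       motivation = c("low", "medium", "high"))
monkeys$score <- round(rnorm(nrow(monkeys), mean = 10, sd = 2))

# Before the loss: every cell has n=4.
print(with(monkeys, table(reward, motivation)))

# Lose one monkey at random, then drop that row.
lost <- sample(nrow(monkeys), 1)
monkeys.u <- monkeys[-lost, ]

# After the loss: one cell is down to n=3, so the design is mildly unbalanced.
n.after <- with(monkeys.u, table(reward, motivation))
print(n.after)
```

Tabling the factors again after the deletion is a quick way to see which group came up short.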

I should warn you. The lost.monkey was chosen at random, so your lost.monkey is probably not the same as my lost.monkey. Your results will be a little different from mine.

Have at it!