PSYC 480 -- Dr. King
Answers to Practice Problems

Problem 1) SAQ.txt

file = "http://ww2.coastal.edu/kingw/psyc480/data/SAQ.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)
    gender        age           drink     
 female:86   older : 67   Min.   :1.00  
 male  :83   younger:102   1st Qu.:2.00  
                           Median :4.00  
                           Mean   :3.71  
                           3rd Qu.:5.00  
                           Max.   :6.00  

# check for balance
with(X, table(gender, age))
        age
gender   older younger
  female    33      53
  male      34      49

# cell means
with(X, tapply(drink, list(gender, age), mean))
          older  younger
female 3.030303 3.584906
male   3.764706 4.265306

# save this table
cell.means = with(X, tapply(drink, list(gender, age), mean))

# weighted marginal means
with(X, tapply(drink, list(gender), mean))
  female     male 
3.372093 4.060241 
with(X, tapply(drink, list(age), mean))
   older  younger 
3.402985 3.911765 

# here's how to get the unweighted marginal means in R
apply(cell.means, 1, mean)   # row marginals
  female     male 
3.307604 4.015006 
apply(cell.means, 2, mean)   # column marginals
   older  younger 
3.397504 3.925106 

# The weighted and unweighted marginal means aren't much different because
# there isn't much confounding in this case.

# check the cell variances
with(X, tapply(drink, list(gender, age), var))
          older  younger
female 2.467803 2.708999
male   2.245989 2.907313

# Unbalanced designs can become very sensitive to violations of assumptions,
# but it looks like we are okay here, at least as far as the homogeneity of
# variance assumption is concerned.

# This is a quasi-experimental design and, therefore, I flinch at using Type III
# tests here (although most people probably would). I would suggest Type II SSes.
# In R, there is no option that allows Type II SSes to be calculated unless you
# download and install an optional package such as "car". In a two-factor ANOVA,
# however, we can trick R into giving us Type IIs. Recall from the lab exercise
# that Type I and Type II do the same thing when testing the second factor
# entered into the ANOVA. Therefore, the trick is to do two Type Is with the
# factors entered in different orders and take the second line from each of
# those as the Type II tests. Watch.

aov.gender_then_age = aov(drink ~ gender * age, data=X)
summary(aov.gender_then_age)
             Df Sum Sq Mean Sq F value  Pr(>F)   
gender        1   20.0  20.001   7.613 0.00645 ** <- Type I
age           1   11.3  11.256   4.284 0.04003 *  <- Type I, Type II
gender:age    1    0.0   0.029   0.011 0.91580    <- Type I, Type II, Type III
Residuals   165  433.5   2.627                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Okay, stop right there! I've labeled the effects in this aov() output to
# indicate that the first line (gender) is what you would get in a Type I
# analysis, the second line (age) is what you would get in both the Type I and
# Type II analyses, and the third line (interaction) is what you would get in
# all three types. That will always be true in a two-factor analysis. (If there
# are more factors, well, let's not even think about it! Let your software
# worry about it!)
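# (Aside, not required: if you do happen to have the optional "car" package
# mentioned above installed, you can check our trick against it directly.
# This is only a sketch -- it assumes car is installed on your machine,
# which is not something this course requires.)
# install.packages("car") # once, if you don't already have it
library("car") # if necessary
Anova(aov.gender_then_age, type=2) # Type II tests; compare to the labeled lines above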
# The thing I want to point out in this case is that the interaction is almost
# zero. That's about as much 'no interaction' as you'll ever see in real data.
# Why can't we just drop that interaction term? I.e., why do we need to include
# almost 'no interaction' in the analysis at all? The answer is, we can drop it,
# but only if we are using Type I or Type II SSes. Read this carefully! In a
# two-factor design, if you drop the interaction term, the main effects will
# not change if you are using Type I or Type II SSes. I.e., the SSes will stay
# the same. That's not true in a Type III analysis and, therefore, the interaction
# term should never be dropped from a Type III analysis. If the interaction is
# close to zero, and it is dropped in a Type I or Type II analysis, the SSes for
# the main effects will stay the same. However, the tests on those main effects
# will change slightly, because what used to be the interaction is combined
# with the error term. df.error and SS.error will increase accordingly.

# Now let me show you a little trick. If the interaction term is near zero, and
# you're going to drop it in a Type II analysis, use the aovIII() function.

source("http://ww2.coastal.edu/kingw/psyc480/functions/aovIII.R") # if necessary
aovIII(drink ~ gender + age, data=X) # + instead of * means drop the interaction
Single term deletions

Model:
drink ~ gender + age
       Df Sum of Sq    RSS    AIC F value   Pr(>F)   
<none>              433.54 165.21                    
gender  1    20.790 454.33 171.13  7.9603 0.005365 **
age     1    11.256 444.79 167.54  4.3099 0.039432 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Those are Type II SSes! So here is your summary of these results. (It remains
# to figure out df.error, which is N - k + the interaction df. Don't forget
# that. You have dumped the interaction term into the error term, so the error
# term now claims what was df.interaction. N = 169, k = 4, df.interaction = 1, so...)

> ### Type II sums of squares, interaction dropped
> ### significant main effect of gender, F(1,166) = 7.9603, p = 0.005365
> ### significant main effect of age, F(1,166) = 4.3099, p = 0.039432

# If you're not sure what df.error is, you can always see it by using aov().
# Done? No! Effect sizes!

> SS = function(x) sum(x^2) - sum(x)^2 / length(x) # if necessary
> SS(X$drink)
[1] 464.7929
> SS.tot = SS(X$drink)
> 20.790 / SS.tot # eta-squared for gender
[1] 0.0447296
> 11.256 / SS.tot # eta-squared for age
[1] 0.02421724
> ### both main effects are small

# We're not doing post hoc tests, but why would we not need them here anyway?
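# (Aside: if you'd rather not retype the division every time, you can fold the
# whole eta-squared calculation into one little helper. A convenience sketch
# only; "eta.sq" is my made-up name, not a built-in R function.)
eta.sq = function(SS.effect, DV) {
    SS.tot = sum(DV^2) - sum(DV)^2 / length(DV)  # same formula as our SS()
    SS.effect / SS.tot                           # eta-squared
}
eta.sq(20.790, X$drink)   # gender; should reproduce 0.0447296
eta.sq(11.256, X$drink)   # age; should reproduce 0.02421724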
Problem 2) firemen.txt

file = "http://ww2.coastal.edu/kingw/psyc480/data/firemen.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)
     Rotter              Area    Risk  
 Min.   : 4.00   Charleston:25   A:26  
 1st Qu.: 8.00   Horry     :25   B:33  
 Median :10.00   NYC       :25   C:16  
 Mean   : 9.96                        
 3rd Qu.:12.00                        
 Max.   :16.00                        

# check for balance
with(X, table(Area, Risk))
            Risk
Area          A  B  C
  Charleston  7 14  4
  Horry       6 12  7
  NYC        13  7  5

# Not even close. I can't think of any reason to use Type I SSes here. The idea
# that Area causes Risk or, even sillier, Risk causes Area, doesn't seem very
# plausible to me. At the same time, the severely unbalanced nature of the
# design would cost us a lot of power if we used Type III SSes. Besides, the
# study was quasi-experimental, and the cell frequencies probably are not
# meaningless. This looks like a job for Type II SSes. But let's not put the
# cart before the donkey. First we want to look at means and variances.

# cell means
cell.means = with(X, tapply(Rotter, list(Area, Risk), mean))
cell.means
                  A         B        C
Charleston 8.000000  9.857143 11.75000
Horry      9.833333 10.250000 11.71429
NYC        9.076923 10.857143  9.60000

# weighted marginal means
with(X, tapply(Rotter, list(Area), mean))
Charleston      Horry        NYC 
      9.64      10.56       9.68 
with(X, tapply(Rotter, list(Risk), mean))
        A         B         C 
 8.961538 10.212121 11.062500 

# unweighted marginal means
apply(cell.means, 1, mean)
Charleston      Horry        NYC 
  9.869048  10.599206   9.844689 
apply(cell.means, 2, mean)
        A         B         C 
 8.970085 10.321429 11.021429 

# Does it appear that Tom's hypothesis, that Risk A firemen are more external,
# is correct? (No.) What's wrong with testing such a hypothesis with ANOVA?
# (It's directional and ANOVA doesn't do directional.)

# check the group variances
with(X, tapply(Rotter, list(Risk, Area), var))
  Charleston     Horry       NYC
A   7.333333  2.166667 15.243590
B   4.593407 12.204545  5.809524
C   9.583333  6.904762  4.300000

# Do they make you unhappy? They make me unhappy! Nevertheless...

# Type II sums of squares coming up
summary(aov(Rotter ~ Area * Risk, data=X)) # first step: Area first
            Df Sum Sq Mean Sq F value Pr(>F)  
Area         2   13.5   6.760   0.805  0.451  
Risk         2   41.5  20.767   2.474  0.092 .
Area:Risk    4   23.9   5.968   0.711  0.587  
Residuals   66  554.0   8.393                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Stop! What do you see that is bothersome? What I see is a non-zero interaction.
# True, it's nonsignificant, but what's the rule? Nonsignificant does not mean
# nonexistent. Some people would object to the use of Type II SSes with a
# non-zero interaction. I don't really see it causing that much of a problem,
# but since we're not going to find any significant effects here anyway, we
# might as well keep the journal editors happy and use Type III. Most people
# would have used Type III anyway. Why? Because most people use SPSS, and that's
# what SPSS does by default. (Besides, a journal editor is never going to see
# this because there are no significant effects to report, no matter how you
# do the ANOVA.)

source("http://ww2.coastal.edu/kingw/psyc480/functions/aovIII.R") # if necessary
aovIII(Rotter ~ Risk * Area, data=X) # does order of Risk and Area matter? (no)
$contrasts
[1] "contr.sum"  "contr.poly"

Single term deletions

Model:
Rotter ~ Risk * Area
          Df Sum of Sq    RSS    AIC F value  Pr(>F)  
<none>                 553.96 167.97                  
Risk       2    43.644 597.60 169.66  2.5999 0.08187 .
Area       2     8.106 562.06 165.06  0.4829 0.61916  
Risk:Area  4    23.870 577.83 163.13  0.7110 0.58733  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$contrasts
[1] "contr.treatment" "contr.poly"     

# Are any post hoc tests necessary (if we were doing them)? (No. Why not?)

# Effect sizes.
# We can get SS.total by creating our SS() function and then finding the SS of
# Rotter. However, the first table above is a Type I ANOVA, and the SSes in a
# Type I ANOVA add to SS.total. So SS.total is...

13.5 + 41.5 + 23.9 + 554.0 = 632.9

# None of the Type III SSes (2nd ANOVA table) are even close to 10% of that, so
# all of the effects are small.
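# (Aside: you can verify that the Type I SSes really do add up to SS.total by
# computing SS.total directly, using the same SS() function we've been creating
# by hand. A quick check; any small discrepancy is just rounding in the table.)
SS = function(x) sum(x^2) - sum(x)^2 / length(x) # if necessary
SS(X$Rotter) # should come out at about 632.9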
Problem 3) audit.txt

file = "http://ww2.coastal.edu/kingw/psyc480/data/audit.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)
 sex    south        AUDIT       
 F:50   No :27   Min.   : 0.000  
 M:20   Yes:43   1st Qu.: 4.000  
                 Median : 9.000  
                 Mean   : 9.157  
                 3rd Qu.:13.000  
                 Max.   :26.000  

# check for balance (we know we don't have it, but we want to see how bad)
with(X, table(sex, south))
   south
sex No Yes
  F 20  30
  M  7  13

# cell means (do you see the main effects here? how about the interaction?)
cell.means = with(X, tapply(AUDIT, list(sex, south), mean))
cell.means
        No       Yes
F 10.60000  6.233333
M 13.57143 11.307692

# weighted marginal means
with(X, tapply(AUDIT, sex, mean))
    F     M 
 7.98 12.10 
with(X, tapply(AUDIT, south, mean))
       No       Yes 
11.370370  7.767442 

# unweighted marginal means
apply(cell.means, 1, mean)
        F         M 
 8.416667 12.439560 
apply(cell.means, 2, mean)
       No       Yes 
12.085714  8.770513 

# variances
with(X, tapply(AUDIT, list(sex, south), var))
        No      Yes
F 50.77895 21.35747
M 90.95238 24.23077

# I don't see sex causing south or south causing sex, so I think Type II
# sums of squares should be our choice here.

aov.sex_then_south = aov(AUDIT ~ sex * south, data=X)
summary(aov.sex_then_south)
            Df Sum Sq Mean Sq F value Pr(>F)  
sex          1  242.5  242.49   6.612 0.0124 *
south        1  237.5  237.54   6.477 0.0133 *
sex:south    1   14.6   14.59   0.398 0.5304  
Residuals   66 2420.7   36.68                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Halt! I'll continue with this just to show you how it's done, but that
# interaction effect is so small that I think we can get away with dropping
# it, and that means we can use our aovIII() trick.

aov.south_then_sex = aov(AUDIT ~ south * sex, data=X)
summary(aov.south_then_sex)
            Df Sum Sq Mean Sq F value  Pr(>F)   
south        1  215.3  215.30   5.870 0.01815 * 
sex          1  264.7  264.73   7.218 0.00912 **
south:sex    1   14.6   14.59   0.398 0.53041  
Residuals   66 2420.7   36.68                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Type II sums of squares
            Df Sum Sq Mean Sq F value  Pr(>F)   
south        1  237.5  237.54   6.477 0.0133 *   <- from the 1st ANOVA table above
sex          1  264.7  264.73   7.218 0.00912 ** <- from the 2nd ANOVA table above
south:sex    1   14.6   14.59   0.398 0.53041    <- from either table
Residuals   66 2420.7   36.68                    <- from either table
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# By the way, in case I haven't said it already, when you do an ANOVA on an
# unbalanced design, you MUST state how the ANOVA was done. You'll find a good
# many articles in the literature where the authors fail to do this.

# Now we'll do it using the aovIII() trick and write a summary of the effects.
# Once again, if you are intent on using Type III SSes, you cannot drop the
# interaction term, because that turns Type III into Type II.

> aovIII(AUDIT ~ sex + south, data=X)
$contrasts
[1] "contr.sum"  "contr.poly"

Single term deletions

Model:
AUDIT ~ sex + south
       Df Sum of Sq    RSS    AIC F value   Pr(>F)   
<none>              2435.2 254.45                    
sex     1    264.73 2700.0 259.68  7.2835 0.008799 **
south   1    237.54 2672.8 258.97  6.5354 0.012848 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$contrasts
[1] "contr.treatment" "contr.poly"     

> ### Type II sums of squares, interaction term dropped
> ### significant main effect of sex, F(1,67) = 7.2835, p = 0.008799
> ### significant main effect of south, F(1,67) = 6.5354, p = 0.012848

# Why 67 degrees of freedom for error?

# Effect sizes.
SS = function(x) sum(x^2) - sum(x)^2 / length(x)
> SS(X$AUDIT)
[1] 2915.271
> SS.tot = SS(X$AUDIT)
> 264.73 / SS.tot # eta-squared for sex
[1] 0.09080801
> 237.54 / SS.tot # eta-squared for south
[1] 0.08148126
> ### both effects are small
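# (Aside: if the 67 isn't obvious, remember the advice from Problem 1 -- fit
# the additive model with aov() and read the df right off the Residuals line.
# A quick sketch; the Residuals row should show 67 df, because the
# interaction's 1 df has been folded into the error term.)
summary(aov(AUDIT ~ sex + south, data=X))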
Problem 4) SADcat.txt

file = "http://ww2.coastal.edu/kingw/psyc480/data/SADcat.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)
      SAD               LOC      SHY    
 Min.   : 0.000   external:37   no :43  
 1st Qu.: 3.000   internal:35   yes:29  
 Median : 6.000                        
 Mean   : 6.764                        
 3rd Qu.:10.250                        
 Max.   :17.000                        

# check for balance (how do we already know it isn't?)
with(X, table(LOC, SHY))
          SHY
LOC        no yes
  external 22  15
  internal 21  14

# cell means (what do you see here?)
> cell.means = with(X, tapply(SAD, list(LOC, SHY), mean))
> cell.means
               no       yes
external 5.727273 10.666667
internal 4.095238  8.214286

# weighted marginal means
> with(X, tapply(SAD, list(LOC), mean))
external internal 
7.729730 5.742857 
> with(X, tapply(SAD, list(SHY), mean))
      no      yes 
4.930233 9.482759 

# unweighted marginal means
> apply(cell.means, 1, mean)
external internal 
8.196970 6.154762 
> apply(cell.means, 2, mean)
      no      yes 
4.911255 9.440476 

# variances
> with(X, tapply(SAD, list(LOC, SHY), var))
               no      yes
external 23.06494 16.66667
internal 11.99048 21.25824

# I'm not too upset by that. All the variances are within two times each other.

# We are testing the following theory: LOC -> SHY -> SAD

aov.LOC_then_SHY = aov(SAD ~ LOC * SHY, data=X)
summary(aov.LOC_then_SHY)
            Df Sum Sq Mean Sq F value   Pr(>F)    
LOC          1   71.0    71.0   3.913    0.052 .  
SHY          1  357.2   357.2  19.686 3.43e-05 ***
LOC:SHY      1    2.9     2.9   0.160    0.690    
Residuals   68 1233.9    18.1                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# LOC has a marginal effect on SAD. Let's see how much of that is direct.

summary(aov(SAD ~ SHY * LOC, data=X)) # aov and summary all at once
            Df Sum Sq Mean Sq F value   Pr(>F)    
SHY          1  359.0   359.0  19.782 3.29e-05 ***
LOC          1   69.3    69.3   3.817   0.0549 .  
SHY:LOC      1    2.9     2.9   0.160   0.6900    
Residuals   68 1233.9    18.1                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Looks like virtually all of it. Our theory has failed. It appears that LOC
# and SHY are both affecting SAD directly. The SSes don't change much when the
# order in which the factors are entered is reversed. In other words, when SHY
# is entered first, it steals hardly any of LOC's variability.

# What about effect sizes?
SS.tot = 71 + 357.2 + 2.9 + 1233.9
71 / SS.tot # eta-sq for LOC
[1] 0.04264264
357.2 / SS.tot # eta-sq for SHY
[1] 0.2145345
> ### The LOC effect is small. The SHY effect is moderate. I'm not even going
> ### to calculate one for the interaction because it is virtually nonexistent.

# Hint: Elizabeth's variables were actually all numeric. You'll see this
# problem again! See shyness.txt.
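# (Aside: the interaction here is nearly as close to zero as the one in
# Problem 1, so if you wanted the main effects as Type II tests with the
# interaction dropped, the same aovIII() trick would apply. Just a sketch of
# the call for you to try; the output is left as an exercise.)
source("http://ww2.coastal.edu/kingw/psyc480/functions/aovIII.R") # if necessary
aovIII(SAD ~ LOC + SHY, data=X) # + drops the interaction, giving Type II tests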
# AND NOW FOR SOMETHING COMPLETELY DIFFERENT (not really)

# When I was in grad school, back when we considered a slide rule a high-tech
# calculator (do you even know what a slide rule is?), unbalanced designs were
# more of a nightmare than they are today with statistical software such as R
# freely available to anyone with a desktop computer. (I took grad stat in
# 1973/4. There were no desktop computers!) The advice I got from my stat prof
# was this. If your design is just slightly unbalanced, and you have plenty of
# subjects, randomly toss out a couple to balance the design.

> with(X, table(LOC, SHY))
          SHY
LOC        no yes
  external 22  15
  internal 21  14

# That's not close to being balanced, but suppose we throw out AT RANDOM one
# subject from the external-no group and one from the external-yes group. This
# is going to involve some fancy R work, so you may not want to watch too
# closely! The problem is, the X data frame is not in order, which is to say,
# internals and externals and nos and yeses are scattered at random throughout
# the data frame. The following command will put it in order.

X = X[with(X, order(LOC, SHY)),] # you don't need to know how to do this
rownames(X) = 1:72 # this renumbers the rows

# Now all of the external subjects are in rows 1-37, with external-no subjects
# in rows 1-22, and external-yes subjects in rows 23-37. We will randomly throw
# out one of each.

X = X[-sample(23:37,1),] # this one must be done first (can you see why?)
X = X[-sample(1:22,1),]
with(X, table(LOC, SHY))
          SHY
LOC        no yes
  external 21  14
  internal 21  14

# That was pretty gutsy, by the way, altering the data frame like that without
# saving a backup copy first! So, the design still isn't balanced, so what's
# the point of these shenanigans? What we have now is a condition called
# "proportional balance." Proportional balance is NOT balance, but it does
# have some magical properties. Watch this.

> summary(aov(SAD ~ LOC * SHY, data=X))
            Df Sum Sq Mean Sq F value   Pr(>F)    
LOC          1   72.0   72.01   3.946   0.0511 .  
SHY          1  315.5  315.47  17.284 9.48e-05 ***
LOC:SHY      1    0.8    0.77   0.042   0.8377    
Residuals   66 1204.6   18.25                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(aov(SAD ~ SHY * LOC, data=X))
            Df Sum Sq Mean Sq F value   Pr(>F)    
SHY          1  315.5  315.47  17.284 9.48e-05 ***
LOC          1   72.0   72.01   3.946   0.0511 .  
SHY:LOC      1    0.8    0.77   0.042   0.8377    
Residuals   66 1204.6   18.25                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# What do you notice? What happens if you dump that nonexistent interaction
# effect into the error term? (Note: You may be tempted to try aovIII on this
# problem, with the interaction term included. If you don't look very carefully,
# you may conclude that aovIII gives the same answer as aov. It does not. It's
# very close, but it's not the same. If the interaction effect were larger, it
# wouldn't even be very close. Why is that?)
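# (Aside: one way to see the "proportional" in proportional balance is to turn
# the cell counts into row proportions. A quick sketch; when the rows come out
# identical, the counts are proportional, even though they aren't equal.)
prop.table(with(X, table(LOC, SHY)), margin=1) # both rows: 0.6 and 0.4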
Problem 5) wages4.txt

file = "http://ww2.coastal.edu/kingw/psyc480/data/wages4.txt"
X = read.table(file=file, header=T, stringsAsFactors=T)
summary(X)
     Wages            Gender         Married           Occupation 
 Min.   : 1.000   Female:167   Married  :233   Clerical   :70  
 1st Qu.: 5.500   Male  :178   Unmarried:112   Management :40  
 Median : 8.000                                Other      :89  
 Mean   : 9.334                                Professions:87  
 3rd Qu.:12.000                                Sales      :22  
 Max.   :26.290                                Service    :37  
      Region          Union             Degree       Experience   
 NotSouth:256   Nonunion:286   College   :126   Min.   : 0.00  
 South   : 89   Union   : 59   HighSchool:219   1st Qu.: 7.00  
                                                Median :14.00  
                                                Mean   :16.61  
                                                3rd Qu.:24.00  
                                                Max.   :46.00  
      Age      
 Min.   :18.0  
 1st Qu.:27.0  
 Median :34.0  
 Mean   :36.3  
 3rd Qu.:43.0  
 Max.   :64.0  

# check for balance
> with(X, table(Gender, Occupation)) # what happens if you enter these in opposite order?
        Occupation
Gender   Clerical Management Other Professions Sales Service
  Female       55         17    17          43    10      25
  Male         15         23    72          44    12      12

# cell means (any chance we're looking at an interaction here? why?)
cell.means = with(X, tapply(Wages, list(Gender, Occupation), mean))
cell.means
       Clerical Management    Other Professions    Sales  Service
Female 7.342727   9.277059 5.127647    11.63558 4.983000 5.948400
Male   6.860667  13.695652 9.424583    12.99841 9.880833 7.128333

# weighted marginal means
with(X, tapply(Wages, list(Gender), mean))
   Female      Male 
 8.069461 10.519775 
with(X, tapply(Wages, list(Occupation), mean))
   Clerical  Management       Other Professions       Sales     Service 
   7.239429   11.817750    8.603820   12.324828    7.654545    6.331081 

# unweighted marginal means
apply(cell.means, 1, mean)
  Female     Male 
7.385736 9.998080 
apply(cell.means, 2, mean)
   Clerical  Management       Other Professions       Sales     Service 
   7.101697   11.486355    7.276115   12.316995    7.431917    6.538367 

# variances
with(X, tapply(Wages, list(Gender, Occupation), var))
       Clerical Management     Other Professions     Sales   Service
Female 8.007587   18.34443  2.405182    32.45002  1.603779  7.185889
Male   5.637735   53.17073 17.700436    33.10213 10.113736 15.362288

# That looks awful! That looks so bad that I think I want a Levene Test. I'll
# show you how to use the function when you have a factorial design.

source("http://ww2.coastal.edu/kingw/psyc480/functions/LeveneTest.R")
with(X, LeveneTest(DV=Wages, IV=paste(Gender,Occupation)))
Levene's Test for Equal Variances (center=mean)

             Df Sum Sq Mean Sq F value   Pr(>F)    
IV           11  549.8   49.98   7.664 8.17e-12 ***
Residuals   333 2171.6    6.52                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Looks like we are busted but good! The null hypothesis of equal variances is
# rejected, p < .001. Unfortunately, there is little we can do about it but
# continue. There is an alternative method, but it is beyond the scope of
# this course. Violation of assumptions makes the p-values inaccurate, so we
# can try to compensate by reducing the alpha level. I'm going to set alpha
# at 0.001 for this problem. That way our p-values can be off by 50 times and
# we are still okay.

# I'm going to suggest Type I sums of squares, because I think it's fairly
# obvious that occupation is at least in part "caused" by gender. Women are
# more likely to be found in clerical jobs because they are women (at least in
# 1985 and probably still today), and men are more likely to be found in the
# "other" category (probably mostly laborers, construction workers, etc.)
# because they are men. Therefore...

aov.out = aov(Wages ~ Gender * Occupation, data=X)
summary(aov.out)
                   Df Sum Sq Mean Sq F value   Pr(>F)    
Gender              1    517   517.3  26.408 4.72e-07 ***
Occupation          5   1632   326.4  16.661 1.08e-14 ***
Gender:Occupation   5    256    51.2   2.614   0.0246 *  
Residuals         333   6523    19.6                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# At alpha = 0.001, the two main effects are significant but the interaction is
# not. (I am not comfortable with that, but we set the alpha level in advance
# and now we are stuck with it.) So far, the result is consistent with the
# "theory" that the influence of Gender on Wages is mediated by Occupation.
# Let's see what happens when the IVs are entered in the opposite order.

summary(aov(Wages ~ Occupation * Gender, data=X))
                   Df Sum Sq Mean Sq F value   Pr(>F)    
Occupation          5   1775   355.0  18.125 6.57e-16 ***
Gender              1    374   374.0  19.093 1.67e-05 ***
Occupation:Gender   5    256    51.2   2.614   0.0246 *  
Residuals         333   6523    19.6                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Gender has a significant direct effect on Wages. The effect is reduced,
# which is consistent with some mediation through Occupation, but it is not
# eliminated. By the way, this does not SHOW or confirm mediation through
# Occupation. It is merely consistent with it. This is a quasi-experimental
# design, so talking about cause and effect would be very risky.

# Actually, I'm not convinced this is showing anything at all, other than that
# Gender and Occupation are both significantly related to Wages. In the second
# table, Occupation does appear to steal a little of Gender's thunder, but not
# really much. These changes could be nothing but random noise. I'm not convinced
# that Occupation partially mediated the effect of Gender, even in 1985.

# Messy problem, but then, that's the real world for you! Real problems do not
# always come out as cleanly as textbook problems do.
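# (Aside: just how awful were those variances? A crude rule of thumb used back
# in Problem 4 was that cell variances within about two times each other are
# tolerable. A quick sketch of the largest-to-smallest ratio here:)
vars = with(X, tapply(Wages, list(Gender, Occupation), var))
max(vars) / min(vars) # about 33 -- no wonder the Levene Test objected!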
Problem 6) oddmonkeys.txt (unbalanced)

file = "http://ww2.coastal.edu/kingw/psyc480/data/oddmonkeys.txt"
X = read.table(header=T, stringsAsFactors=T, file=file)
summary(X)
   successes          reward   motivation
 Min.   : 0.00   high    :8   strong:12  
 1st Qu.: 6.00   low     :8   weak  :12  
 Median : 9.50   moderate:8              
 Mean   :10.00                           
 3rd Qu.:14.25                           
 Max.   :18.00                           

source("http://ww2.coastal.edu/kingw/psyc480/functions/aovIII.R") # if necessary
lost.monkey = sample(1:24, 1) # choose one number at random from 1 to 24
X = X[-lost.monkey,] # delete that monkey from the data frame

# check for balance
with(X, table(motivation, reward))
          reward
motivation high low moderate
    strong    4   3        4
    weak      4   4        4

# relevel the variables because the alphabetical ordering is annoying
X$motivation = relevel(X$motivation, ref="weak")
X$reward = relevel(X$reward, ref="moderate")
X$reward = relevel(X$reward, ref="low")

# cell means (describe the interaction)
cell.means = with(X, tapply(successes, list(motivation, reward), mean))
cell.means
            low moderate high
weak    3.00000       10   14
strong 12.66667       12   10

# weighted marginal means
> with(X, tapply(successes, motivation, mean))
    weak   strong 
 9.00000 11.45455 
> with(X, tapply(successes, reward, mean))
      low  moderate      high 
 7.142857 11.000000 12.000000 

# unweighted marginal means (these are the ones that are tested by Type III SSes,
# more or less - exactly the ones in a 2x2 design but not quite in a 2x3)
apply(cell.means, 1, mean)
    weak   strong 
 9.00000 11.55556 
apply(cell.means, 2, mean)
      low  moderate      high 
 7.833333 11.000000 12.000000 

# Why is only one of the unweighted marginal means different from the weighted
# marginal means in each variable?

# variances
with(X, tapply(successes, list(motivation, reward), var))
             low moderate     high
weak   10.000000 22.66667 15.33333
strong  6.333333 30.00000 16.66667

# not pretty, but with groups this small we can expect a lot of random noise

# ANOVA with Type III sums of squares
aovIII(successes ~ motivation * reward, data=X)
$contrasts
[1] "contr.sum"  "contr.poly"

Single term deletions

Model:
successes ~ motivation * reward
                  Df Sum of Sq    RSS    AIC F value  Pr(>F)  
<none>                         296.67 70.814                  
motivation         1    37.123 333.79 71.525  2.1273 0.16293  
reward             2    68.533 365.20 71.594  1.9636 0.17091  
motivation:reward  2   172.533 469.20 77.357  4.9434 0.02031 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$contrasts
[1] "contr.treatment" "contr.poly"     

> ### Type III sums of squares
> ### no significant main effect of motivation, F(1,17) = 2.1273, p = 0.16293
> ### no significant main effect of reward, F(2,17) = 1.9636, p = 0.17091
> ### significant interaction effect, F(2,17) = 4.9434, p = 0.02031

# Effect sizes.
SS = function(x) sum(x^2) - sum(x)^2 / length(x) # if necessary
> SS(X$successes)
[1] 593.3043
> SS.total = SS(X$successes)
> 172.533 / SS.total # effect size for the interaction
[1] 0.2908002
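# (Aside: because lost.monkey was chosen with sample(), the deleted row -- and
# therefore your exact SSes -- will differ from mine every time this is run.
# If you want reproducible results, fix the random seed before sampling. A
# sketch; the seed value 480 is arbitrary, any number will do.)
set.seed(480)                 # makes the "random" choice repeatable
lost.monkey = sample(1:24, 1) # now everyone who runs this loses the same monkey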