R Tutorials--Related Measures t Test

RELATED MEASURES t TEST

Syntax

The syntax for the t.test() function is given here from the help page in R.

## Default S3 method:
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

## S3 method for class 'formula':
t.test(formula, data, subset, na.action, ...)

"S3" refers to the S language (version 3), which is often the same as the methods and syntax used by R. In the case of the t.test() function, there are two alternative syntaxes, the default, and the "formula" syntax. Both syntaxes are relevant to the two-sample t-tests. The default syntax requires two data vectors, "x" and "y", to be specified. To get the dependent measures t-test, the option "paired=" must be set to TRUE, which is not the default. The "alternative=" option is set by default to "two.sided" but can be set to any of the three values shown above. The default null hypothesis is "mu = 0", which in this case should be read as "mu1-mu2=0". This is usually what we want, but doesn't have to be, and should be changed by the user if this is not the null hypothesis. The rest is either irrelevant to this tutorial or can be ignored for the moment.

The t Test With Two Dependent Groups

Dependent groups can be the same subjects used again (repeated measures), or they can be matched samples. Either way, the t-test is performed on the difference scores and amounts to little more than a single sample t-test. A normal distribution of difference scores is strongly encouraged, unless the sample is large enough that you can hide behind the central limit theorem and claim a normal sampling distribution of means of the differences, in which case the t-test is robust to violations of the normality assumption.
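To make the "single sample t-test on the difference scores" idea concrete, here is a minimal sketch that computes the statistic by hand. It assumes the MASS package is installed and uses the family therapy rows of the anorexia data that are analyzed later in this tutorial. The object names (d, t_by_hand, p_upper) are mine, not part of any R interface.

```r
# Family-therapy rows of the anorexia data (the data set ships with MASS).
ft <- subset(MASS::anorexia, Treat == "FT")

d  <- ft$Postwt - ft$Prewt     # one difference score per subject
n  <- length(d)

# Single-sample t on the differences: mean difference over its standard error.
t_by_hand <- mean(d) / (sd(d) / sqrt(n))
df        <- n - 1

# One-sided p-value for the alternative "mean difference > 0".
p_upper <- pt(t_by_hand, df, lower.tail = FALSE)

c(t = t_by_hand, df = df, p = p_upper)
```

These values should agree with what t.test() reports for the same data.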

If you are doing the test on the difference scores, see the Single Sample t Test tutorial. If the groups are represented by two vectors (or columns in a data frame), then it is just a matter of setting the "paired=" option in the t.test() function to TRUE. One caution: If the data are two vectors in your workspace, you need to remember that the scores are paired, i.e., score 1 in the "x" vector is paired with score 1 in the "y" vector, score 2 with score 2, etc. The scores must be kept in the correct "paired" order in the two vectors.
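Here is a small sketch of what the "paired=" option and the pairing caution amount to in practice. It assumes MASS is installed. The object names are my own, and the reversed vector is only a stand-in for "scores that got out of order".

```r
ft <- subset(MASS::anorexia, Treat == "FT")

# Default: paired = FALSE and var.equal = FALSE, i.e., Welch's independent test.
welch <- t.test(ft$Postwt, ft$Prewt)

# paired = TRUE gives the dependent (related measures) test.
paired <- t.test(ft$Postwt, ft$Prewt, paired = TRUE)

# Pairing is by position. Reversing one vector pairs each woman's Postwt
# with some other woman's Prewt, and R has no way of noticing.
broken <- t.test(ft$Postwt, rev(ft$Prewt), paired = TRUE)

welch$method
paired$method
c(correct = unname(paired$statistic), broken = unname(broken$statistic))
```

The mean difference is the same in the correct and broken versions. It is the standard error, and therefore t, that goes wrong when the pairing is lost.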

> data(anorexia, package="MASS")       # weight gain (lbs.) in anorexic women
> attach(anorexia)
> str(anorexia)
'data.frame':   72 obs. of  3 variables:
 $ Treat : Factor w/ 3 levels "CBT","Cont","FT": 2 2 2 2 2 2 2 2 2 2 ...
 $ Prewt : num  80.7 89.4 91.8 74 78.1 88.3 87.3 75.1 80.6 78.4 ...
 $ Postwt: num  80.2 80.1 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 ...
> ft = subset(anorexia, subset=(Treat=="FT"))         # just the family therapy treatment
> ft
   Treat Prewt Postwt
56    FT  83.8   95.2
57    FT  83.3   94.3
58    FT  86.0   91.5
59    FT  82.5   91.9
60    FT  86.7  100.3
61    FT  79.6   76.7
62    FT  76.9   76.8
63    FT  94.2  101.6
64    FT  73.4   94.9
65    FT  80.5   75.2
66    FT  81.6   77.8
67    FT  82.1   95.5
68    FT  77.6   90.7
69    FT  83.5   92.5
70    FT  89.9   93.8
71    FT  86.0   91.7
72    FT  87.3   98.0
> detach(anorexia)
> rm(anorexia)

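The anorexia data frame has been retrieved from the "MASS" package, and the data corresponding to the Family Therapy treatment have been extracted into a new data frame called "ft" (and we cleaned up after ourselves). A note: in the subset() function, "subset=" is the second argument in the default syntax, so this command could have been written somewhat more simply as: ft=subset(anorexia,Treat=="FT"). Be sure to use the double equals sign, "==", which tests for equality. A single "=" does not do the comparison and will not give you the rows you want. The data frame does not have to be attached for this to work, because subset() looks up "Treat" inside the data frame. Outside of subset(), e.g., when indexing with square brackets, you would have to write anorexia$Treat=="FT" (or attach the data frame).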
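A quick check of the row selection, assuming MASS is installed. The names ft1 and ft2 are mine. The "==" comparison does the filtering, and logical indexing with "$" picks out the same rows.

```r
data(anorexia, package = "MASS")

# Row selection with subset(); the condition is written with ==.
ft1 <- subset(anorexia, Treat == "FT")

# The same selection with square brackets and the $ operator.
ft2 <- anorexia[anorexia$Treat == "FT", ]

nrow(ft1)             # number of family therapy cases
all.equal(ft1, ft2)   # do the two approaches agree?
```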

At this point, "ft" is set up in such a way that a test on the difference scores would be easy.

> t.test(Postwt-Prewt, mu=0, data=ft, alternative="greater")
Error in t.test(Postwt - Prewt, mu = 0, data = ft, alternative = "greater") : 
  object "Postwt" not found

EXCEPT there is no "data=" option unless we are using the formula interface. Drat!

> with(ft, t.test(Postwt-Prewt, mu=0, alternative="greater"))        # we could also attach(ft)

        One Sample t-test

data:  Postwt - Prewt 
t = 4.1849, df = 16, p-value = 0.0003501
alternative hypothesis: true mean is greater than 0 
95 percent confidence interval:
 4.233975      Inf 
sample estimates:
mean of x 
 7.264706

The null hypothesis is rejected at any reasonable alpha level. The 95% CI tells us the true mean difference is 4.23 or more with 95% confidence, and the sample mean difference is reported as 7.26 lbs. I.e., women receiving family therapy for anorexia gained, on average, 7.26 pounds during the treatment period. If you want a different confidence level, set that with the "conf.level=" option.
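For anyone wondering where the one-sided bound comes from, here is a sketch (assuming MASS is installed; object names are mine) that computes it by hand and then asks for a 99% bound with "conf.level=".

```r
ft <- subset(MASS::anorexia, Treat == "FT")
d  <- ft$Postwt - ft$Prewt
se <- sd(d) / sqrt(length(d))
df <- length(d) - 1

# One-sided 95% bound: all 5% of the error goes in the lower tail.
lower95 <- mean(d) - qt(0.95, df) * se

# The same idea at 99%, by hand and via the conf.level= option.
lower99 <- mean(d) - qt(0.99, df) * se
res99   <- t.test(d, mu = 0, alternative = "greater", conf.level = 0.99)

c(lower95 = lower95, lower99 = lower99, from_t.test = res99$conf.int[1])
```

Because the alternative is one-sided, the critical value is qt(0.95, df) rather than qt(0.975, df), and the upper limit is infinite.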

The same result will be obtained from the dependent t-test.

> with(ft, t.test(Postwt, Prewt, paired=T, alternative="greater"))

        Paired t-test

data:  Postwt and Prewt 
t = 4.1849, df = 16, p-value = 0.0003501
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 4.233975      Inf 
sample estimates:
mean of the differences 
               7.264706

In the first version of the test, the single-sample version, we let R do the subtraction to get the difference scores right inside the t.test() function. No use in creating a new data object that we will just have to discard anyway. In the second version, we listed the two vectors individually with a comma between them. This works as long as X_i in the first vector corresponds to Y_i in the second vector, and as long as the "paired=T" option is set. (Otherwise, we will get an independent t-test.) Notice that R subtracts the second vector listed from the first, i.e., first minus second. That is why the alternative should be set to "greater" in this case, since we expect the patients to gain weight during the treatment period.
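The direction of the subtraction can be checked with a sketch like this one (assuming MASS is installed; object names are mine). Swapping the two vectors and swapping the alternative gives the mirror image of the same test.

```r
ft <- subset(MASS::anorexia, Treat == "FT")

# First vector minus second vector: Postwt - Prewt, expected to be positive.
post_minus_pre <- t.test(ft$Postwt, ft$Prewt, paired = TRUE,
                         alternative = "greater")

# Swap the vectors AND the alternative: Prewt - Postwt, expected to be negative.
pre_minus_post <- t.test(ft$Prewt, ft$Postwt, paired = TRUE,
                         alternative = "less")

# t changes sign; the p-value does not change.
c(unname(post_minus_pre$statistic), unname(pre_minus_post$statistic))
c(post_minus_pre$p.value, pre_minus_post$p.value)
```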

A formula interface is also available for instances where the data frame is arranged in long form. Our current data frame is very simple. It contains two columns of scores representing the two times at which patients were measured (and a useless Treat column). Getting such a data frame into long form is relatively easy.

> ft$Treat = NULL                      # delete unnecessary columns
> ft.long = stack(ft)
> ft.long
   values    ind
1    83.8  Prewt
2    83.3  Prewt
3    86.0  Prewt
4    82.5  Prewt
5    86.7  Prewt
6    79.6  Prewt
7    76.9  Prewt
8    94.2  Prewt
9    73.4  Prewt
10   80.5  Prewt
11   81.6  Prewt
12   82.1  Prewt
13   77.6  Prewt
14   83.5  Prewt
15   89.9  Prewt
16   86.0  Prewt
17   87.3  Prewt
18   95.2 Postwt
19   94.3 Postwt
20   91.5 Postwt
21   91.9 Postwt
22  100.3 Postwt
23   76.7 Postwt
24   76.8 Postwt
25  101.6 Postwt
26   94.9 Postwt
27   75.2 Postwt
28   77.8 Postwt
29   95.5 Postwt
30   90.7 Postwt
31   92.5 Postwt
32   93.8 Postwt
33   91.7 Postwt
34   98.0 Postwt

For the t-test, that's all we need. (If there were more than two groups, or two measurement times, this would be more complex.) In such a data frame, however, the subjects' scores must remain paired. I.e., the first Prewt must come from the same (or the matched) subject as the first Postwt, the second Prewt from the same subject as the second Postwt, etc. If that's NOT the case, then we would need a subject identifier in the data frame, AND we would need to do a different test. We could also rename the columns if we wanted to, but it's not mandatory.
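One way to convince yourself that the pairing survived the trip into long form is to go back again. This is a sketch, assuming MASS is installed, in which unstack() undoes stack() and the round trip is compared with the original columns. The name ft.wide is mine.

```r
# Keep only the two score columns, then stack them into long form.
ft <- subset(MASS::anorexia, Treat == "FT", select = c(Prewt, Postwt))
ft.long <- stack(ft)          # two columns: values and ind

# unstack() splits values by ind, giving one column per level of ind.
ft.wide <- unstack(ft.long)

# Row i of each column should still be subject i.
all.equal(ft.wide$Prewt,  ft$Prewt)
all.equal(ft.wide$Postwt, ft$Postwt)

# The wide form can go straight into the default (two-vector) interface.
with(ft.wide, t.test(Postwt, Prewt, paired = TRUE, alternative = "greater"))
```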

> t.test(values ~ ind, paired=T, alternative="less", data=ft.long)   ### THIS IS WRONG!

	Paired t-test

data:  values by ind
t = 4.1849, df = 16, p-value = 0.9996
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
     -Inf 10.29544
sample estimates:
mean of the differences 
               7.264706

We got a little confused about the order in which the subtraction would be done. The formula interface takes the groups in the order of the factor levels, and by default R puts factor levels in alphabetical order. A summary will confirm this.

> summary(ft.long)
     values           ind    
 Min.   : 73.40   Postwt:17  
 1st Qu.: 80.78   Prewt :17  
 Median : 86.35              
 Mean   : 86.86              
 3rd Qu.: 93.47              
 Max.   :101.60

And R subtracts first level minus second level, i.e., in this case, Postwt minus Prewt. That would make the correct alternative "greater". (The order in which the factor levels occur in the data frame is irrelevant.)
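A sketch of the level-order logic, assuming MASS is installed. I build the factor with factor() rather than stack(), because I am not certain that every version of R orders the levels produced by stack() the same way, which is one more reason to check. The names ind, values, and by_level are mine.

```r
ft <- subset(MASS::anorexia, Treat == "FT")

# Scores in long form: all Prewt values first, then all Postwt values.
values <- c(ft$Prewt, ft$Postwt)
ind    <- factor(rep(c("Prewt", "Postwt"), each = 17))

# factor() sorts the levels, regardless of the order in the data.
levels(ind)      # "Postwt" "Prewt"

# First level minus second level, paired by position within each level.
by_level <- split(values, ind)
mean(by_level[[1]] - by_level[[2]])   # mean of Postwt - Prewt
```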

> t.test(values ~ ind, paired=T, alternative="greater", data=ft.long)

	Paired t-test

data:  values by ind
t = 4.1849, df = 16, p-value = 0.0003501
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 4.233975      Inf
sample estimates:
mean of the differences 
               7.264706

There we go. But, WARNING, the factor levels don't have to be seen by R in alphabetical order! So it's always a good idea to check.

> levels(ft.long$ind)
[1] "Postwt" "Prewt"

Why might they be in some other order? Because WE, the humans, put them in some other order. Unlike a lot of modern software, R does what it's told to do by the humans who are using it! Well... usually it does! Don't do what I'm about to do!

>   ### WARNING: DO NOT DO THIS! THIS IS VERY BAD!! ###
> levels(ft.long$ind) = c("Prewt","Postwt")
> levels(ft.long$ind)
[1] "Prewt"  "Postwt"
> summary(ft.long)
     values           ind    
 Min.   : 73.40   Prewt :17  
 1st Qu.: 80.78   Postwt:17  
 Median : 86.35              
 Mean   : 86.86              
 3rd Qu.: 93.47              
 Max.   :101.60

The joke is certainly on me! It turns out that assigning to levels() does not reorder the levels. It renames them. I.e., it flips the labels in the "ind" vector without flipping the "values" vector, so every Prewt score is now labeled Postwt and vice versa! That is very bad. It is not a bug, however. It is documented behavior: the assignment sets the labels of the existing levels, which is useful for renaming and exactly the wrong tool for reordering. So what is the "correct" way to do it? Okay, you can do this one, but be very careful with your typing. (And remember, if you have the data frame attached, detach it first.)
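The difference between renaming and reordering is easiest to see on a tiny factor. This is a toy sketch with three made-up labels, not the real data; the names f, relabeled, and reordered are mine.

```r
f <- factor(c("Prewt", "Prewt", "Postwt"))   # levels: "Postwt" "Prewt"

# Assigning to levels() renames the levels in place, so the data are relabeled.
relabeled <- f
levels(relabeled) <- c("Prewt", "Postwt")
as.character(relabeled)   # "Postwt" "Postwt" "Prewt"   (observations changed!)

# factor(..., levels=) changes only the order of the levels.
reordered <- factor(f, levels = c("Prewt", "Postwt"))
as.character(reordered)   # "Prewt" "Prewt" "Postwt"    (observations intact)
levels(reordered)         # "Prewt" "Postwt"
```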

> ft.long$ind = factor(ft.long$ind, levels=c("Prewt","Postwt"))   ### WARNING: DON'T MISTYPE!
> ### Now the original test is the correct one!
> t.test(values ~ ind, paired=T, alternative="less", data=ft.long)

	Paired t-test

data:  values by ind
t = -4.1849, df = 16, p-value = 0.0003501
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -4.233975
sample estimates:
mean of the differences 
              -7.264706

> ### Because...
> levels(ft.long$ind)
[1] "Prewt"  "Postwt"
> ### ... the factor has been releveled without the data frame being altered!

When you are fooling with the levels of a factor, DON'T MISTYPE! A typing mistake here could wipe out your variable in the data frame. Take my word for that! I've done it! (Sadly, more than once!) Now the subtraction will be in the order Prewt minus Postwt, and the original test with alternative="less" is the correct one.
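Here is what a typing mistake does, shown on a toy factor so that nothing of value gets hurt. The "check before you overwrite" habit at the end is my own suggestion, not something R requires.

```r
f <- factor(c("Prewt", "Postwt", "Prewt", "Postwt"))

# A mistyped level ("PostWt") matches nothing, so those entries become NA.
bad <- factor(f, levels = c("Prewt", "PostWt"))
sum(is.na(bad))    # 2

# Safer habit: build the releveled factor in a separate object and check it
# BEFORE overwriting the column in the data frame.
good <- factor(f, levels = c("Prewt", "Postwt"))
stopifnot(!anyNA(good), setequal(levels(good), levels(f)))
```

If the stopifnot() line complains, the original factor is still untouched and you can fix the typo and try again.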

Confusing! But that is the price you pay for having repeated measures factors! Just be sure you know the order in which R is going to see your factor levels. You can do that with levels() WITHOUT AN ASSIGNMENT, or with summary().

Sane people would not organize their data frame as we have above. Instead, sane people would keep the data from each subject together on contiguous lines of the data frame, like this.

>   ### If you want to follow along, type the following to reorder the data frame. ###
> ft.long$subjects = rep(LETTERS[1:17],2)
> ft.long = ft.long[order(ft.long$subjects),]
> ft.long
   values    ind subjects
1    83.8  Prewt        A
18   95.2 Postwt        A
2    83.3  Prewt        B
19   94.3 Postwt        B
3    86.0  Prewt        C
20   91.5 Postwt        C
4    82.5  Prewt        D
21   91.9 Postwt        D
5    86.7  Prewt        E
22  100.3 Postwt        E
6    79.6  Prewt        F
23   76.7 Postwt        F
7    76.9  Prewt        G
24   76.8 Postwt        G
8    94.2  Prewt        H
25  101.6 Postwt        H
9    73.4  Prewt        I
26   94.9 Postwt        I
10   80.5  Prewt        J
27   75.2 Postwt        J
11   81.6  Prewt        K
28   77.8 Postwt        K
12   82.1  Prewt        L
29   95.5 Postwt        L
13   77.6  Prewt        M
30   90.7 Postwt        M
14   83.5  Prewt        N
31   92.5 Postwt        N
15   89.9  Prewt        O
32   93.8 Postwt        O
16   86.0  Prewt        P
33   91.7 Postwt        P
17   87.3  Prewt        Q
34   98.0 Postwt        Q

It doesn't matter as far as the t.test() function is concerned. That function will pair the first value it finds of Prewt with the first value it finds of Postwt, the second value with the second value, etc.
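Since the pairing is purely positional, a subject identifier is only useful if you sort on it before testing. A sketch of that, assuming MASS is installed. The shuffling is just my way of simulating a data frame whose rows are in no particular order, and the object names are mine.

```r
ft <- subset(MASS::anorexia, Treat == "FT")
long <- data.frame(
  values   = c(ft$Prewt, ft$Postwt),
  ind      = factor(rep(c("Prewt", "Postwt"), each = 17),
                    levels = c("Prewt", "Postwt")),
  subjects = rep(LETTERS[1:17], 2)
)

set.seed(1)
shuffled <- long[sample(nrow(long)), ]     # rows in arbitrary order

# Put the subjects in the same order within each measurement time.
sorted <- shuffled[order(shuffled$ind, shuffled$subjects), ]
pre  <- sorted$values[sorted$ind == "Prewt"]
post <- sorted$values[sorted$ind == "Postwt"]

# Confirm that position i refers to the same subject in both vectors.
stopifnot(identical(sorted$subjects[sorted$ind == "Prewt"],
                    sorted$subjects[sorted$ind == "Postwt"]))

t.test(post, pre, paired = TRUE, alternative = "greater")
```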

> levels(ft.long$ind)
[1] "Prewt"  "Postwt"
> t.test(values ~ ind, paired=T, alternative="less", data=ft.long)

	Paired t-test

data:  values by ind
t = -4.1849, df = 16, p-value = 0.0003501
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -4.233975
sample estimates:
mean of the differences 
              -7.264706

Of course, we should be pilloried in the public square for not doing a proper examination of our data beforehand.

> qqnorm(ft$Postwt - ft$Prewt)                   # output not shown
> qqline(ft$Postwt - ft$Prewt)
> plot(ft$Prewt, ft$Postwt)
> plot(ft$Prewt, Change <- ft$Postwt-ft$Prewt)   # cannot use = here!

We have a problem with normality in a small data set (n=17). We also have a problem with nonresponders (four women whose body weight didn't change or actually fell during therapy). And even among those who did change, the effect was not additive. Women who started off with lower body weights tended to change the most. These are all violations of the assumptions of the test.
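The plots can be backed up with a few numbers. In this sketch (assuming MASS is installed), shapiro.test() and cor() are my additions; the conclusions above were drawn from the graphs.

```r
ft <- subset(MASS::anorexia, Treat == "FT")
Change <- ft$Postwt - ft$Prewt

# A formal test of normality of the difference scores.
shapiro.test(Change)

# How many women failed to gain weight during therapy?
sum(Change <= 0)

# Is the amount of change related to the starting weight?
cor(ft$Prewt, Change)
```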

Matched Samples

The previous example was a repeated measures design--i.e., the same subjects (experimental units) were being measured repeatedly. What if each subject is measured only once, but the subjects are paired up on a matching variable? Then we have a matched samples (or matched subjects or matched groups) design. I want to be sure to include one of these, because I'm going to refer to it in a later tutorial. The important thing to remember for now is that the analysis is the same.

For decades it's been suspected that schizophrenia involves anatomical abnormalities in the hippocampus, an area of the brain involved with memory. The following data bearing on this issue are from Suddath et al. (1990) and were used by (and obtained from) Ramsey and Schafer (3rd ed., 2013, p. 31, Display 2.2). The researchers obtained MRI measurements of the volume of the left hippocampus from 15 pairs of identical twins discordant for schizophrenia. The data are displayed in the following table. (You should be able to copy and paste the following lines to get the data into R.)

schizophrenia = read.table(header=T, text="
pair   affected   unaffected
1      1.27       1.94
2      1.63       1.44
3      1.47       1.56
4      1.39       1.58
5      1.93       2.06
6      1.26       1.66
7      1.71       1.75
8      1.67       1.77
9      1.28       1.78
10     1.85       1.92
11     1.02       1.25
12     1.34       1.93
13     2.02       2.04
14     1.59       1.62
15     1.97       2.08
")

Just eyeballing it leaves little doubt that the hippocampus is smaller in the affected cotwin--the "interocular trauma test" as my old stat prof (Tom Wickens) called it, because the result jumps out and hits you between the eyes. Most journal editors don't recognize this statistical technique, however, and would prefer that we cite the result of a t-test. Here it is.

> with(schizophrenia, t.test(affected, unaffected, paired=T, alternative="less"))

	Paired t-test

data:  affected and unaffected
t = -3.2289, df = 14, p-value = 0.003031
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
        -Inf -0.09029832
sample estimates:
mean of the differences 
             -0.1986667
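For the record, the "eyeballing" can be done by R as well. This sketch re-enters the table above and counts the pairs in which the affected twin has the smaller volume. The name d is mine.

```r
schizophrenia <- read.table(header = TRUE, text = "
pair   affected   unaffected
1      1.27       1.94
2      1.63       1.44
3      1.47       1.56
4      1.39       1.58
5      1.93       2.06
6      1.26       1.66
7      1.71       1.75
8      1.67       1.77
9      1.28       1.78
10     1.85       1.92
11     1.02       1.25
12     1.34       1.93
13     2.02       2.04
14     1.59       1.62
15     1.97       2.08
")

d <- with(schizophrenia, affected - unaffected)   # one difference per twin pair

sum(d < 0)     # pairs in which the affected twin has the smaller hippocampus
length(d)      # number of pairs
mean(d)        # should match "mean of the differences" in the t-test output
```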

Ramsey, F. L., & Schafer, D. W. (2013). The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed.). Boston: Brooks/Cole Cengage. (Note: If anyone is interested in my opinion, this is one of the very best statistics books I've ever had the pleasure of encountering.)
Suddath, R. L., et al. (1990). Anatomical abnormalities in the brains of monozygotic twins discordant for schizophrenia. New England Journal of Medicine, 322(12), 789-793.

Alternatives to the Related Samples t Test

The primary alternative to the paired t-test when normality is in question has historically been the Wilcoxon signed rank test, the syntax of which (from the help page) is very similar.

wilcox.test(x, y = NULL,
            alternative = c("two.sided", "less", "greater"),
            mu = 0, paired = FALSE, exact = NULL, correct = TRUE,
            conf.int = FALSE, conf.level = 0.95, ...)

It will work either without or with the formula interface and also as a single sample test.

>    ### single sample case
> with(ft, wilcox.test(Postwt-Prewt, mu=0, alternative="greater"))

        Wilcoxon signed rank test

data:  Postwt - Prewt 
V = 142, p-value = 0.0004196
alternative hypothesis: true location is greater than 0 

>    ### two-sample case without the formula interface
> with(ft, wilcox.test(Postwt, Prewt, alternative="greater", paired=T))

        Wilcoxon signed rank test

data:  Postwt and Prewt 
V = 142, p-value = 0.0004196
alternative hypothesis: true location shift is greater than 0 

>    ### two-sample case with the formula interface
> wilcox.test(values ~ ind, paired=T, alternative="less", data=ft.long)

	Wilcoxon signed rank test

data:  values by ind
V = 11, p-value = 0.0004196
alternative hypothesis: true location shift is less than 0

>    ### with a confidence interval
> wilcox.test(values ~ ind, paired=T, alternative="less", data=ft.long, conf.int=T)

	Wilcoxon signed rank test

data:  values by ind
V = 11, p-value = 0.0004196
alternative hypothesis: true location shift is less than 0
95 percent confidence interval:
  -Inf -4.05
sample estimates:
(pseudo)median 
         -7.65
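The V statistic is nothing mysterious. It is the sum of the ranks of the positive differences, where the ranks are those of the absolute differences. A sketch, assuming MASS is installed (object names are mine), shows why V is 142 for one direction of subtraction and 11 for the other.

```r
ft <- subset(MASS::anorexia, Treat == "FT")
d  <- ft$Postwt - ft$Prewt

# Rank the absolute differences (there are no ties or zeros in these data).
r <- rank(abs(d))

V_post_minus_pre <- sum(r[d > 0])   # ranks of the positive differences
V_pre_minus_post <- sum(r[d < 0])   # what V becomes if the subtraction is reversed

c(V_post_minus_pre, V_pre_minus_post)

# The two always add up to n(n+1)/2 when there are no zero differences.
V_post_minus_pre + V_pre_minus_post == length(d) * (length(d) + 1) / 2
```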

In older versions of R, there was no "data=" option in the wilcox.test() function. That appears to have been corrected in newer versions.
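A further version note, offered with some caution. The help page for t.test() in recent versions of R describes a Pair() helper for requesting paired tests through the formula interface, and, as far as I can tell, the newest releases refuse the "paired=" option in the formula methods of t.test() and wilcox.test() altogether. If the formula calls above stop with an error about "paired", a sketch like the following, which uses the data in wide form, should give the same tests. The exists() guard and the fallback branch are my own additions, and MASS is assumed to be installed.

```r
ft <- subset(MASS::anorexia, Treat == "FT")

if (exists("Pair")) {
  # Pair(x, y) ~ 1 requests a paired test of x minus y.
  res.t <- t.test(Pair(Postwt, Prewt) ~ 1, data = ft, alternative = "greater")
  res.w <- wilcox.test(Pair(Postwt, Prewt) ~ 1, data = ft, alternative = "greater")
} else {
  # Older R without Pair(): fall back on the default (two-vector) interface.
  res.t <- with(ft, t.test(Postwt, Prewt, paired = TRUE, alternative = "greater"))
  res.w <- with(ft, wilcox.test(Postwt, Prewt, paired = TRUE, alternative = "greater"))
}

res.t
res.w
```

The default interface, with two vectors and "paired=TRUE", is the safe choice if the same script has to run on both old and new versions of R.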

revised 2016 January 30