
SINGLE SAMPLE t TEST
Syntax
The syntax for the t.test( ) function is given here from the help page
in R...
## Default S3 method:
t.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)
## S3 method for class 'formula':
t.test(formula, data, subset, na.action, ...)
"S3" refers to the S language (version 3), which is often the same as the
methods and syntax used by R. In the case of the t.test( ) function, there
are two alternative syntaxes, the default, and the "formula" syntax. The
formula syntax will be discussed in the following two tutorials on the
two-sample t-tests. The default syntax requires a data vector, "x", to be
specified. The "alternative=" option is set by default to "two.sided" but can
be set to any of the three values shown above. The default null hypothesis is
"mu = 0", which should be changed by the user if this is not your null
hypothesis. The rest is either irrelevant to this tutorial or can be ignored
for the moment.
Hey! What Happened to the z Test?
The z-tests have not been implimented in the default R packages, although
they have been included in an optional, add-on library called "UsingR." (See
the Package Management tutorial for details on how
to add this library to R.)
The t Test With a Single Sample
What is normal human body temperature (taken orally)? We've all been taught
since grade school that it's 98.6 degrees Fahrenheit, and never mind that
what's normal for one person may not be "normal" for another! So from a
statistical point of view, we should abandon the word "normal" and confine
ourselves to talking about mean human body temperature. We hypothesize that
mean human body temperature is 98.6 degrees, because that's what we were told
in the third grade. The data set "Normal Body Temperature, Gender, and Heart
Rate" bears on this hypothesis. It is not built-in, so we will have to enter
the data. The data are from a random sample (supposedly) of 130 cases and has
been posted at the Journal of Statistical Education's data
archive. The original source of the data is Mackowiak, P. A., Wasserman,
S. S., and Levine, M. M. (1992). A Critical Appraisal of 98.6 Degrees F, the
Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold
August Wunderlich. Journal of the American Medical Association,
268, 1578-1580. Direct your browser here to see the data:
normtemp.txt.
> ### Comment: Holy crap! I don't wanna type all that in!
> ### Response: You don't have to. Go back to that link, and with your mouse,
> ### highlight and then copy the entire table. Then do this...
> normtemp = scan()
> ### When the prompt appears for your first entry, paste the table, and hit
> ### the Enter key enough times (once or twice) at the end to get back to
> ### the R command prompt. Then do this...
> index = seq(1,388,3)
> degreesF = normtemp[index]
> degreesF
[1] 96.3 96.7 96.9 97.0 97.1 97.1 97.1 97.2 97.3 97.4 97.4 97.4
[13] 97.4 97.5 97.5 97.6 97.6 97.6 97.7 97.8 97.8 97.8 97.8 97.9
[25] 97.9 98.0 98.0 98.0 98.0 98.0 98.0 98.1 98.1 98.2 98.2 98.2
[37] 98.2 98.3 98.3 98.4 98.4 98.4 98.4 98.5 98.5 98.6 98.6 98.6
[49] 98.6 98.6 98.6 98.7 98.7 98.8 98.8 98.8 98.9 99.0 99.0 99.0
[61] 99.1 99.2 99.3 99.4 99.5 96.4 96.7 96.8 97.2 97.2 97.4 97.6
[73] 97.7 97.7 97.8 97.8 97.8 97.9 97.9 97.9 98.0 98.0 98.0 98.0
[85] 98.0 98.1 98.2 98.2 98.2 98.2 98.2 98.2 98.3 98.3 98.3 98.4
[97] 98.4 98.4 98.4 98.4 98.5 98.6 98.6 98.6 98.6 98.7 98.7 98.7
[109] 98.7 98.7 98.7 98.8 98.8 98.8 98.8 98.8 98.8 98.8 98.9 99.0
[121] 99.0 99.1 99.1 99.2 99.2 99.3 99.4 99.9 100.0 100.8
That's one way to do it anyway. You should now be looking at the vector of
body temperatures, which we have stored in the data object "degreesF". IF that
worked, which it doesn't always. So if it didn't, here's another way to get
copied and pasted tables into R...
> normtemp = read.table(stdin())
0: ### Now paste the copied table here.
...and as above, hit the Enter key enough (once or twice) to get back to the R
command prompt. The table will be entered in table form using this trick, so
now the body temp data must be extracted...
> names(normtemp)
[1] "V1" "V2" "V3"
> degreesF = normtemp$V1
> degreesF ### output not shown
Once again, you should now be looking at the vector of body temperatures. If
not, then you should also be able to copy and paste the data into a spreadsheet,
from which you can then copy the first column and paste it into scan( ).
If you can't get that to work, start typing!
The t-test assumes a random sample of independent values has been obtained
from a normal parent distribution. We should check the normality assumption...
> qqnorm(degreesF) ### output not shown
> qqline(degreesF) ### output not shown
> plot(density(degreesF)) ### output not shown
> shapiro.test(degreesF) ### output not shown
Hmmm, the distribution appears to be close to normal, and the Shapiro-Wilk test
does not detect a significant deviation from normality. In any event, the t-test
is robust to nonnormality as long as the sample is large enough to invoke the
central limit theorem and say the sampling distribution of means is normal. So
we procede to the t-test...
> t.test(degreesF, mu=98.6, alternative="two.sided")
One Sample t-test
data: degreesF
t = -5.4548, df = 129, p-value = 2.411e-07
alternative hypothesis: true mean is not equal to 98.6
95 percent confidence interval:
98.12200 98.37646
sample estimates:
mean of x
98.24923
Note: setting the alternative to "two.sided" was unnecessary, since that is the
default.
We can now reject the null at any reasonable alpha level we might have
chosen! From the sample, we might estimate the mean human body temperature to
be 98.25 degrees (sample mean on the last line of output). A 95% CI lets us be
95% sure the population mean is between 98.12 and 98.38 degrees. If some other
degree of confidence is desired for this CI, it can be set using the
"conf.level=" option. For example...
> t.test(degreesF, conf.level=.99)$conf.int
[1] 98.08111 98.41735
attr(,"conf.level")
[1] 0.99
Here we have asked just for the 99% confidence interval to be reported using the
"$conf.int" index on the end of our procedure. And even that doesn't allow us
to conclude that our third grade teachers got it right!
What happened? It's a bit like being told there is no Santa Claus! It seems
the original measurements were made in degrees Celsius and reported to the
nearest degree. Mean human body temperature is not 98.6° Fahrenheit. It
is 37° Celsius. Someone got a bit carried away with significant digits
when he converted this value to the Fahrenheit scale!
Textbook Problems
In textbook problems, we are often not given the raw data but only summary
statistics. R does not provide a mechanism for dealing with this, other than
doing the calculations by hand at the command line...
A random sample of 130 human beings was taken, and the oral body
temperature of each was measured. The sample mean was 98.25 degrees
Fahrenheit, with a standard deviation of 0.7332. Test the null
hypothesis that the mean human body temperature is 98.6 degrees.
> t.obt = (98.25 - 98.6) / (.7332 / sqrt(130))
> t.obt
[1] -5.442736
> qt(c(.025,.975),df=129) ### critical values, alpha=.05
[1] -1.978524 1.978524
> 2 * pt(t.obt,df=129) ### two-tailed p-value
[1] 2.547478e-07
A custom function could be written to automate these calculations.
Power
A power function exists for calculating the power of a t-test. Its syntax
is...
power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05,
power = NULL,
type = c("two.sample", "one.sample", "paired"),
alternative = c("two.sided", "one.sided"),
strict = FALSE)
> power.t.test(n=130, delta=98.6-98.25, sd=.7332, sig.level=.05,
+ type="one.sample", alternative="two.sided")
One-sample t test power calculation
n = 130
delta = 0.35
sd = 0.7332
sig.level = 0.05
power = 0.9997111
alternative = two.sided
I'd say power was not an issue in this case!
An Alternative: The Single-Sample Sign Test of the Median
If the distribution of scores is skewed, and the sample size is small (less
than 30), then the t-test should not be used. An alternative is the
single-sample sign test, which really boils down to a single-sample test of a
proportion...
> data(anorexia, package="MASS")
> attach(anorexia)
> str(anorexia)
'data.frame': 72 obs. of 3 variables:
$ Treat : Factor w/ 3 levels "CBT","Cont","FT": 2 2 2 2 2 2 2 2 2 2 ...
$ Prewt : num 80.7 89.4 91.8 74 78.1 88.3 87.3 75.1 80.6 78.4 ...
$ Postwt: num 80.2 80.1 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 ...
> weight.gain.CBT = Postwt[Treat==CBT] - Prewt[Treat==CBT]
Error: object "CBT" not found
> ### Comment: Bugger!
> weight.gain.CBT = Postwt[Treat=="CBT"] - Prewt[Treat=="CBT"]
> boxplot(weight.gain.CBT)
> ### Outliers all over the place!
> weight.gain.CBT
[1] 1.7 0.7 -0.1 -0.7 -3.5 14.9 3.5 17.1 -7.6 1.6 11.7 6.1 1.1 -4.0 20.9
[16] -9.1 2.1 -1.4 1.4 -0.3 -3.7 -0.8 2.4 12.6 1.9 3.9 0.1 15.4 -0.7
> ### Excellent! Everybody changed. We now wish to test the null hypothesis
> ### that cognitive behavior therapy produces no change in median body weight
> ### when used in the treatment of anorexia.
> length(weight.gain.CBT)
[1] 29
> sum(weight.gain.CBT > 0)
[1] 18
> ### There are 29 cases in the data set, of whom 18 showed a gain. The null
> ### implies that as many women should lose weight as gain if CBT is valueless.
> ### I.e., the null implies a median weight change of zero.
> binom.test(x=18, n=29, p=1/2, alternative="greater")
Exact binomial test
data: 18 and 29
number of successes = 18, number of trials = 29, p-value = 0.1325
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
0.4512346 1.0000000
sample estimates:
probability of success
0.6206897
> detach(anorexia)
The sign test assumes a continuous variable, as it does not deal very well with
data values that happen to fall at the null hypothesized median. (They should
probably just be omitted.) It's also not an especially powerful test, as it
tosses out almost all the numerical information in the data. However, it makes
no distribution assumptions, and for that reason alone is a useful tool.
Another Alternative: The Single-Sample Wilcoxin Test
If the dependent variable is continuous and appears to be symmetrically
distributed, then more of the numerical information in the sample can be
retained by using the Wilcoxin test instead of the sign test. The syntax is
very similar to the t-test...
wilcox.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, exact = NULL, correct = TRUE,
conf.int = FALSE, conf.level = 0.95, ...)
The following example is based on the one on the help page for this function...
> ## Hollander & Wolfe (1973), 29f.
> ## Hamilton depression scale factor measurements in 9 patients with
> ## mixed anxiety and depression, taken at the first (x) and second
> ## (y) visit after initiation of a therapy (administration of a
> ## tranquilizer).
> x = c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
> y = c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
> change = y - x ### expected to be negative
> change
[1] -0.952 0.147 -1.022 -0.430 -0.620 -0.590 -0.490 0.080 -0.010
> wilcox.test(change, mu=0, alternative = "less")
Wilcoxon signed rank test
data: change
V = 5, p-value = 0.01953
alternative hypothesis: true location is less than 0
Technically, this could also be considered a two-sample paired test, as...
wilcox.test(x, y, paired = TRUE, alternative = "greater")
...would have given the same result.
Return to the Table of Contents
|