| Table of Contents
| Function Reference
| Function Finder
| R Project |
ABOUT THE DATA SETS
Most of the data used in these tutorials are real data obtained from one
source or another.
Except for some short data objects that you can conveniently type in at
the command prompt, all of the data sets used in these tutorials are described
in the following list. Most of these are "built in" to R. That is, they come
with R when you download it, and are located in the "datasets" library, which
loads by default when R is started, making the data directly available to you.
These data sets are described as "built in" in the following list. A
few others are in other libraries, such as "MASS", that you can access as
described in the tutorials. For example, the "anorexia" data set can be put
into your workspace via data(anorexia, package="MASS"). "Built in" data simply
require data(airquality), for example.
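As a minimal sketch of the two cases just described, the following loads one
built-in data set and one from MASS, then looks at the dimensions of each. It
assumes the MASS package is installed (it normally ships with R); the
requireNamespace() guard is an added precaution for this illustration, not
something taken from the tutorials.

```r
# Built-in data: the "datasets" library is already loaded at startup.
data(airquality)
dim(airquality)          # number of rows and columns

# Data from another library: name the package explicitly.
# Guarded in case MASS is not installed on this machine.
if (requireNamespace("MASS", quietly = TRUE)) {
  data(anorexia, package = "MASS")
  dim(anorexia)
}
```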
That leaves a number of data sets that are not directly available within R.
I will point out which these are in the following list and describe how they
can be obtained below.
Data Sets Used In These Tutorials
- airquality - daily air quality measurements in New York City (1973); built in
- anorexia - body weights of anorexic women before and after therapy; in the MASS library
- birthwt - birthweights of babies related to various factors about the mother; in the MASS library
- caffeine - finger-tapping rate by dose of caffeine; available online via
read.csv() as described in the tutorial
- Cars93 - data on 1993 model cars on sale in the U.S.; in the MASS library
- cats - anatomical measurements from domestic cats; in the MASS library
- cement - heat evolved during setting of cements; in the MASS library
- crabs - carapace length in blue crabs; in the MASS library
- EMG - electromyographic data from left forehead during emotional arousal; entered using
read.table()
- Eysenck's data - data from an experiment on human memory; entered from the keyboard
- ChickWeight - effect of diet on the early growth of chicks; built in
- chickwts - weights of baby chickens by feed type; built in
- CO2 - experiment on the cold tolerance of a species of grass; built in
- faithful - eruption times and waiting times for Old Faithful geyser; built in
- genotype - data from a crossfostering experiment with rats; in the MASS library
- gorilla - can you see a gorilla that is right in front of you?; entered using
read.table()
- groceries - grocery prices by store; entered using read.table()
- InsectSprays - effectiveness of various insecticides in an agricultural setting; built in
- HairEyeColor - hair and eye color by gender; built in
- Insurance - claims made by car insurance policy holders related to age
and type of car; in the MASS library
- islands - a vector of areas of world's major land masses; built in
- loneliness - variables related to loneliness; available online via read.csv()
- mammals - average brain and body weights for 62 species of land mammals; in the MASS library
- match (rowe.txt) - effect of background music on a memory task; entered using
read.table()
- menarche - age at menarche in a cohort of Polish girls; in the MASS library
- mj.data - marijuana use and short term memory; entered from the keyboard
- mtcars - cars road tested by Motor Trend magazine (1974); built in
- Myers - data demonstrating blocking from Myers' textbook; entered using read.table()
- NELS - National Educational Longitudinal Study; from Timothy Keith's website
- normtemp - human body temperature measurements; available online via
read.table() as described in tutorial
- Orange - data on the growth of orange trees; built in
- OrchardSprays - effect of orchard treatments on honeybee activity; built in
- planets - data on planets of the solar system; entered using read.table()
- PlantGrowth - growth of plants by treatment type; built in
- pressure - vapor pressure of mercury related to temperature; built in
- Rabbit - effect of a serotonin receptor blocker on blood pressure; in the MASS library
- react - reaction time and task type; available online via read.csv()
as described in the tutorial
- rivers - a vector of N. American river lengths; built in
- RoundingTimes - base running times by method of rounding bases; available via
example(friedman.test)
- scar - body cutting and scarification and self-esteem; available online only
via read.csv() as described in tutorial
- schizophrenia - schizophrenia and hippocampal size in MZ twins; entered using
read.table()
- Seatbelts - effect of compulsory wearing of seatbelts in the U.K.; built in
- sexab - effect of childhood abuse on adult PTSD; from Julian Faraway's website
- sleep - increase in sleep time resulting from a "sleeping pill"; built in
- sparrows - nesting behavior of house sparrows related to human foot traffic;
entered using read.table()
- state.region - census regions for U.S. states; built in
- state.x77 - a matrix containing various data about U.S. states (1977); built in
- sunspots - monthly sunspot numbers (1749-1983); built in
- survey - survey data from University of Adelaide students; in the MASS library
- Titanic - survival of Titanic passengers by age, sex, and class of ticket held; built in
- ToothGrowth - tooth growth in guinea pigs by type and dose of vitamin C; built in
- UCB and UCBdf - objects created from the following data set
- UCBAdmissions - admission to grad programs at U.C. Berkeley by program and
gender (1973); built in
- ucla - relationships among reading, writing, math, and science; from UCLA IDRE website
- USArrests - arrests for violent crimes in the U.S. (1973); built in
- warpbreaks - number of warp breaks per loom by type of wool and tension; built in
- women - average weight and average height of a group of women; built in
- yields - crop yields by type of fertilization; from Michael Crawley's website
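To see whether a name from the list above is already available in your
session, something like the following sketch may help. The vector of names is
just a sample taken from the list, and exists() is used here as a simple check
for the purpose of illustration. Names that come back FALSE have to be loaded
from another library or entered by one of the methods described below.

```r
# A few names taken from the list above.
wanted = c("airquality", "faithful", "mtcars", "anorexia", "groceries")

# TRUE means an object of that name can be found in the current session.
available = sapply(wanted, exists)
available
```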
Inline Data Entry
Suppose we wanted to enter the following data.
items storeA storeB storeC storeD
lettuce 1.17 1.78 1.29 1.29
potatoes 1.77 1.98 1.99 1.99
milk 1.49 1.69 1.79 1.59
eggs 0.65 0.99 0.69 1.09
bread 1.58 1.70 1.89 1.89
cereal 3.13 3.15 2.99 3.09
ground.beef 2.09 1.88 2.09 2.49
tomato.soup 0.62 0.65 0.65 0.69
laundry.detergent 5.89 5.99 5.99 6.99
aspirin 4.46 4.84 4.99 5.15
In the Data Frames tutorial, I describe a method of data entry that I refer
to as "inline data entry." You should read that now, if you haven't already.
It's about two-thirds of the way through the tutorial and is fairly short.
The gist of it is this. It's not hard to type data vectors into R using the
scan() function, but if you want to enter an entire
data frame at one go, that's another matter. You can do it, though, by opening
a script window (File > New Script in Windows; File > New Document on a Mac)
and typing exactly the following.
groceries = read.table(header=T, text="
items storeA storeB storeC storeD
lettuce 1.17 1.78 1.29 1.29
potatoes 1.77 1.98 1.99 1.99
milk 1.49 1.69 1.79 1.59
eggs 0.65 0.99 0.69 1.09
bread 1.58 1.70 1.89 1.89
cereal 3.13 3.15 2.99 3.09
ground.beef 2.09 1.88 2.09 2.49
tomato.soup 0.62 0.65 0.65 0.69
laundry.detergent 5.89 5.99 5.99 6.99
aspirin 4.46 4.84 4.99 5.15
")
It's probably best to do spacing with the spacebar if you're in Windows. I
sometimes have trouble getting the Windows version of R to recognize tabs as
white space (even though the help page says it should). It doesn't seem to
matter if you're working on a Mac. Once you get it typed, execute the
script, and the data are in your workspace in an object called "groceries".
(To execute a script, in Windows, go to the Edit menu, and choose Run all.
On a Mac, highlight the whole thing in the script window with your mouse, go
to the Edit menu, and choose Execute.)
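If you want to check that an inline entry worked, a quick look at the
structure of the result is usually enough. The sketch below uses only the first
three rows of the groceries data to stay short; the object name "small" is made
up for this illustration.

```r
# Inline data entry with a shortened version of the groceries table.
small = read.table(header=TRUE, text="
items storeA storeB storeC storeD
lettuce 1.17 1.78 1.29 1.29
potatoes 1.77 1.98 1.99 1.99
milk 1.49 1.69 1.79 1.59
")

dim(small)     # number of rows and columns that were read
str(small)     # the store columns should show up as numeric
```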
Above, you may have noticed that some of the data sets are listed as "entered
using read.table()." That refers to this method of "inline" data
entry. The good news is, you can copy and paste that. You don't actually have
to type it. You can paste it into a script window and execute it. Or you can
paste it directly into the R Console at a command prompt. Most of the data
sets that do not come with the R download can be entered in that fashion. If
they cannot be, then there are further instructions as to how to get them in
the tutorial.
But They Are Also Online
Most of those data sets are also available online. Here is a list of them.
Sometimes you are told in the tutorial how to retrieve them, but sometimes not,
so I'll tell you here, as soon as I get the list done.
- caffeine.csv (The caffeine data in a csv file.)
- scar.csv (Available online ONLY as a csv file.)
- EMG.txt (The EMG data in a csv file.)
- gorilla.csv and gorilla.txt (The gorilla data in a csv file.)
- groceries.txt (The groceries data in a table.)
- loneliness.csv (The loneliness data in a csv file.)
- normtemp.txt (The normtemp data in a table, with no headers.)
- planets.txt (The planets data in a csv file.)
- react.txt (The react data in a csv file.)
- rowe.txt (The match data in a csv file.)
- schizophrenia.txt (The schizophrenia data in a csv file.)
- sparrows.txt (The sparrows data in a csv file.)
Two of the data files say they are in a "table." That means the data values
are separated by white space (as above), and the data must be read using the
read.table() function. The details for the "normtemp" data are discussed at
length in the tutorial where they are used. To get the "groceries" data, do
this.
> file = "http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/groceries.txt"
> groceries = read.table(file, header=T) # or to put items in row names, do...
> groceries = read.table(file, header=T, row.names=1)
The rest are CSV files (comma separated values). The caffeine.csv file is
in that form expressly to illustrate CSV files and their creation and is
discussed in the tutorial where it is relevant. Soooo, if the others are CSV
files, how come they end with a .txt extension? Because some browsers will not
let you view CSV files and insist that you download them instead. However,
they are just plain text files, and if they end in a .txt extension, you
should be able to view them in your browser. Try clicking on this link to view
the groceries data: groceries as a text file. It's a text file, so you'll have
to click the back arrow in your browser's menu bar to get back here.
That is a table, with data values separated by white space. Click on this
link to see the gorilla data: gorilla data in a CSV file. That's a CSV file
with data values separated by commas
and no white space. But you can still view it if you want to. For all of these
data files, the web address is:
http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/filename.txt
where "filename.txt" should be replaced by the name given in the list above. If
you copy and paste that into your browser's address bar and then change
filename.txt to, say, react.txt, you should see that file come up in your
browser window. And at that point, you can save the page to your own computer
if you want to.
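The difference between the two file layouts can be illustrated without an
Internet connection. In this sketch the same two rows of the groceries data are
written once with white space and once with commas, using the text= argument in
place of a file. The object names are invented for the example.

```r
# White-space separated values are read with read.table().
as_table = read.table(header=TRUE, text="items storeA storeB
lettuce 1.17 1.78
potatoes 1.77 1.98")

# Comma separated values are read with read.csv(); header=TRUE is its default.
as_csv = read.csv(text="items,storeA,storeB
lettuce,1.17,1.78
potatoes,1.77,1.98")

# Both calls should produce the same data frame.
all.equal(as_table, as_csv)
```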
To read any of those files directly from within R, do this.
> file = "http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/filename.txt"
> ### EXCEPT--change filename.txt to the actual data file name you want
> dataname = read.csv(file)
For example...
> file = "http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/react.txt"
> react = read.csv(file)
That will retrieve the data set for you as long as you have an Internet
connection and our server is up. If R gives you an error message that looks
like this...
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open: HTTP status was '404 Not Found'
...then either one of those things isn't true, or you mistyped. Check your
typing carefully. And don't blame the long filename on me. I'm not
responsible for Internet protocols.
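If you retrieve several of these files, it may be convenient to wrap the steps
above in a small function that builds the address and reports a failure without
stopping your script. This is only a sketch: fetch_data() and its arguments are
hypothetical names invented here, not part of R or of these tutorials, and it
returns NULL when the file cannot be read.

```r
# Web address of the data directory, as given above.
base_url = "http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/"

# Hypothetical helper: paste the file name onto the address and read it.
# 'reader' is read.csv by default; pass read.table for the two "table" files.
fetch_data = function(filename, base = base_url, reader = read.csv, ...) {
  address = paste0(base, filename)
  tryCatch(
    reader(address, ...),
    error = function(e) {
      message("Could not read ", address, ": ", conditionMessage(e))
      NULL    # signal failure to the caller instead of stopping
    }
  )
}

# Usage (requires an Internet connection and a working server):
# react = fetch_data("react.txt")
# groceries = fetch_data("groceries.txt", reader = read.table, header = TRUE)
```

Checking the result with is.null() before going on lets a longer script skip a
data set that could not be retrieved.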
Or you can download the whole ball of wax (all twelve files and some
bonus scripts) as a zip file
right here.
created 2016 February 17; updated 2016 March 22