R Tutorials by William B. King, Ph.D.
| Table of Contents | Function Reference | Function Finder | R Project |

ABOUT THE DATA SETS


Most of the data used in these tutorials are real data obtained from one source or another. Except for some short data objects that you can conveniently type in at the command prompt, all of the data sets used in these tutorials are described in the following list. Most of these are "built in" to R. That is, they come with R when you download it, and are located in the "datasets" library, which loads by default when R is started, making the data directly available to you. These data sets are described as "built in" in the following list. A few others are in other libraries, such as "MASS", that you can access as described in the tutorials. For example, the "anorexia" data set can be put into your workspace via data(anorexia, package="MASS"). "Built in" data simply require data(airquality), for example.
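
The two loading styles just described can be tried at the console. This is only a minimal sketch. It assumes the MASS package is installed (it normally comes with R), and it guards that call with requireNamespace() in case it is not.

```r
# "Built in" data: the datasets package is attached when R starts
data(airquality)
print(dim(airquality))          # number of rows and columns

# Data from another package: name the package explicitly
if (requireNamespace("MASS", quietly = TRUE)) {
  data(anorexia, package = "MASS")
  print(head(anorexia))         # first few rows
}
```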

That leaves a number of data sets that are not directly available within R. I will point out which these are in the following list and describe how they can be obtained below.


Data Sets Used In These Tutorials

  • airquality - daily air quality measurements in New York City (1973); built in
  • anorexia - body weights of anorexic women before and after therapy; in the MASS library
  • birthwt - birthweights of babies related to various factors about the mother; in the MASS library
  • caffeine - finger-tapping rate by dose of caffeine; available online via read.csv() as described in the tutorial
  • Cars93 - data on 1993 model cars on sale in the U.S.; in the MASS library
  • cats - anatomical measurements from domestic cats; in the MASS library
  • cement - heat evolved during setting of cements; in the MASS library
  • crabs - carapace length in blue crabs; in the MASS library
  • EMG - electromyographic data from left forehead during emotional arousal; entered using read.table()
  • Eysenck's data - data from an experiment on human memory; entered from the keyboard
  • ChickWeight - effect of diet on the early growth of chicks; built in
  • chickwts - weights of baby chickens by feed type; built in
  • CO2 - experiment on the cold tolerance of a species of grass; built in
  • faithful - eruption times and waiting times for Old Faithful geyser; built in
  • genotype - data from a crossfostering experiment with rats; in the MASS library
  • gorilla - can you see a gorilla that is right in front of you?; entered using read.table()
  • groceries - grocery prices by store; entered using read.table()
  • InsectSprays - effectiveness of various insecticides in an agricultural setting; built in
  • HairEyeColor - hair and eye color by gender; built in
  • Insurance - claims made by car insurance policy holders related to age and type of car; in the MASS library
  • islands - a vector of areas of world's major land masses; built in
  • loneliness - variables related to loneliness; available online via read.csv()
  • mammals - average brain and body weights for 62 species of land mammals; in the MASS library
  • match (rowe.txt) - effect of background music on a memory task; entered using read.table()
  • menarche - age at menarche in a cohort of Polish girls; in the MASS library
  • mj.data - marijuana use and short term memory; entered from the keyboard
  • mtcars - cars road tested by Motor Trend magazine (1974); built in
  • Myers - data demonstrating blocking from Myers' textbook; entered using read.table()
  • NELS - National Educational Longitudinal Study; from Timothy Keith's website
  • normtemp - human body temperature measurements; available online via read.table() as described in tutorial
  • Orange - data on the growth of orange trees; built in
  • OrchardSprays - effect of orchard treatments on honeybee activity; built in
  • planets - data on planets of the solar system; entered using read.table()
  • PlantGrowth - growth of plants by treatment type; built in
  • pressure - vapor pressure of mercury related to temperature; built in
  • Rabbit - effect of a serotonin receptor blocker on blood pressure; in the MASS library
  • react - reaction time and task type; available online via read.csv() as described in the tutorial
  • rivers - a vector of N. American river lengths; built in
  • RoundingTimes - base running times by method of rounding bases; available via example(friedman.test)
  • scar - body cutting and scarification and self-esteem; available online only via read.csv() as described in tutorial
  • schizophrenia - schizophrenia and hippocampal size in MZ twins; entered using read.table()
  • Seatbelts - effect of compulsory wearing of seatbelts in the U.K.; built in
  • sexab - effect of childhood abuse on adult PTSD; from Julian Faraway's website
  • sleep - increase in sleep time resulting from a "sleeping pill"; built in
  • sparrows - nesting behavior of house sparrows related to human foot traffic; entered using read.table()
  • state.region - census regions for U.S. states; built in
  • state.x77 - a matrix containing various data about U.S. states (1977); built in
  • sunspots - monthly sunspot numbers (1749-1983); built in
  • survey - survey data from University of Adelaide students; in the MASS library
  • Titanic - survival of Titanic passengers by age, sex, and class of ticket held; built in
  • ToothGrowth - tooth growth in guinea pigs by type and dose of vitamin C; built in
  • UCB and UCBdf - objects created from the following data set
  • UCBAdmissions - admission to grad programs at U.C. Berkeley by program and gender (1973); built in
  • ucla - relationships among reading, writing, math, and science; from UCLA IDRE website
  • USArrests - arrests for violent crimes in the U.S. (1973); built in
  • warpbreaks - number of warp breaks per loom by type of wool and tension; built in
  • women - average weight and average height of a group of women; built in
  • yields - crop yields by type of fertilization; from Michael Crawley's website

Inline Data Entry

Suppose we wanted to enter the following data.

items              storeA  storeB  storeC  storeD
lettuce              1.17    1.78    1.29    1.29
potatoes             1.77    1.98    1.99    1.99
milk                 1.49    1.69    1.79    1.59
eggs                 0.65    0.99    0.69    1.09
bread                1.58    1.70    1.89    1.89
cereal               3.13    3.15    2.99    3.09
ground.beef          2.09    1.88    2.09    2.49
tomato.soup          0.62    0.65    0.65    0.69
laundry.detergent    5.89    5.99    5.99    6.99
aspirin              4.46    4.84    4.99    5.15

In the Data Frames tutorial, I describe a method of data entry that I refer to as "inline data entry." You should read that now, if you haven't already. It's about two-thirds of the way through the tutorial and is fairly short.

The gist of it is this. It's not hard to type data vectors into R using the scan() function, but if you want to enter an entire data frame at one go, that's another matter. You can do it, though, by opening a script window (File > New Script in Windows; File > New Document on a Mac) and typing exactly the following.

groceries = read.table(header=TRUE, text="
items              storeA  storeB  storeC  storeD
lettuce              1.17    1.78    1.29    1.29
potatoes             1.77    1.98    1.99    1.99
milk                 1.49    1.69    1.79    1.59
eggs                 0.65    0.99    0.69    1.09
bread                1.58    1.70    1.89    1.89
cereal               3.13    3.15    2.99    3.09
ground.beef          2.09    1.88    2.09    2.49
tomato.soup          0.62    0.65    0.65    0.69
laundry.detergent    5.89    5.99    5.99    6.99
aspirin              4.46    4.84    4.99    5.15
")
It's probably best to do spacing with the spacebar if you're in Windows. I sometimes have trouble getting the Windows version of R to recognize tabs as white space (even though the help page says it should). It doesn't seem to matter if you're working on a Mac. Once you get it typed, execute the script, and the data are in your workspace in an object called "groceries". (To execute a script, in Windows, go to the Edit menu, and choose Run all. On a Mac, highlight the whole thing in the script window with your mouse, go to the Edit menu, and choose Execute.)

Above, you noticed that some of the data sets say they are "entered using read.table()." That refers to this method of "inline" data entry. The good news is, you can copy and paste that. You don't actually have to type it. You can paste it into a script window and execute it. Or you can paste it directly into the R Console at a command prompt. Most of the data sets that do not come with the R download can be entered in that fashion. If they cannot be, the tutorial gives further instructions on how to get them.
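
After entering data this way, it is worth checking that R parsed the table as intended. The sketch below re-enters just the first three rows of the groceries table to keep things short. The object name groceries.small is made up for this illustration and is not used in the tutorials.

```r
# Inline entry of the first three rows of the groceries table
groceries.small = read.table(header=TRUE, text="
items      storeA  storeB  storeC  storeD
lettuce      1.17    1.78    1.29    1.29
potatoes     1.77    1.98    1.99    1.99
milk         1.49    1.69    1.79    1.59
")

str(groceries.small)                      # one row per item, one column per store
print(colMeans(groceries.small[, -1]))    # mean price per store, dropping the items column
```

If str() shows the store columns as numeric and the number of rows you expect, the spacing was read correctly.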


But They Are Also Online

Most of those data sets are also available online. Here is a list of them. Sometimes you are told in the tutorial how to retrieve them, but sometimes not, so I'll tell you here, as soon as I get the list done.

  • caffeine.csv (The caffeine data in a csv file.)
  • scar.csv (Available online ONLY as a csv file.)
  • EMG.txt (The EMG data in a csv file.)
  • gorilla.csv and gorilla.txt (The gorilla data in a csv file.)
  • groceries.txt (The groceries data in a table.)
  • loneliness.csv (The loneliness data in a csv file.)
  • normtemp.txt (The normtemp data in a table, with no headers.)
  • planets.txt (The planets data in a csv file.)
  • react.txt (The react data in a csv file.)
  • rowe.txt (The match data in a csv file.)
  • schizophrenia.txt (The schizophrenia data in a csv file.)
  • sparrows.txt (The sparrows data in a csv file.)

Two of the data files say they are in a "table." That means the data values are separated by white space (as above), and the data must be read using the read.table() function. The details for the "normtemp" data are discussed at length in the tutorial where they are used. To get the "groceries" data, do this.

> file = "http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/groceries.txt"
> groceries = read.table(file, header=TRUE)    # or to put items in row names, do...
> groceries = read.table(file, header=TRUE, row.names=1)
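
To see what row.names=1 changes without needing an Internet connection, the same two calls can be run against a small local file. This is a minimal sketch: the temporary file and its three rows merely stand in for groceries.txt, and the names tmp, g1, and g2 are illustrative.

```r
# Write a small whitespace-separated table to a temporary file
tmp = tempfile(fileext = ".txt")
writeLines(c("items     storeA  storeB",
             "lettuce     1.17    1.78",
             "potatoes    1.77    1.98",
             "milk        1.49    1.69"), tmp)

g1 = read.table(tmp, header=TRUE)                # items is an ordinary column
g2 = read.table(tmp, header=TRUE, row.names=1)   # items become the row names

print(dim(g1))
print(dim(g2))
print(g2["milk", "storeB"])    # rows can now be indexed by item name
```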

The rest are CSV files (comma separated values). The caffeine.csv file is in that form expressly to illustrate CSV files and their creation and is discussed in the tutorial where it is relevant. Soooo, if the others are CSV files, how come they end with a .txt extension? Because some browsers won't let you view CSV files and will insist that you download them instead. However, they are just plain text files, and if they end in a .txt extension, you should be able to view them in your browser. Try clicking on this link to view the groceries data: groceries as a text file. It's a text file, so you'll have to click the back arrow in your browser's menu bar to get back here.

That is a table, with data values separated by white space. Click on this link to see the gorilla data: gorilla data in a CSV file. That's a CSV file with data values separated by commas and no white space. But you can still view it if you want to. For all of these data files, the web address is:

http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/filename.txt

Where "filename.txt" should be replaced by the name given in the list above. If you copy and paste that into your browser's address bar and then change filename.txt to, say, react.txt, you should see that file come up in your browser window. And at that point, you can save the page to your own computer if you want to.
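
The difference between a "table" and a CSV file comes down to the separator, which is why a CSV file reads fine with read.csv() whatever its extension happens to be. Here is a small sketch with made-up values (not taken from any of the data files listed above) that uses the text= argument instead of a file.

```r
# The same two rows, once separated by white space and once by commas
tab.text = "x y\n1 2\n3 4"
csv.text = "x,y\n1,2\n3,4"

from.table = read.table(text=tab.text, header=TRUE)
from.csv   = read.csv(text=csv.text)     # header=TRUE is the default for read.csv()

print(all.equal(from.table, from.csv))   # compare the two data frames

# read.table() with its default separator does not split on commas,
# so each line of the CSV text ends up as a single field
wrong = read.table(text=csv.text, header=TRUE)
print(dim(wrong))
```

The last call illustrates why the reading function has to match the file format: the function you use must agree with how the values are separated, not with the file extension.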

To read any of those files directly from within R, do this.

> file = "http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/filename.txt"
> ### EXCEPT--change filename.txt to the actual data file name you want
> dataname = read.csv(file)
For example...
> file = "http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/react.txt"
> react = read.csv(file)
That will retrieve the data set for you as long as you have an Internet connection and our server is up. If R gives you an error message that looks like this...
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open: HTTP status was '404 Not Found'
...then either one of those things isn't true, or you mistyped. Check your typing carefully. And don't blame the long filename on me. I'm not responsible for Internet protocols.
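
If you would rather have a failed download produce a message than stop a script, the read can be wrapped in tryCatch(). What follows is a sketch only. The function fetch.data() and its arguments are hypothetical names invented for this illustration; the default base address is the one given above.

```r
# Hypothetical helper: build the address from a base and a file name,
# then return NULL (with a message) instead of stopping on failure
fetch.data = function(filename,
                      base = "http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/") {
  address = paste0(base, filename)
  tryCatch(
    suppressWarnings(read.csv(address)),
    error = function(e) {
      message("Could not read ", address, ": ", conditionMessage(e))
      NULL
    }
  )
}

# react = fetch.data("react.txt")    # needs an Internet connection
```

A NULL result means the file could not be read, so check the spelling of the file name first and then your connection. Remember that the two "table" files (groceries.txt and normtemp.txt) need read.table() rather than read.csv(), so this helper is not suited to them.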

Or you can download the whole ball of wax (all twelve files and some bonus scripts) as a zip file right here.


created 2016 February 17; updated 2016 March 22