Take my word for this. Not everything will make sense to you at first! If there is anything that puzzles you, don't worry. You will probably see it illustrated in the tutorials to come. R can't be learned "all at once." It will take repeated exposures, and (YO, STUDENTS!) practice. I will not be showing you much output in this tutorial, and that is quite intentional. If you want to see it, look at your open R Console.
A special warning to my students: I've been teaching a long time, and I know how hard it is to get you to practice things. These are your tools. It's your responsibility to learn to use them, and you're not going to do that by watching me or by skimming over a tutorial. Sit down at a computer, open R, and GET TO WORK! If you're not willing to put in the time, I suggest you find some other course to take, because you're not going to pass this one! I will NOT answer questions about R during the exams.
What is R?
First, there was S. The S statistical programming language was developed in the late 1970's, primarily by John Chambers at Bell Labs. S was first distributed outside of Bell Labs in 1980, and by 1988 the "New S Language" had become available. This was the basis for the commercial version called S-Plus. By 1990, S or S-Plus was in widespread use by statisticians.
In the early 1990's, Ross Ihaka and Robert Gentleman, of the University of Auckland in New Zealand, wrote a teaching version of S/S-Plus and named it R. In 1995 the source code for R was released as open source under the GNU General Public License. People who are unfamiliar with these concepts are encouraged to read up on them. Since that time, R has become one of the most powerful and versatile statistical software programs available at any price.
R is a statistical computing environment. Primarily, it is a programming language, but one containing a very large number of statistical functions. These functions can be used to perform complex statistical analyses interactively, or they can be included in larger scripts and programs to accomplish even more complex tasks. R also has very elaborate graphical capabilities, allowing the production of publication-quality graphics.
R is free software. To get R, simply go to the R Project homepage and download it. Click on the CRAN link on the lefthand side of the page under downloads. (CRAN stands for Comprehensive R Archive Network.) R is available for Windows (Windows 95 and later), Mac OS X, and Linux. As of this writing (15 January 2016), the latest version is 3.2.3. A new version is released about once or twice a year.
Once you have R, you may do almost anything with it you please. You can install it on as many computers as you want. You can examine and modify the source code. You can even resell it, as long as you make the source code available. Unlike commercial software such as SPSS, you are not paying an exorbitant fee to "rent" a restricted-use copy. R is free!
The R Prompt
Start R like you would start any other program on your computer. In Windows, the R installer will place a shortcut icon on your desktop. On the Mac, you will have to drag an icon from your Applications folder to the dock. In Linux, R runs from a terminal window. You can also run R from either terminal.app or an X11 terminal on a Mac. I wouldn't recommend this in Windows, however.
R is command-line driven. This means you type your commands at a prompt rather than hunting and clicking through menus. This is much faster and more versatile than a GUI (graphical user interface), but it does take some getting used to for those of you who rarely take your hands off your mouse or get a kick from smearing your fingerprints all over your screen.
R GUIs are in development. Google up "R Commander" for example. The S-Plus commercial version also has a GUI, but be ready to pay dearly for it, in cash money, that is. (Note: S-Plus was at one time sold by Insightful, which is now owned by TIBCO and appears to have been rebranded as Spotfire S+.)
For those of you who haven't seen it previously in the install help or work-along demo tut, this is what R will look like (more or less--this is the Mac version) when it opens. This window is called the R Console. (In Windows, it will have a larger gray window behind it. This is normal.)
The R prompt is the greater than symbol, >, at the bottom of that
window. When you see it, start typing. If your command is a long one and
breaks onto the next line, or if you hit Enter without completing a
command, R will prompt you on the next line with a plus sign (+). This
means "you ain't done yet--gimme more." This often happens when you
don't close a parenthesis.
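Here is a minimal sketch of a command split across two lines. The call to sum() and its numbers are assumed purely for illustration. Typed at the console, the first line would leave you at the + prompt until you supply the closing parenthesis.

```r
# The first line ends with an open parenthesis, so R waits for more input.
# At the console the second line would be typed at the + prompt.
total <- sum(1, 2, 3
)
print(total)
```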
Notes: In the newer versions of R on the Mac, parentheses (and quotes) are closed for you. I.e., as soon as you type the left one, the right one also appears. This can be very convenient; in some cases, it can also be annoying. A common trick that programmers use is to type left and right parentheses at the same time, then backspace with the arrow keys and type whatever goes between them.
Further note: In some versions of R, especially those on Windows, when your typing reaches the right side of the window, R does not break the line but scrolls the window to the right. Once again, convenient and annoying. You can insert a break yourself by hitting the Enter key, if you prefer the window not to jump around. The Mac command editor will automatically break the line when it hits the right side of the editor window.
Final note: Sometimes you will get stuck at the + prompt. That's probably because you made a syntax error before the line broke, and now it can't be fixed by just typing more at the command prompt. You will have to abort the command and start over. This is done by pressing the Escape (Esc) key in the upper left corner of the keyboard. MAKE A NOTE OF THIS. You're going to need to know it eventually.
Best to know this right away, I suppose. To quit R, type...
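The quit command is q(), the same one the startup banner mentions. In this small sketch the actual call is commented out so that running it does not end your session.

```r
# Without parentheses, q is just the name of a function; nothing quits.
is.function(q)

# With parentheses, the function is called and R shuts down.
# Uncomment to try it; at the console R will first ask about saving the workspace.
# q()
```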
The parentheses are mandatory, even though there is nothing inside them. R will ask if you wish to save your workspace. Say yes if you want to save any of the objects (data) you created while working in R. (See the next tutorial for "objects.")
R is case sensitive. "My_data" and "my_data" are not the same objects. Nor are Anova() and anova() the same functions (commands). The most common reason I get error messages is capitalizing where I shouldn't have, or not where I should have. If you get an "object not found" or "could not find function" error, check your capitalization first. And then check your spelling! The vast majority of R commands are all lower case (and must be typed that way!), but there are exceptions.
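To see the case sensitivity for yourself, try a short sketch like this one. The object names and values are made up for illustration.

```r
my_data <- c(1, 2, 3)
My_data <- c(10, 20, 30)   # a different object, not a replacement

identical(my_data, My_data)   # FALSE: two separate objects

# Asking for a name with the wrong capitalization is an error.
# tryCatch() captures the message so the script keeps running.
msg <- tryCatch(MY_DATA, error = function(e) conditionMessage(e))
print(msg)
```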
This might also be a good place to point out that R does not like spaces in the names of things. Use a dot or an underline character instead. Thus, "my.data" and "my_data" are fine (and different) names for a data set, but "my data" is not allowed. "MyData" and "myData" are also allowed (and different) names for data objects. But don't use a dash: "my-data". R will think you mean subtraction: "my" minus "data". You should follow these rules not only for named data objects you create while using R, but also in files you save with R. I.e., don't put spaces in filenames. R will usually work around it if you do, but it's just better to avoid the potential hassle to begin with.
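The naming rules can be sketched as follows. All of the object names and values here are hypothetical, and make.names() is included only to show how R itself would repair a name containing a space.

```r
my.data <- 1:3      # dot: allowed
my_data <- 4:6      # underscore: allowed, and a different object
myData  <- 7:9      # mixed case: allowed, and different again

# A dash is read as subtraction between two objects.
my   <- 10
data <- 4
my-data             # 6, i.e. "my" minus "data"

# make.names() turns an illegal name into a legal one.
make.names("my data")   # "my.data"
```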
R code, like any good programming code, can be commented. Any line or
partial line in R beginning with the hash or pound symbol (#) is a
comment, or note, and R will ignore it. You can use these to annotate
your analysis or make notes to yourself (or others). Type the following.
Remember to press the Enter key at the end of each line.
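A few lines along these lines will do for practice. The arithmetic is arbitrary and only there to give the comments something to annotate.

```r
# This whole line is a comment, and R ignores it.
2 + 2              # R evaluates 2 + 2 and ignores everything after the hash
# 2 + 3            R never sees this arithmetic at all
result <- 10 / 4   # comments can annotate an assignment too
print(result)
```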
Almost every command you issue in R will take the form of a function.
Functions have the following syntax.
The parentheses are mandatory, even if there are no arguments or
options given. For example...
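The general pattern is a function name followed by parentheses that hold any arguments and options, separated by commas. The sketch below uses a few base R functions chosen for illustration; the vector named scores is made up.

```r
# General pattern: function.name(argument1, argument2, option = value)
scores <- c(4, 8, 15, 16, 23, 42)   # c() combines its arguments into a vector

mean(scores)                      # one argument
mean(scores, trim = 0.2)          # one argument plus a named option
round(mean(scores), digits = 1)   # functions can be nested

ls()   # no arguments at all, but the parentheses are still required
```

Leaving the parentheses off, as in typing ls by itself, prints the function's definition instead of running it.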
A Taste--Before You Get Too Impatient!
If you haven't already, fire up R and get to typing. I'll illustrate a few more interesting and important features of R in these examples.
For educational purposes, R has a large number of data sets built into it. I will use these to illustrate a few things R can do. When you start R, you will get a screen something like this. This is the aforementioned R Console. It will be in its own window on Mac and Windows. In Linux, R runs in the terminal.
R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

>
The R core team and R developers the world over are volunteers doing
this, well, maybe not entirely out of the goodness of their hearts, but
without much compensation. They would like you to cite them if you use
R for data analysis. To see how to do this, issue the following command.
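The startup banner names the command: citation(). A brief sketch follows; the second call, asking for a specific package, is an extra added for illustration.

```r
citation()          # how to cite R itself
citation("stats")   # how to cite a particular package (here, stats)
```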
To give you a bit of an idea of what R is capable of graphically, try this.
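One way to get such a tour is the graphics demo that ships with R; the startup banner mentions demo(). Whether this is the exact demo meant here is an assumption.

```r
# Collect the list of demos available in the attached packages.
available <- demo()

# Run the graphics demo. At the console R pauses between plots;
# press Enter to move on to the next one.
if (interactive()) {
  demo(graphics)
}
```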
This little exercise has placed a lot of stuff in your workspace (to be
explained later). We should clean that up. WARNING: The command I am about to
tell you to execute will erase everything in your workspace, and that will be
permanent. If you've been using R already, and there is something in your
workspace you want to save, save it now!
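The usual way to empty the workspace is rm(list = ls()). The sketch below first creates two throwaway objects (their names are made up) so there is something to remove.

```r
x <- 1:10
y <- "scratch"
ls()              # lists the names of the objects in the workspace

rm(list = ls())   # removes every object that ls() lists; this cannot be undone
ls()              # character(0), i.e. nothing left
```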
As a brief appetite whetting, let's look at a data set called
"HairEyeColor". This is a table showing a crosstabulation of hair color, eye
color, and sex (gender) for 592 statistics students. Type this, and don't type
the command prompt--R has already supplied that for you. Remember to press
Enter to execute the command.
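A sketch of the sort of thing to type. The summary() call is an assumption based on the discussion that follows: for a table, summary() reports the number of cases and a chi-square test of independence of all factors.

```r
HairEyeColor            # print the three-way table
summary(HairEyeColor)   # number of cases plus the chi-square test
```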
The null hypothesis that all factors are independent (i.e., no interactions between any factors) is rejected. If you are getting error messages instead of statistical output, remember: R is case sensitive. HairEyeColor and haireyecolor are NOT the same thing. (Note: The warning message "Chi-squared approximation may be incorrect" means there are expected frequencies less than 5.)
You are not to worry about the details of this syntax at this point. This
is just for show! To collapse over one or more of the factors in this table,
you can do one of these.
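Collapsing a table means summing over the factors you want to drop. A hedged sketch using margin.table() and apply(), both standard R functions; which margins to keep is an illustrative choice. The margins are numbered in the order hair (1), eye (2), sex (3).

```r
margin.table(HairEyeColor, margin = c(1, 2))   # hair by eye, summed over sex
margin.table(HairEyeColor, margin = c(1, 3))   # hair by sex, summed over eye color
margin.table(HairEyeColor, margin = 3)         # sex only

apply(HairEyeColor, c(1, 2), sum)              # same result as the first line
```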
Suppose we wished to perform an ordinary (i.e., two-way) Pearson
chi-square test of independence on hair color and eye color just for
the men. Here's how to do it (and once again, just for show so don't worry about
memorizing all this syntax, which probably won't make much sense to you at this
point).
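One way to run that test, as a sketch: pull out the slice of the table for the men and hand it to chisq.test(). The object name men is made up. "Male" is the first level of the third dimension, so this is the same slice as HairEyeColor[,,1].

```r
men <- HairEyeColor[, , "Male"]   # hair by eye table for the men only
men
chisq.test(men)                   # R may warn about small expected frequencies
```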
> as.data.frame.table(HairEyeColor)
Want to see proportions instead of raw frequencies?
> prop.table(HairEyeColor[,,1], margin=1)   # relative to row marginal sums
> prop.table(HairEyeColor[,,1], margin=2)   # relative to column marginal sums
Okay, I think we've whetted ourselves enough for now. There are a few more details we need to cover before we can get down to some data analysis.
There is one more consequence of your workspace being in RAM. If the power
goes out, or your laptop battery goes dead, your workspace is gone! If you're
worried about this, you can always do a quick save of your workspace as follows.
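The standard function for this is save.image(). With no arguments it writes the whole workspace to a file called .RData in the working directory. The sketch below saves to a temporary file instead (an illustrative choice) so that it cannot overwrite an existing .RData.

```r
# At the console you would normally just type: save.image()
scratch_file <- tempfile(fileext = ".RData")   # hypothetical location for this demo
save.image(file = scratch_file)
file.exists(scratch_file)   # TRUE once the workspace has been written
```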
You can identify your working directory as follows.
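The function for this is getwd(). A short sketch; the call to list.files() is an extra added for illustration.

```r
wd <- getwd()    # full path of the current working directory
print(wd)
list.files(wd)   # the files R can see there without a path
```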
Lots of stuff to remember, right? Don't worry too much about it right now. Working with something is the best way to learn it. If you use it, the knowledge will come! (Okay, apologies to W. P. Kinsella for that one!)
Getting Help
To see a manual page in R for any function, type...
> help("function.name")   # a shortcut is ?function.name
For example...
> help("mean")   # or just ?mean
I should point out that these help screens will open in separate windows on the Mac and in Windows. These windows can be manipulated with the mouse just like any other window on your screen. When you're done looking at it, click the appropriate button to close it. In Linux, the help screens appear in-line with your R session in the R Console. To get back to your command prompt, press q (lower case Q).
These manual pages are intended for experts and can seem impenetrable until
you learn a little more about R. Don't worry about them for now. However, if
you're daring and want to see a worked example, try this.
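R's example() function runs the examples printed at the bottom of a help page. Using mean as the topic is an assumption, picked to match the help page shown above.

```r
example(mean)   # runs the Examples section of the mean() help page
```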
If you don't know the name of the function you want, there is a way around
that. For example, suppose you want to calculate a median but don't know the
function to do so. Try this.
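The tool for this kind of keyword search is help.search(), with ?? as its shortcut. A sketch using "median" as the search term; the object name results is made up.

```r
results <- help.search("median")   # same as typing ??median at the prompt
if (interactive()) print(results)  # shows the matching help pages
```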
If you are looking for a function that does a "mean-like sort of thing",
and you're not quite sure what it's called, but you're pretty sure that "mean"
is part of its name, do this.
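The function that searches by partial name is apropos(). A sketch, with the object name hits made up:

```r
hits <- apropos("mean")   # every visible object whose name contains "mean"
print(hits)
```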
There are also R manuals online (and they also come with the download,
so you have them already on your hard drive). The two most important
ones are "An Introduction to R," and "R Data Import/Export." They can
be found here:
By the way, CRAN stands for Comprehensive R Archive Network. There are tons of useful things online there.
A Final Preliminary Word or Two
Don't expect to understand R all at once! This is a full-featured statistical programming language and analysis environment. You will never understand it all. R will do everything from 2+2 to factor analysis and generalized linear (and nonlinear) models. If you need it done, R will probably do it. I recently had to use a relatively new technique called generalized estimating equations. There weren't at that time many software packages out there that would do it, but R will. I had to download an optional package from the CRAN site, but that's easy enough to do.
If there is something in these tutorials that puzzles you, make a note of it and move on. Ask someone when you have a chance, or wait for it to come up again in a later tutorial. Perhaps it will be explained more fully there. Or try the help (manual) page, but don't pin your hopes on those just yet. Reading those is a skill in itself. It took me quite a while to get used to them. Try googling it. That usually works for me.
A Final Final Word
Most of the data sets used in the tutorials that follow are either built in to R, i.e., you get them with the download, or are available at this website. See the About The Data Sets document for details.
revised 2016 January 15