PRELIMINARIES

A Warning. Take my word for this. Not everything will make sense to you at first! If there is anything that puzzles you, don't worry. You will probably see it illustrated in the tutorials to come.

What is R? First, there was S. The S statistical programming language was developed in the late 1970s, primarily by John Chambers at Bell Labs. S was first distributed outside of Bell Labs in 1980, and by 1988 the "New S Language" had become available. This was the basis for the commercial version called S-Plus. By 1990, S or S-Plus was in widespread use by statisticians.

In the early 1990s, Ross Ihaka and Robert Gentleman, of the University of Auckland in New Zealand, wrote a teaching version of S/S-Plus and named it R. In 1995 the source code for R was released as open source under the GNU General Public License. People who are unfamiliar with these concepts are urged to read up on them. Since that time, R has become one of the most powerful and versatile statistical software programs available at any price.

R is a statistical computing environment. Primarily, it is a programming language, but one containing a very large number of statistical functions. These functions can be used to perform complex statistical analyses interactively, or they can be included in larger scripts and programs to accomplish even more complex tasks. R also has very elaborate graphical capabilities, allowing the production of publication-quality graphics.

R is free software. To get R, simply go to the R Project homepage and download it. Click on the CRAN link on the left-hand side of the page under downloads. (CRAN stands for Comprehensive R Archive Network.) R is available for Windows (Windows 95 and later), Mac OS X, and Linux. As of this writing (26 July 2010), the latest version is 2.11.1. Once you have R, you may do almost anything with it you please. You can install it on as many computers as you want. You can examine and modify the source code.
You can even resell it, as long as you make the source code available. Unlike commercial software such as SPSS, you are not paying an exorbitant fee to "rent" a restricted-use copy. R is free!

The R Prompt. Start R like you would start any other program on your computer. In Windows, the R installer will place a shortcut icon on your desktop. On the Mac, you will have to drag an icon from your Applications folder to the dock. In Linux, R runs from a terminal window. You can also run R from either terminal.app or an X11 terminal on a Mac. I wouldn't recommend this in Windows, however.

R is command-line driven. This means you type your commands at a prompt rather than hunting and clicking through menus. This is much faster and more versatile than a GUI (graphical user interface), but it does take some getting used to for those of you who rarely take your hands off your mouse. R GUIs are in development. Google up "R Commander" for example. The S-Plus commercial version also has a GUI, but be ready to pay dearly for it, in cash money, that is. (Note: S-Plus is sold by www.insightful.com, which is now owned by TIBCO and appears to have been rebranded as Spotfire S+.)

The R prompt is a greater-than symbol, >. When you see it, start
typing. If your command is a long one and breaks onto the next line, or
if you hit Enter without completing a command, R will prompt you on the
next line with a plus sign (+). This means "you ain't done yet. Gimme
more." This often happens when you don't close a parenthesis. For
example...
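A minimal sketch of the situation just described, using sqrt( ) only as a convenient stand-in (any function would do). Typed at the console, the first line leaves a parenthesis open, so R would show the plus sign and wait until the command is finished.

```r
# The first line ends with the parenthesis still open, so at the console
# R would answer with a + prompt and wait for the rest of the command.
sqrt(
  16
)
```

Once the closing parenthesis arrives, R evaluates the whole command as if it had been typed on one line.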
Note: In the newer versions of R on the Mac, parentheses (and quotes) are closed for you. I.e., as soon as you type the left one, the right one also appears. This can be very convenient; in some cases, it can also be annoying. A common trick that programmers use is to type left and right parentheses at the same time, then backspace and type whatever goes between them.

Further note: In some versions of R, especially those on Windows, when your typing reaches the right side of the window, R does not break the line but scrolls the window to the right. Once again, convenient and annoying. You can insert a break yourself by hitting the Enter key, if you prefer the window not to jump around.

Quitting R. Best to know this right away, I suppose. To quit R, type...
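The quit command is q( ), the same one the R startup banner advertises. In the sketch below the interactive( ) guard is an addition, not part of the usual command; it is there only so the line is harmless if pasted into a script.

```r
# At the prompt you would type just:  q()
# The interactive() guard is an assumption added for safety, so that this
# line does nothing when run from a non-interactive script.
if (interactive()) q()
```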
The parentheses are mandatory. R will ask if you wish to save your workspace. Say yes if you want to save any of the objects you created while working in R. (See the next tutorial for "objects.")

Case Sensitive. R is case sensitive. "My_data" and "my_data" are not the same objects. Nor are Anova( ) and anova( ) the same functions (commands). The most common reason I get error messages is capitalizing where I shouldn't have, or not where I should have. If you get an "object not found" or "could not find function" error, check your capitalization first. And then check your spelling!

This might also be a good place to point out that R does not like spaces in the names of things. Use a dot or an underscore character instead. Thus, "my.data" and "my_data" are fine (and different) names for a data set, but "my data" is not allowed. "MyData" and "myData" are also allowed (and different) names for data objects. But don't use a dash: "my-data". R will think you mean subtraction: "my" minus "data". You should follow these rules not only for named data objects you create while using R, but also in files you save with R. I.e., don't put spaces in filenames. R will work around it if you do, but it's just better to avoid the hassle to begin with.

Comment Lines. R code, like any good programming code, can be commented. Any line or
partial line in R beginning with the hash or pound symbol (#) is a
comment, or note, and R will ignore it. You can use these to annotate
your analysis or make notes to yourself (or others).
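A short sketch pulling together comments and the naming rules above. The object names and values are made up for illustration.

```r
# A whole-line comment: R ignores everything after the hash.
my_data <- c(2, 4, 6)   # a partial-line comment after a command
My_data <- c(1, 3, 5)   # a different object: the capital M matters

mean(my_data)           # uses the first object
mean(My_data)           # uses the second one

my.data <- my_data      # dots are allowed in names too
# my-data <- 1          # not a usable name: R would read it as my minus data
```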
R Functions. Almost every command you issue in R will take the form of a function.
Functions have the following syntax...
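In outline, a call is the function's name followed by its arguments and options inside parentheses. The concrete call below uses round( ) purely as an illustration; it was picked for this sketch and is not taken from the text.

```r
# general shape:  function.name(arguments, options)
round(3.14159, digits = 2)   # argument: 3.14159; option: digits = 2
```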
The parentheses are mandatory, even if there are no arguments or
options given. For example...
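A couple of functions that are normally called with nothing inside the parentheses, chosen here as illustrations. The last line shows what tends to happen when the parentheses are left off.

```r
ls()        # list the objects in the workspace; no arguments needed
getwd()     # another function that is normally called with none

ls          # without the parentheses, R prints the function's code instead
```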
A Taste--Before You Get Too Impatient! If you haven't already, fire up R and get to typing. For educational purposes, R has a large number of data sets built into it. I will use these to illustrate a few things R can do. When you start R, you will get a screen something like this...

R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

>

The R core team and R developers the world over are volunteers doing
this, well, maybe not entirely out of the goodness of their hearts, but
without much compensation. They would like you to cite them if you use
R for data analysis. To see how to do this, issue the following command...
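The standard call for this, also named in the startup banner, is citation( ):

```r
citation()   # prints the recommended reference for R, with a BibTeX entry
```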
To give you a bit of an idea of what R is capable of graphically, try this...
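One likely candidate is the graphics demonstration that ships with R. That this is the exact command intended here is an assumption; the startup banner only mentions demo( ) in general.

```r
# Assumed command: the built-in graphics demo from the graphics package.
# At the console R pauses between plots; press Enter to move on.
demo(graphics)
```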
This little exercise has placed a lot of stuff in your workspace (to be
explained later). We should clean that up. WARNING: The command I am about to
tell you to execute will erase everything in your workspace, and that will be
permanent. If you've been using R already, and there is something in your
workspace you want to save, save it now!
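The usual idiom for wiping the workspace is sketched below; it is the standard base R way of doing what the warning describes. ls( ) lists every object in the workspace and rm( ) removes whatever is on that list.

```r
# Removes every object in the current workspace. There is no undo.
rm(list = ls())
```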
As a brief appetite whetting, let's look at a data set called
"HairEyeColor". This is a table showing a crosstabulation of hair color, eye
color, and sex (gender) for 592 statistics students. Type this, and don't type
the command prompt--R has already supplied that for you. Remember to press
Enter to execute the command line...
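A sketch of the kind of commands meant here, assuming the table is first printed and then handed to summary( ), which for a table reports a chi-squared test of independence of all the factors.

```r
HairEyeColor            # print the three-way table (hair by eye by sex)
summary(HairEyeColor)   # number of cases plus a chi-squared test of independence
```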
The null hypothesis that all factors are independent (i.e., no interactions between any factors) is rejected. If you are getting error messages instead of statistical output, remember: R is case sensitive. HairEyeColor and haireyecolor are NOT the same thing. (Note: The warning message "Chi-squared approximation may be incorrect" means there are expected frequencies less than 5, for those of you who know what that means.)

You are not to worry about the details of this syntax at this point. This
is just for show! To collapse over one or more of the factors in this table,
you can do one of these...
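Two base R ways of collapsing a table, given as a sketch; whether these are the exact commands originally shown is an assumption. The numbers name the dimensions to keep (1 = hair, 2 = eye, 3 = sex), and everything else is summed over.

```r
margin.table(HairEyeColor, c(1, 2))   # hair by eye, collapsed over sex
margin.table(HairEyeColor, 1)         # hair only, collapsed over eye and sex
apply(HairEyeColor, c(1, 2), sum)     # same numbers as the first line, via apply()
```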
To see the data table in a different format (called a flat table), try
this...
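The base R function for flat tables is ftable( ):

```r
ftable(HairEyeColor)   # the same counts laid out as one flat, two-dimensional table
```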
Some Definitions.

workspace

There is one more consequence of your workspace being in RAM. If the power
goes out, or your laptop battery goes dead, your workspace is gone! If you're
worried about this, you can always do a quick save of your workspace as follows
...
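The standard base R call for a quick save is save.image( ). With no arguments it writes the whole workspace to a file named .RData in the current working directory.

```r
save.image()   # writes the workspace to .RData in the working directory
```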
working directory

You can identify your working directory as follows...
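A sketch of the sequence this section describes: find out where you are, create the "Rspace" directory, and switch to it. getwd( ) and setwd( ) are named in the text; the use of dir.create( ) to make the directory is an assumption.

```r
getwd()                 # where am I now?
dir.create("Rspace")    # make a new directory inside the current one (assumed step)
setwd("Rspace")         # switch to it
getwd()                 # confirm the move
```

If the directory already exists, dir.create( ) only issues a warning and the rest still works.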
These commands should work in all three major OSes: Windows, OS X, and Linux. Notice when I switched to the new working directory, R brought my old workspace with me. That can be very convenient. It can also be a nuisance. Don't worry about it now. Just create the "Rspace" directory and switch to it.

FURTHER NOTE to Windows users: This "should" work in Windows, but it didn't when I just tried it. Windows is a pretty awkward OS when it comes to doing anything useful without a mouse. So if you got an error or warning message, do this. Quit R. In Windows, open your "My Documents" folder (in XP) or your home folder (in Vista or Windows 7). Right click on a blank space inside the folder. Choose "new folder" from the pop-up menu. Name the folder "Rspace". Restart R. Switch to the folder using setwd("Rspace"). You should be ready to go. Use getwd( ) to confirm that things went as intended. From here on, at the beginning of every tutorial, you should begin by executing the command setwd("Rspace").

search path

Lots of stuff to remember, right? Don't worry too much about it right now. Working with something is the best way to learn it. If you use it, the knowledge will come! (Okay, apologies to W. P. Kinsella for that one!)

Getting Help. To see a manual page in R for any function, type...
> help("function.name") # a shortcut is ?function.name
For example...
> help("mean")
I should point out that these help screens will open in separate windows on the
Mac and in Windows. These windows can be manipulated with the mouse just like
any other window on your screen. When you're done looking at one, click the
appropriate button to close it. In Linux, the help screens appear in-line
with your R session in the R Console. To get back to your command prompt,
press q (lower case Q).
These manual pages are intended for experts and can seem impenetrable until
you learn a little more about R. Don't worry about them for now. However, if
you're daring and want to see a worked example, try this...
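The base R function example( ) runs the code from the Examples section of a help page. The choice of mean below simply follows the help("mean") illustration above.

```r
example(mean)   # runs the worked examples from the mean() help page
```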
If you don't know the name of the function you want, there is a way around
that. For example, suppose you want to calculate a median but don't know the
function to do so. Try this...
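The base R tool for this kind of keyword search is help.search( ), which looks through the installed help pages for a term.

```r
help.search("median")   # search the help pages for "median"; a shortcut is ??median
```

The results list each matching help page along with the package it belongs to.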
If you are looking for a function that does a "mean-like sort of thing",
and you're not quite sure what it's called, but you're pretty sure that "mean"
is part of its name, do this...
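The base R function apropos( ) lists every object on the search path whose name contains the given text:

```r
apropos("mean")   # names containing "mean", such as mean and weighted.mean
```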
There are also R manuals online (and they also come with the download,
so you have them already on your hard drive). The two most important
ones are "An Introduction to R," and "R Data Import/Export." They can
be found here:
A Final Preliminary Word or Two. Don't expect to understand R all at once! This is a full-featured statistical programming language and analysis environment. You will never understand it all. R will do everything from 2+2 to factor analysis and generalized linear (and nonlinear) models. If you need it done, R will probably do it. I recently had to use a relatively new technique called generalized estimating equations. There aren't many software packages out there that will do it, but R will. I had to download an optional package from the CRAN site, but that's easy enough to do.

If there is something in these tutorials that puzzles you, make a note of it and move on. Ask someone when you have a chance, or wait for it to come up again in a later tutorial. Perhaps it will be explained more fully there. Or try the help (manual) page, but don't pin your hopes on those just yet. Reading those is a skill in itself. It took me quite a while to get used to them.

revised 2010 July 26