-------------------------------------------------------------------------
R For Slackers (i.e., people who weren't in class to see the demo of R)
-------------------------------------------------------------------------

Real data analysis is done using software, and an increasingly popular
software package is called R. There are advantages to R:

   1) it is very powerful (many professional statisticians use it for
      their research),
   2) it's not hard to use,
   3) it's FREE!

Another advantage -- this is the software that is now being used by at
least some of the professors who teach PSYC 226 and PSYC 480.

Most powerful statistical analysis software, such as SPSS (Statistical
Package for the Social Sciences), is very expensive (in the neighborhood
of $1500 just for the base package). R is available for free download
from the Internet. To get a copy, go to www.r-project.org (there is a
link to this at my website), click on CRAN over at the left-hand side of
the page, pick a mirror site near you (UCLA seems to be fast), and
download the base package. Installation is easy in Windows -- just click
on the downloaded installer.

R has also been installed on all the lab computers in CSCC 209. Go to the
Start menu to start it. If you want to install R on a laptop and bring
that to class, e.g., to do the quizzes, that's fine with me.

When R starts, you will see the R Console where you will type commands.
The prompt looks like a "greater than" sign...

   >

...followed by (on most systems) a block cursor. You do need to know how
to use a keyboard to use this program (as I more than aptly demonstrated
in class). There are pulldown menus at the top of the Console, but they
don't do much more than let you save files and so forth. The actual
statistical commands must be typed at the prompt.

Here are some of the things I demonstrated in class.

1) To put your data into a "list" (technically in R-lingo called a
   "vector")...
   a) decide what you want to name your variable (let's call it "scores")
   b) then do this...

         > scores=c(118,112,114,120,115,114,115)

   c) and then press the Enter key (note: these are the scores I used in
      lab to demonstrate mean and median)
   d) as with your calculator, getting the right answer from R depends
      upon entering the data correctly, so it's worth taking a minute to
      check

2) To plot a histogram of "scores" do this...

         > hist(scores)

   Do not type the >, by the way. That's the command prompt and will
   already be there. This command will open a graphics window and plot
   the histogram there. There are all kinds of options you can specify to
   "pretty up" the graphics output. For example, try this...

         > hist(scores, col="gray")

   "Col" stands for "color." If you don't like gray, try red or blue or
   yellow. When the graphics window is the focused window, you can save
   your graphic by pulling down the File menu and picking Save as, then
   choosing the format you wish to save it in. (Pick jpeg if you don't
   know what any of these mean.)

3) To see a frequency table of "scores" do this...

         > table(scores)

4) To see the scores themselves just type the name of the variable...

         > scores

   And remember to press the Enter key of course.

5) To calculate the mean and the median of "scores" do this...

         > mean(scores)
         > median(scores)

6) To quit...

         > quit()

   The parentheses must be included. R will then ask you if you want to
   save your workspace. Click the "No" button and R will terminate. If
   you have R on your own computer and you want to save what you've done,
   click "Yes." Your defined variables ("vectors") will be saved to a
   default file and will be available for you to work with the next time
   you start R.

-------------------------------------------------------------------------

Here is a summary of some of the R functions I will demonstrate in class.

   quit()    - gets you out!; also q()
   ls()      - list the crap you've put in your workspace
   remove()  - remove unwanted crap from your workspace; also rm()
   c()       - concatenate; stick numbers (or words) together into a
               data vector
   length()  - gives n, the number of values in the data vector
   table()   - produces a frequency distribution table
   min()     - get the minimum value
   max()     - get the maximum value
   range()   - get the min and max values in one go
   median()  - get the median
   mean()    - get the mean
   summary() - get the min, Q1, median, mean, Q3, and max
   IQR()     - get the interquartile range
   var()     - get the sample variance
   sd()      - get the sample standard deviation
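
To see how the functions in the summary fit together, here is a short sketch that runs most of them on the "scores" vector from the demo. The names on the left (n, freq, lo, hi, and so on) and the throwaway vector "junk" are made up for this example and are not part of R. The values in the comments are what these particular seven scores should give; the IQR value assumes R's default way of computing quartiles.

```r
# The scores from the class demo
scores = c(118, 112, 114, 120, 115, 114, 115)

n    = length(scores)     # 7 values
freq = table(scores)      # frequency distribution table
lo   = min(scores)        # 112
hi   = max(scores)        # 120
both = range(scores)      # 112 120
med  = median(scores)     # 115
avg  = mean(scores)       # 808/7, about 115.43
iqr  = IQR(scores)        # 2.5, using R's default quartile method
v    = var(scores)        # sample variance (divides by n - 1)
s    = sd(scores)         # square root of the sample variance

print(freq)               # show the frequency table
print(summary(scores))    # min, Q1, median, mean, Q3, max

junk = c(1, 2, 3)         # a throwaway vector, just so there is something to remove
print(ls())               # lists everything defined so far, including junk
rm(junk)                  # junk is now gone from the workspace
```

If you type these at the prompt one at a time instead, you don't need print(); typing a name such as avg and pressing Enter shows its value.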