R Functions and Procedures Summarized
--------------------------------------------------------------------------------
Important Note: R is case sensitive!

For an example of any of the following functions:
> example(function_name)
...and if R says "Waiting to confirm page change..." press Enter.
--------------------------------------------------------------------------------
simple arithmetic performed at the command line (a, b may be numbers or vectors; vectors are handled element by element)

a + b means add a and b
a - b means subtract b from a
a * b means multiply a and b
a / b means divide a by b
(a + b)/b means add a and b, then divide by b (parentheses group as you might expect); this equals a/b + 1
--------------------------------------------------------------------------------
abline(a=intercept, b=slope) - draw a line on an existing graph
abline(h=value) - draw a horizontal line on an existing graph
abline(v=value) - draw a vertical line on an existing graph
abline(lm_results) - plot the regression line on an existing scatterplot
addmargins(contingency_table_name) - print contingency table with marginal sums
anova(lm_results) - see the ANOVA table for stored results of a linear regression
anova(lm_results1, lm_results2) - test two stored, nested lm models against each other
aov(formula) - ANOVA; view the results with summary()
attach(dataframe_name) - CAREFULLY(!!!) attach a dataframe to the search path
barplot(table_name) - plot a bar graph of values in a table (must be tabled first)
barplot(2D_table_name, beside=T, legend=T) - side-by-side bar graph
bartlett.test() - test for homogeneity of variance
binom.test(number_of_successes, N, p=null_prob.) - exact binomial test
boxplot(variable_name) - plot a boxplot of a numerical variable
boxplot(DV ~ IV) - plot boxplots at each level of an independent variable
c(data_values) - concatenate; group values into a variable or vector
chisq.test(vector_of_counts, p=vector_of_null_probs.) - chi square goodness of fit test
chisq.test(table_name) - chi square significance test of independence
chisq.test_result$expected - examine expected frequencies
chisq.test_result$residuals - examine (Pearson) residuals
class(object_name) - find out what kind of object R considers this to be
colnames(tablelike_object) - create or get column names; also names()
confidence intervals - to get a CI, do the corresponding significance test
cor(variable_name1, variable_name2) - Pearson's r between numerical variables
cor(var_name1, var_name2, use="pairwise.complete.obs") - if there are missing values
cor(var_name1, var_name2, method="spearman") - Spearman correlation
cor(dataframe_name) - get a correlation matrix of all numerical variables
cor.test(var_name1, var_name2) - test whether r differs significantly from zero
cor.test(var_name1, var_name2, alternative="greater" or "less") - one-sided tests
curve(function_of_x, from=value, to=value) - plot an equation
data.frame(vector1, vector2, vector3, ...) - create a dataframe from vectors
dataframe_name[i,] - get row i of the dataframe
dataframe_name[i:j,] - get rows i through j of the dataframe
dataframe_name[,k] - get column k of the dataframe; or add column to dataframe
dataframe_name$variable_name - get column of dataframe with given variable name
dbinom() - binomial probability function
detach(dataframe_name) - remove dataframe from search path
dim(object_name) - get the dimensions of a table, matrix, or dataframe
dir() - view files in the working directory
exp(value) - raise e to the power of value; natural antilog
fisher.test(contingency_table_name) - Fisher's exact test
fivenum(variable_name) - five number summary of a numerical variable
function() - program your own functions into R; see sem
getwd() - find out what your working directory is
hist(variable_name) - plot a histogram of a numerical variable
hist(var_name, breaks=vector_of_desired_breaks) - with specified break points
interaction.plot() - draw a graph of an interaction between two variables
IQR(variable_name) - get the interquartile range using quantile() type=7
kruskal.test() - nonparametric alternative to one-way ANOVA
legend() - add a legend to an existing plot; complicated
length(variable_name) - get the no. of cases in a variable (n); NAs are counted too, so use sum(!is.na(variable_name)) when values are missing
levels(variable_name) - get the levels of a categorical variable
library(library_name) - load an optional library, for example MASS
lines(specification_of_line) - draw a line on an existing graph
lm(DV ~ IV) - linear regression of DV on IV using least squares
lm(DV ~ IV + I(IV^2) + I(IV^3) + ...) - polynomial regression
lm(DV ~ IV1 + IV2 + IV3 + ...) - multiple linear regression
lm_model$residuals - examine the residuals from a stored lm model
load("location_of_previously_saved_file") - load a "saved" file; see save()
log(value) - take the natural log of a number; log(value, base)=log to any base
log(variable_name) - do a log transform on a numerical variable (base-e)
log10(variable_name) - a base-10 log transform
ls() - list the objects in the workspace; also objects(); character(0) means the workspace is empty
mad(variable_name) - get the median absolute deviation from the median (scaled by 1.4826 by default)
margin.table(contingency_table_name, 1) - get the row marginal sums
margin.table(contingency_table_name, 2) - get the column marginal sums
matrix(vector_name, nrow=value, ncol=value) - create a matrix from a vector
max(variable_name) - get maximum value in a numerical variable
mean(variable_name) - get mean of a numerical variable; returns NA if any values are missing
mean(variable_name, na.rm=T) - mean of variable with missing values
median(variable_name) - get median of the numerical variable; na.rm is available
min(variable_name) - get the minimum value in a numerical variable
NA - a missing value (R's missing value code)
names() - can be used to name variables or to retrieve the names of variables
oneway.test(formula) - one-way between-subjects ANOVA; Welch correction by default, var.equal=T for the classic F test
pbinom() - binomial cumulative probability function
pie(table_name) - plot a pie chart of values in a table (must be tabled first)
plot(predictor_var, response_var) - scatterplot of two numerical variables
plot(var1, var2, log="y") - plot with a logarithmic y-axis; y values <= 0 are omitted with a warning
plot(dataframe_name) - plot a matrix of scatterplots of numerical variables
plot(density(variable_name)) - kernel density smoother for numerical variable
plot(lm_results) - plot four regression diagnostic plots from stored lm results
plot.new() - start a new, empty plot frame (opens a graphics device if none is open)
pnorm(z) - get the area under the unit normal curve that is below z
points(vector_of_x-values, vector_of_y-values) - plot points on a graph
prop.table(table_name) - convert tabled frequencies to proportions of N
prop.table(table_name, 1) - convert tabled freqs. to proportions by rows
prop.table(table_name, 2) - convert tabled freqs. to proportions by columns
prop.test(number_of_hits, N, p=null_prob.) - one-proportion test
prop.test(vector_of_hits, vector_of_Ns) - multiple-proportion test
qqline(variable_name) - draw the normal line on an existing qqnorm() plot
qqnorm(variable_name) - plot a normal probability plot of a numerical variable
qqplot(variable1, variable2) - compare distributions of two variables
quantile(variable_name, probs=vector_of_probs.) - get desired quantiles
quantile(variable_name, probs=c(1/4, 3/4), type=2) - get textbook Q1 and Q3
range(variable_name) - reports min() and max()
rbind(vector1, vector2, vector3, ...) - bind vectors into a table by rows
rbinom() - generate random binomially distributed values
read.csv("file_location") - read in a csv formatted data file
read.table("file_location", header=T) - read in tabled data with headers
rep(data_vector, freq_vector) - create a vector with repeating elements
rm(object_name) - remove an object from the workspace; also remove()
rnorm() - generate random normally distributed values
rownames(tablelike_object) - create or get row names; see colnames()
save(R_object, file="filename") - save a data object to a file in working dir.
scan() - accept data input directly from the keyboard
scatter.smooth(predictor_var, response_var) - scatterplot with lowess smoother
sd(variable_name) - get the sample standard deviation of a numerical variable
search() - ask R where it is looking for objects (the search path)
sem=function(x) {sd(x)/sqrt(length(x))} - quick and dirty standard error of the mean (sem) function; does not handle NAs
seq(from=value, to=value, by=value) - create a regular sequence of numbers
setwd() - set or change the working directory (better to use the menus)
sort(variable_name) - sort values in ascending order
sort(variable_name, decreasing=T) - sort values in descending order
source("location_of_script") - read in and execute a text file of R commands
sqrt(number or vector_name) - take the square root of number or numbers
stem(variable_name) - get a stem and leaf display of a numerical variable
step() - automated forward, backward, or stepwise regression; not a good idea!
str(object_name) - find out the structure of an R object
subset() - subset a dataframe
sum(vector_name) - sum the values in a vector; na.rm is available
summary(object_name) - summarize an R object; summary stats from a variable
summary(lm_results) - get a detailed summary of stored linear regression results
table(variable_name) - create a 1D frequency table of a variable
table(var_name1, var_name2) - create a contingency table with var_name1 in rows
tapply(DV, IV, function) - calculate a function of a DV at each level of an IV
t.test(variable_name, mu=null_mean) - single-sample t test
t.test(var_name, mu=null_mean, alternative="greater" or "less") - one-sided test
t.test(variable1, variable2) - two-sample t test with Welch correction
t.test(var1, var2, var.equal=T) - no Welch correction (pooled variance t test)
t.test(var1, var2, alternative="greater" or "less") - one-sided test (gp1 - gp2)
t.test(DV ~ IV) - two-sample t test using formula interface (from a dataframe)
t.test(var1, var2, paired=T) - related samples t test
text(x-coord, y-coord, "string of text") - add text to an existing plot
title(main="text string", xlab="text string", ylab="text string") - add titles and axis labels to an existing plot
update(lm_model, ~.-variable_name) - update an lm model by deleting a variable
var(variable_name) - get the sample variance of a numerical variable
var.test(vector1, vector2) - test for a significant difference betw. variances
vector_name[i] - get the value at position i of the vector
vector_name[i:j] - get values at positions i through j of the vector
which(variable_name .test. value) - find out which positions pass the test; .test. may be < (less than), > (greater than), == (equals), != (not equal)
wilcox.test(DV ~ IV) - Wilcoxon rank-sum test, commonly called the Mann-Whitney U test
wilcox.test(var1, var2, paired=T) - Wilcoxon signed-rank test for paired scores
with(dataframe_name, function_call) - specify dataframe to be used with a function
xtabs(formula, data=dataframe_name) - crosstabulation; see handout for formula
z.test - doesn't come with the default packages; can be added in or programmed
--------------------------------------------------------------------------------
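A small sketch of the arithmetic rules at the top of this sheet, using two made-up vectors a and b of equal length: R applies each operator element by element, and a single number is recycled across the whole vector.

```r
# Two illustrative vectors of the same length
a <- c(2, 4, 6)
b <- c(1, 2, 3)

a + b        # element by element: 3 6 9
a / b        # 2 2 2
a * 10       # the single number 10 is recycled: 20 40 60

# Grouping: (a + b)/b gives the same values as a/b + 1
all.equal((a + b) / b, a / b + 1)   # TRUE
```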
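Several entries above warn about missing values. The sketch below, using a made-up vector scores with one NA, shows what length(), mean(), and the quick sem one-liner do with it. sem_na is a hypothetical helper name chosen for illustration, not a built-in R function.

```r
# Made-up scores with one missing value
scores <- c(4, 8, 6, NA, 2)

length(scores)               # 5: the NA is counted as a case
sum(!is.na(scores))          # 4: the number of non-missing cases
mean(scores)                 # NA
mean(scores, na.rm = TRUE)   # 5

# The quick and dirty sem from the list: standard error of the mean
sem <- function(x) { sd(x) / sqrt(length(x)) }
sem(scores)                  # NA, because sd() returns NA here

# Hypothetical NA-aware variant: drop missing values first so that
# both sd() and the n inside sqrt() use only the observed cases
sem_na <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}
sem_na(scores)               # sqrt(20/3)/2, about 1.29
```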
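The quantile(), IQR(), and fivenum() entries use different rules for quartiles, so they can disagree on the same data. A sketch on the integers 1 to 10 (an arbitrary example):

```r
x <- 1:10

# Default rule (type=7), which IQR() also uses
quantile(x, probs = c(1/4, 3/4))             # 3.25 and 7.75
IQR(x)                                        # 4.5

# The "textbook" rule from the list (type=2)
quantile(x, probs = c(1/4, 3/4), type = 2)   # 3 and 8

# Tukey's five number summary; its hinges match type=2 for this data
fivenum(x)                                    # 1.0 3.0 5.5 8.0 10.0
```

The agreement between fivenum() and type=2 holds for this particular vector; with other sample sizes the hinges and the type=2 quartiles can differ.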
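The alternative= argument of t.test() and cor.test() accepts "two.sided" (the default), "less", or "greater"; a value such as "more" is rejected with an error. A sketch with two made-up groups, g1 and g2:

```r
# Made-up scores for two groups
g1 <- c(5.1, 6.3, 5.8, 6.9, 6.1)
g2 <- c(4.2, 5.0, 4.8, 5.5, 4.9)

two_sided <- t.test(g1, g2)                          # Welch test, two-sided
greater   <- t.test(g1, g2, alternative = "greater") # H1: mean(g1) > mean(g2)

# With t > 0, the one-sided p value is half the two-sided one
c(two_sided$p.value, greater$p.value)

# An unrecognized alternative stops with an error
bad <- try(t.test(g1, g2, alternative = "more"), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```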
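For the goodness of fit form of chisq.test(), the first argument is a vector of observed counts and p= gives the null proportions, which must sum to 1. A sketch with made-up counts, storing the result so that $expected and $residuals can be inspected as described in the list:

```r
observed <- c(30, 20, 50)            # made-up counts in three categories
null_p   <- c(0.25, 0.25, 0.50)      # null proportions; must sum to 1

gof <- chisq.test(observed, p = null_p)

gof$expected    # 25 25 50, i.e. N * p with N = 100
gof$residuals   # Pearson residuals (obs - exp)/sqrt(exp): 1 -1 0
gof$statistic   # X-squared = 2, on gof$parameter = 2 df
```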
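tapply() combined with with() gives a group summary without attach(), and which() returns positions that can be used as an index. A sketch with a made-up dataframe called study; the column names DV and IV simply follow the list's placeholders:

```r
study <- data.frame(
  DV = c(3, 5, 4, 8, 10, 9),
  IV = factor(c("a", "a", "a", "b", "b", "b"))
)

# Mean of DV at each level of IV, evaluated inside the dataframe
group_means <- with(study, tapply(DV, IV, mean))
group_means                      # a = 4, b = 9

# which() returns positions, which can then be used as an index
high <- which(study$DV > 5)      # 4 5 6
study$DV[high]                   # 8 10 9
```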