PSYC 480 -- Dr. King

Regression Quiz

You will not be submitting this. It is a self-quiz by which you can test your knowledge. If you don't ace this, you should not go on. You will need to know all of this to understand what's coming.

Here is an exercise to hone your regression skills. You're going to have to do the R for yourself. I'm not going to show you the R output, but I will give you the commands. I'll leave the command prompts off so you can copy and paste.

After clearing your workspace...

rm(list=ls())   # you can also use a menu to do this

... do this to get the data.

data(mtcars)   # built-in dataset
help(mtcars)   # will probably open a new window

The data were extracted from the 1974 Motor Trend US magazine, and comprise
fuel consumption and 10 aspects of automobile design and performance for 32
automobiles (1973–74 models).

A data frame with 32 observations on 11 (numeric) variables.

mpg     Miles/(US) gallon
cyl     Number of cylinders (4, 6, or 8)
disp    Displacement (cu.in.)
hp      Gross horsepower
drat    Rear axle ratio
wt      Weight of the car (in 1000 lbs)
qsec    1/4 mile time
vs      Engine (0 = V-shaped, 1 = straight, i.e., V engine vs. in-line engine)
am      Transmission (0 = automatic, 1 = manual)
gear    Number of forward gears (3, 4, or 5)
carb    Number of carburetors (1, 2, 3, 4, 6, or 8)

You can close the Help window that has opened once you're done staring at it. Get a correlation matrix. To make it easier to read, we'll round off the correlations to two decimal places.

round(cor(mtcars), 2)

Questions 1-5. What are the following correlations?
1) mpg with disp:
2) disp with hp:
3) hp with wt:
4) gear with cyl:
5) gear with hp:

We are interested in the correlations that are in the first column of the correlation matrix, which are correlations of the other variables with mpg, or gas mileage of the car in miles per gallon of gasoline.

6) Higher values of which of the following are associated with increased gas mileage in this sample? (Be careful answering this question. Make sure you understand how these variables are coded.)




7) Which of the following variables has the strongest relationship to gas mileage?




8) Is that relationship (question 7) linear?




with(mtcars, scatter.smooth(mpg ~ wt))

9) Based on the scatterplot, which of the following is a correct description of the relationship between mpg and wt?




A side note: The curve in the scatterplot is said to be "decelerating" because its slope decreases as it goes from left to right across the graph. If the slope were increasing from left to right, the curve would be called "accelerating." (For the mathematicians in the audience, I should say that in both cases I mean the absolute value of the slope. If the curve becomes flatter from left to right, it's decelerating, and if it becomes steeper from left to right, it's accelerating.) Such curves can often be flattened out with a log transform. Try with(mtcars,scatter.smooth(mpg~log(wt))) and see what happens. We will now continue our analysis without the log transform.
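If you'd like numbers to go with the pictures in that side note, here is a minimal sketch. It assumes only the built-in mtcars data. The names r.raw, r.log, and old.par are mine, made up for illustration. It puts the two scatterplots side by side and computes the linear correlation of mpg with wt and with log(wt), so you can judge for yourself how much the transform straightens things out.

```r
# A sketch only: compare mpg vs. wt with mpg vs. log(wt).
data(mtcars)

# Linear correlations before and after the log transform (names are illustrative)
r.raw = with(mtcars, cor(mpg, wt))
r.log = with(mtcars, cor(mpg, log(wt)))
round(c(raw = r.raw, logged = r.log), 3)

# Side-by-side scatterplots with a lowess-style smoother on each
old.par = par(mfrow=c(1,2))   # remember the old graphics settings
with(mtcars, scatter.smooth(mpg ~ wt, main="mpg vs. wt"))
with(mtcars, scatter.smooth(mpg ~ log(wt), main="mpg vs. log(wt)"))
par(old.par)                  # put the graphics settings back
```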

Even though we know the relationship is not linear, we are going to continue with a linear regression analysis, because that's what we know how to do.

lm.out = lm(mpg ~ wt, data=mtcars)
summary(lm.out)
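Once lm.out exists, you don't have to copy numbers off the summary table by hand. Here is a minimal sketch of two ways to get a predicted value, assuming the model just fitted. The 3000-pound car and the names b, new.car, by.hand, and by.predict are my own inventions for illustration. Remember that wt is recorded in thousands of pounds, which is why the sketch uses wt = 3.

```r
# A sketch only: two equivalent ways to get a prediction from a simple regression.
lm.out = lm(mpg ~ wt, data=mtcars)

b = coef(lm.out)                  # named vector: "(Intercept)" and "wt"
new.car = data.frame(wt = 3)      # hypothetical car weighing 3000 lbs (wt is in 1000s)

# By hand: intercept + slope * wt
by.hand = b["(Intercept)"] + b["wt"] * new.car$wt

# Letting R do it: predict() needs a data frame with a column named wt
by.predict = predict(lm.out, newdata=new.car)

c(by.hand = unname(by.hand), by.predict = unname(by.predict))
```

The two values should agree. The column name in newdata has to match the predictor name in the model formula, or predict() won't know what to do with it.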

10) The correct regression equation is:




11) Use the regression equation to make a prediction for a car weighing 4000 pounds. (Be careful that you understand how weight is coded.)




12) For each additional 1000 pounds of car weight, how does the equation predict gas mileage will change?




13) What percentage of the variability in gas mileage is accounted for by the weight of the car in this model?




plot(mpg ~ wt, data=mtcars)
abline(lm.out)

14) The line we have just plotted on the scatterplot is the:




par(mfrow=c(2,2))
plot(lm.out, 1:4)

15) The Residuals vs. Fitted plot indicates that there is a serious problem with:




16) Were there any cases that had undue influence in this regression analysis? If so, identify the car.




Okay, now let's see how mpg is related to horsepower (hp). The R is up to you. See if you can answer the following questions.

17) The relationship between mpg and hp is (positive / negative).

18) The relationship between mpg and hp is (linear / nonlinear accelerating / nonlinear decelerating).

19) The relationship between mpg and hp is (weak / strong).

20) The linear correlation between mpg and hp is _____.

21) The least squares regression equation relating mpg to hp is _____.

22) A car with 500 hp would be predicted to have a gas mileage of _____ mpg.

23) Looking at the scatterplot again, could you make a more reasonable prediction? _____ mpg.

24) From the regression equation, we would predict that for every additional 1 hp produced by the engine the gas mileage should decrease by _____ mpg.

25) The percentage of variability explained in mpg by its linear regression relationship to hp is _____%.

26) Here's one we didn't do above. See if you can get it. The residuals from the linear regression relationship range from -5.712 to 8.236 mpg. A typical residual has a magnitude of _____ mpg (ignoring the sign).

27) The residuals vs. fitted plot shows that the relationship is _____.

28) The Cook's distance plot shows that, in this analysis, one car has a Cook's D of about 1.0. That car is the _____.

Now we'll do it again, except this time we'll look at the relationship between mpg and engine size (disp).

29) The relationship between mpg and disp is (positive / negative).

30) The relationship between mpg and disp is (linear / nonlinear accelerating / nonlinear decelerating).

31) The relationship between mpg and disp is (weak / strong).

32) The linear correlation between mpg and disp is _____.

33) The least squares regression equation relating mpg to disp is _____.

34) A car with 500 cubic inch displacement would be predicted to have a gas mileage of _____ mpg.

35) Looking at the scatterplot again, could you make a more reasonable prediction? _____ mpg.

36) From the regression equation, we would predict that for every additional 1 cubic inch of engine displacement the gas mileage should decrease by _____ mpg.

37) The percentage of variability explained in mpg by its linear regression relationship to disp is _____%.

Follow-up question) Have you noticed, in the case of simple linear regression (i.e., one predictor), the value of multiple R-squared is just the correlation squared? _____.

38) The residuals from the linear regression relationship range from -4.892 to 7.231 mpg. A typical residual has a magnitude of _____ mpg (ignoring the sign).

39) The residuals vs. fitted plot shows that the relationship is _____.

40) The Cook's distance plot shows that, in this analysis, the (which car) has the largest Cook's D, but it is less than 0.5, so we're not going to worry about it.

41) The Scale-Location plot shows that we have a clear violation of (which assumption).

42) The Normal Q-Q plot shows that the residuals are _____, which is a violation of the normality assumption.
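About that follow-up question after item 37: you can check the claim directly rather than taking my word for it. Here is a minimal sketch. I've used drat as the predictor so as not to give away any quiz answers, and the names fit.drat and r are mine, chosen just for illustration.

```r
# A sketch only: in simple (one-predictor) regression, R-squared equals r squared.
fit.drat = lm(mpg ~ drat, data=mtcars)   # illustrative model, not one from the quiz
r = with(mtcars, cor(mpg, drat))         # ordinary Pearson correlation

# Put the two values side by side for comparison
c(r.squared = summary(fit.drat)$r.squared, cor.squared = r^2)
```

This equality holds only with a single predictor. With more than one, multiple R-squared is no longer the square of any one of the simple correlations.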

Two more things you should remember. First, you can get confidence intervals around the regression coefficients by doing this.

confint(lm.out)
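If you want to see where those intervals come from, here is a minimal sketch that builds 95% confidence intervals by hand as estimate plus or minus critical t times standard error, and sets them next to what confint() reports. I've refit the mpg ~ wt model under a different name, fit.wt, so it doesn't overwrite whatever you currently have in lm.out. That name, along with est, se, t.crit, and by.hand, is mine and purely illustrative.

```r
# A sketch only: 95% confidence intervals for regression coefficients, by hand.
fit.wt = lm(mpg ~ wt, data=mtcars)       # illustrative name; leaves lm.out alone

coefs = coef(summary(fit.wt))            # matrix of estimates, SEs, t values, p values
est = coefs[, "Estimate"]
se  = coefs[, "Std. Error"]

# Critical t for a two-sided 95% interval on the residual degrees of freedom
t.crit = qt(0.975, df=df.residual(fit.wt))

by.hand = cbind(lower = est - t.crit*se, upper = est + t.crit*se)
by.hand
confint(fit.wt)                          # should match the by-hand version
```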

Second (and this is something I told you a long time ago), you can do a t-test by regression. Compare the output of these commands.

t.test(mpg ~ am, data=mtcars, var.equal=TRUE)   # pooled-variance t-test

lm.out = lm(mpg ~ am, data=mtcars)
summary(lm.out)
confint(lm.out)
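If comparing the two printouts by eye feels too loose, here is a minimal sketch that pulls out the pieces and lines them up. The names tt, fit.am, t.slope, and p.slope are mine, for illustration only. One thing to watch for: t.test() subtracts the groups in one order, while the regression slope is the change going from am = 0 to am = 1, so the two t statistics can differ in sign even though they agree in size.

```r
# A sketch only: the pooled-variance t-test and the regression on am are the same test.
tt = t.test(mpg ~ am, data=mtcars, var.equal=TRUE)
fit.am = lm(mpg ~ am, data=mtcars)       # illustrative name; leaves lm.out alone

# Slope row of the coefficient table: t value and p value
t.slope = coef(summary(fit.am))["am", "t value"]
p.slope = coef(summary(fit.am))["am", "Pr(>|t|)"]

# Compare magnitudes of the t statistics, then the p values
c(t.test = abs(unname(tt$statistic)), regression = abs(t.slope))
c(t.test = tt$p.value, regression = p.slope)
```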

How did you do? Are you ready for multiple regression?