# The American Heart Association

September 29, 2018

1. The American Heart
Association collects data on the risk of strokes. A 10-year study provided data
on how age (X1), blood pressure (X2), and smoking (X3) relate to the risk of
strokes (Y). Assume that the following data are from a portion of this study.
Risk is interpreted as the probability (times 100) that the patient will have a
stroke over the next 10-year period. For the smoking variable (X3), define a
dummy variable with 1 indicating a smoker and 0 indicating a nonsmoker.
(a) Construct a scattergram between Risk (Y) and
Age (X1). Does there appear to be a linear relationship?
(b) Estimate a simple linear regression between
Y and X1. Briefly evaluate the estimated simple regression equation.
(c) Construct a scattergram between Risk (Y) and
Blood Pressure (X2). Does there appear to be a linear relationship?
(d) Estimate a simple linear regression between
Y and X2. Briefly evaluate the estimated simple regression equation.
(e) Estimate a multiple linear regression
between Y and X1, X2, and X3. Briefly evaluate the estimated multiple
regression equation.
(f) Provide an interpretation of each estimated
regression coefficient in part e.
(g) Is smoking a significant factor in the risk
of a stroke based on the results of the estimated regression equation in part
e? Briefly explain.
(h) Which estimated regression equation provides
the “Best Fit” to the data? Briefly explain.
(i) Utilize the equation of “Best Fit”
(see part h) to predict the Risk of a Stroke over the next 10 years for a
68-year-old smoker who has a blood pressure of 175. What action might the
physician recommend for this patient?
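
The mechanics behind parts (e) and (i) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study data are not reproduced here, so the rows below are synthetic, generated without noise from made-up coefficients (`TRUE`), and the helper names (`solve`, `fit_ols`, `predict`, `r_squared`) are hypothetical. Because the data are noise-free, the fit is perfect, which real data will not be.

```python
# Minimal ordinary-least-squares sketch using only the standard library.
# All data are SYNTHETIC (assumption): they are NOT the study data.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [b0, b1, ..., bk]; an intercept column is added automatically."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y

def predict(coefs, row):
    """Predicted Y for one observation (without the intercept column)."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))

def r_squared(coefs, rows, y):
    """Share of the variation in y explained by the fitted equation."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst

# Made-up coefficients used only to generate the synthetic risks.
TRUE = [-90.0, 1.0, 0.25, 9.0]  # intercept, age, blood pressure, smoker

# (age, blood pressure, smoker label) -- hypothetical patients.
raw = [(57, 152, "No"), (67, 163, "No"), (58, 155, "No"), (86, 177, "Yes"),
       (59, 196, "No"), (76, 189, "Yes"), (56, 155, "Yes"), (78, 120, "No")]

# Dummy coding for X3: 1 = smoker, 0 = nonsmoker.
rows = [(age, bp, 1 if smoker == "Yes" else 0) for age, bp, smoker in raw]
y = [predict(TRUE, r) for r in rows]  # noise-free synthetic risk values

coefs = fit_ols(rows, y)
fit_quality = r_squared(coefs, rows, y)
risk_hat = predict(coefs, (68, 175, 1))  # 68-year-old smoker, pressure 175

print("coefficients:", [round(b, 4) for b in coefs])
print("R-square:", round(fit_quality, 4))
print("predicted risk:", round(risk_hat, 2))
```

With the actual study data, replace `raw` with the observed rows and `y` with the observed risks. Standard errors, t-tests, and the F-test are not computed here; a statistics package would normally supply them.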
2. Continuation of the previous problem. The
takeaway from the answers is that the multiple regression with square footage
(x1), number of bedrooms (x2), number of bathrooms (x3), and distance from
the city center (x4) is a pretty good regression equation: R-square = .384,
adjusted R-square = .359, the individual variables are all statistically
significant, and the F-test indicates global statistical significance.
An R-square of only .384, however, indicates that the
estimated multiple regression equation is not explaining about 61.6 percent of
the variation in sales price (the dependent variable). The search continues for
additional explanatory variables. Whether the home has a pool may be relevant in
explaining the variation in the sales price of homes. A potential fifth
explanatory variable (x5) has been provided to add to the multiple
regression equation. Since the presence of a pool (Yes or No) is a qualitative
variable, a dummy variable has been set where a home with a pool (Yes) = 1 and
a home without a pool (No) = 0.
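
The coding scheme and the arithmetic in the paragraph above can be checked with a short Python sketch. The helper names (`pool_dummy`, `adjusted_r_square`) are hypothetical. The sample size is not stated in the text, so `n` is left as a parameter and not guessed.

```python
def pool_dummy(label):
    """Map a Yes/No pool label to the 1/0 coding described above."""
    coding = {"yes": 1, "no": 0}
    return coding[label.strip().lower()]  # raises KeyError on other labels

r_square = 0.384                    # reported for the four-variable model
unexplained_share = 1.0 - r_square  # share of sales-price variation left unexplained

def adjusted_r_square(r2, n, k):
    """Standard adjustment for k explanatory variables and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

print(pool_dummy("Yes"), pool_dummy("No"))
print(round(unexplained_share, 3))
```

Adding the pool dummy raises `k` from 4 to 5. R-square cannot fall when a variable is added, so the adjusted R-square is the fairer basis for comparing the four-variable and five-variable equations.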
(a) What is your expectation of the relationship between sales price and the
pool dummy variable based on the coding scheme (Pool = 1, No Pool = 0)?
(b) Estimate the multiple regression equation
(include square footage, number of bedrooms, number of bathrooms, distance from
the center of the city, pool dummy variable).
(c) Conduct individual hypothesis tests
(alpha=.05) to determine if any of the explanatory variables can be dropped
from the estimated multiple regression equation.
(d) Conduct a global hypothesis test (F-Test) to
verify the overall statistical significance (alpha = .05) of the estimated
multiple regression equation in part b.
(e) Provide an interpretation of the estimated
coefficient on the pool dummy variable.
(f) If any variables should be dropped from the
multiple regression equation, re-run the regression and repeat parts c, d,
and e (if applicable).
(g) Provide an overall written evaluation of