# ECON7300: Statistics for Business and Economics, Statistical Project Assignment 2, Semester 2, 2015

Question

Instructions for Dataset 3: Simple Regression Analysis (30 marks)

A statistician collected data on 390 father-son pairs to examine Galton’s law.

The variables in the dataset are:

Fathers (fathers’ heights in centimetres)

Sons (sons’ heights in centimetres)

The dependent variable for your analysis is Sons.

Answer the following questions using dataset 3.

a) Estimate a regression model using fathers’ heights to predict sons’ heights (state the simple linear regression equation).

b) Interpret the meaning of the slope.

c) Predict sons’ heights when fathers’ heights = 170 cm.

d) Compute the coefficient of determination and interpret its meaning.

e) Compute the standard error of the estimate and interpret its meaning. Judge the magnitude of the standard error of the estimate.

f) Perform a residual analysis (plot the residuals) and evaluate whether the assumptions of regression have been violated.

g) Test the significance of the slope using the t test (follow all the necessary steps). Assume a 5% level of significance.

h) Test the significance of the slope using the F test (follow all the necessary steps). Assume a 5% level of significance.

i) Test the significance of the correlation coefficient (follow all the necessary steps). Assume a 5% level of significance.

j) Compute a 95% confidence interval estimate of the mean sons’ heights in the population when fathers’ heights = 170 cm and interpret its meaning.

k) Compute a 95% prediction interval of sons’ heights for an individual with fathers’ heights = 170 cm and interpret its meaning.
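The calculations behind parts (a) to (k) can be sketched in Python as follows. This is a minimal illustration, not a worked solution: the five father-son pairs are made-up values rather than dataset 3, the function names `simple_regression` and `intervals` are hypothetical, and the critical value 3.182 is the two-tailed 5% t value for 3 degrees of freedom, so it would need to be replaced for the real sample of 390 pairs.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x with the usual summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)           # total sum of squares
    b1 = sxy / sxx                                     # slope
    b0 = y_bar - b1 * x_bar                            # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))                    # standard error of the estimate
    se_b1 = s_yx / math.sqrt(sxx)                      # standard error of the slope
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx,
        "b0": b0, "b1": b1,
        "r2": 1 - sse / sst,                           # coefficient of determination
        "s_yx": s_yx,
        "t_stat": b1 / se_b1,                          # t test of H0: slope = 0
        "f_stat": (sst - sse) / (sse / (n - 2)),       # F test of H0: slope = 0
    }

def intervals(fit, x0, t_crit):
    """Point prediction, confidence interval for the mean, and prediction interval at x0."""
    y_hat = fit["b0"] + fit["b1"] * x0
    h = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    ci_half = t_crit * fit["s_yx"] * math.sqrt(h)      # mean response
    pi_half = t_crit * fit["s_yx"] * math.sqrt(1 + h)  # individual response
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# Made-up illustration data (heights in cm), NOT the assignment dataset.
fathers = [160, 165, 170, 175, 180]
sons = [166, 168, 171, 172, 176]

fit = simple_regression(fathers, sons)
# 3.182 is the t critical value for alpha = 0.05 (two-tailed) with n - 2 = 3 df.
y_hat, ci, pi = intervals(fit, 170, t_crit=3.182)
print(f"Sons = {fit['b0']:.2f} + {fit['b1']:.2f} * Fathers")
print(f"r2 = {fit['r2']:.4f}, s_yx = {fit['s_yx']:.4f}, t = {fit['t_stat']:.2f}, F = {fit['f_stat']:.2f}")
print(f"prediction at 170 cm: {y_hat:.2f}, CI {ci}, PI {pi}")
```

In simple regression the F statistic equals the square of the slope's t statistic, so the two tests in (g) and (h) always lead to the same conclusion. The prediction interval is wider than the confidence interval because it also accounts for the variation of an individual observation around the mean.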

Instructions for Dataset 4: Multiple Regression Analysis (45 marks)

The dataset is an extract from the US National Longitudinal Survey for employed women in 1988.

The variables in the dataset are:

wage (hourly wage in dollars)

hours (number of hours worked)

grade (current grade completed by the employee)

south (1 if the employee lives in the South and 0 otherwise)

The dependent variable for your analysis is wage.

Answer the following questions using dataset 4.

a) Estimate a regression model using hours and grade to predict wage (state the multiple regression equation).

b) Interpret the meaning of the slopes.

c) Predict the hourly wage when hours = 30 and grade = 15.

d) Compute a 95% confidence interval estimate of the mean hourly wage in the population when hours = 30 and grade = 15 and interpret its meaning.

e) Compute a 95% prediction interval of the hourly wage for an employee with hours = 30 and grade = 15 and interpret its meaning.

f) Plot the residuals to test the assumptions of the regression model. Is there any evidence of violation of the regression assumptions? Explain.

g) Determine the variance inflation factor (VIF) for each independent variable (hours and grade) in the model. Is there reason to suspect the existence of collinearity?

h) At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model (use t tests and follow all the necessary steps). On the basis of these results, indicate the independent variables to include in the model.

i) Test for the significance of the overall multiple regression model at the 5% level of significance.

j) Determine whether there is a significant relationship between hourly wage and each independent variable at the 5% level of significance (hint: testing portions of the multiple regression model using the partial F test).

k) Compute the coefficients of partial determination and interpret their meaning.

l) Estimate a regression model using hours, grade and south to predict wage (state the multiple regression equation, the regression equation for employees living in south, the regression equation for employees not living in south) and interpret the coefficient for south.

m) Estimate a regression model using hours, grade, south, an interaction between hours and grade, an interaction between hours and south, and an interaction between grade and south to predict wage.

n) Test whether the three interactions significantly improve the regression model. Assume a 5% level of significance (hint: test the joint significance of the three interaction terms using the partial F test. If you reject the null hypothesis, test the contribution of each interaction separately (using the partial F test) in order to determine which intera
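The partial F test, the coefficient of partial determination and the VIF mentioned in the questions above all reduce to short formulas once the error sums of squares of the competing models are known. The sketch below assumes the standard textbook definitions; the function names are hypothetical and the numbers in the example are made up for illustration, not results from dataset 4.

```python
def partial_f(sse_reduced, sse_full, num_terms_tested, df_error_full):
    """Partial F statistic for the terms present in the full model but not the reduced one.

    df_error_full is n - k - 1, where k is the number of predictors in the full model.
    """
    extra_ss = sse_reduced - sse_full      # sum of squares explained by the tested terms
    mse_full = sse_full / df_error_full    # mean squared error of the full model
    return (extra_ss / num_terms_tested) / mse_full

def partial_r2(sse_reduced, sse_full):
    """Coefficient of partial determination for the terms added to the reduced model."""
    return (sse_reduced - sse_full) / sse_reduced

def vif(r2_j):
    """Variance inflation factor, where r2_j comes from regressing X_j on the other predictors."""
    return 1.0 / (1.0 - r2_j)

# Made-up illustration values, NOT results from the assignment data.
f_stat = partial_f(sse_reduced=120.0, sse_full=100.0, num_terms_tested=3, df_error_full=50)
print(f"partial F = {f_stat:.4f}")                     # compare with the F critical value
print(f"partial r2 = {partial_r2(120.0, 100.0):.4f}")
print(f"VIF = {vif(0.2):.2f}")
```

The partial F statistic is compared with the F critical value with `num_terms_tested` and `df_error_full` degrees of freedom. With only two predictors, as in the model with hours and grade, both VIFs are equal because each depends only on the squared correlation between the two variables.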
