STAT300, Homework Assignment No 2

August 30, 2017


(Due: Friday, November 27, 2015, 12 noon, in class)

In your solutions, please clearly indicate your name, student ID, and lab session in the top right corner of each page. Please type your answers (no handwriting). Please try to complete the assignment as independently as you can. Discussion of general approaches is allowed, but do not copy answers. The assignment can be completed entirely with software; hand calculation is not required. Please show the key steps of the solution to each question, not just the final answers.

Problem 1 (35 points). To study the relationship between birth weight and factors/variables such as mother’s age and smoking status, data on 189 subjects were obtained. The dataset is available on the class webpage. The variables in the dataset are defined as follows:

low: indicator of birth weight less than 2.5 kg (1=low birth weight, 0=otherwise).
age: mother’s age in years.
lwt: mother’s weight in pounds at last menstrual period.
race: mother’s race (1 = white, 2 = black, 3 = other).
smoke: smoking status during pregnancy (1=smoking, 0=non-smoking).
ptl: number of previous premature labours.
ht: history of hypertension.
ui: presence of uterine irritability.
ftv: number of physician visits during the first trimester.
bwt: birth weight in grams.
Analyze this dataset by answering the following questions:
(a) (5 pts) Perform a hypothesis test to test if there is an association between low birth weight (birth weight less than 2.5 kg) and smoking status.
(b) (5 pts) Perform a hypothesis test to test if there is an association between low birth weight (birth weight less than 2.5 kg) and mother’s race.
(c) (5 pts) Perform a two-sample t-test to test if birth weights (in grams) are the same for smoking and non-smoking mothers.
(d) (10 pts) Perform an ANOVA test to test if birth weights (in grams) are the same for all three races. Perform pairwise tests (multiple comparison).
(e) (10 pts) Conduct a two-way ANOVA test with smoking status and mother’s race as two factors and birth weight as response. (i) Is there a significant interaction between smoking status and mother’s race? Draw a picture to check possible interactions and interpret the results. Explain your answers in simple language understandable by non-statisticians. (ii) Are the conclusions obtained from the ANOVA test the same as those obtained in questions (c) and (d)? Briefly explain the reasons.
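
For the association tests in parts (a) and (b), one common choice is Pearson's chi-square test of independence on the table of counts. The following is a minimal sketch in Python, using only the standard library. The counts in `example_table` are hypothetical and are NOT taken from the assignment dataset; the function names are illustrative. The p-value helper covers only 1 and 2 degrees of freedom (a 2 x 2 table for smoking status, a 3 x 2 table for race), where the chi-square tail has a closed form.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_value(stat, df):
    """Upper-tail p-value; closed forms are used for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("use a statistics library for df > 2")

# Hypothetical counts (assumption, not the real data):
# rows = smoke (0, 1), columns = low (0, 1)
example_table = [[40, 10], [30, 20]]
stat, df = chi_square_independence(example_table)
p_value = chi_square_p_value(stat, df)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

This sketch applies no continuity correction, so its result for a 2 x 2 table can differ from software that applies Yates' correction by default.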


Problem 2 (25 points). Referring to the dataset in Problem 1.
(a) (7 pts) Find the correlation between mother’s ages and birth weights (in grams). Use the bootstrap method to find the standard error of the correlation.
(b) (7 pts) Find a 95% bootstrap confidence interval for the unknown population correlation. Use bootstrap replications of 100 and 200 respectively and compare the results.
(c) (7 pts) Test if there is a significant correlation between mother’s ages and birth weights (i.e., whether the population correlation is zero or not) using a bootstrap method. Use bootstrap replications of 100 and 200 respectively and compare the results.
(d) (4 pts) Based on results from (b) and (c), are mother’s age and birth weights correlated? Are the results consistent with those in Problem 3?
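
The bootstrap steps in Problem 2 can be sketched as follows, assuming the nonparametric (pairs) bootstrap and a simple percentile interval; other bootstrap variants are equally valid. The `age` and `bwt` lists here are synthetic stand-ins generated with a fixed seed, NOT the assignment dataset, and all function names are illustrative.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_correlations(x, y, n_boot, rng):
    """Resample (x, y) pairs with replacement and recompute r each time."""
    n = len(x)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    return reps

def percentile_interval(reps, level=0.95):
    """Simple percentile interval read off the sorted bootstrap replicates."""
    ordered = sorted(reps)
    alpha = (1 - level) / 2
    lower = ordered[math.floor(alpha * (len(ordered) - 1))]
    upper = ordered[math.ceil((1 - alpha) * (len(ordered) - 1))]
    return lower, upper

rng = random.Random(300)
# Synthetic stand-in data (assumption): 189 made-up age / birth weight pairs
age = [rng.uniform(15, 45) for _ in range(189)]
bwt = [2500 + 15 * a + rng.gauss(0, 700) for a in age]

r_hat = pearson_r(age, bwt)
for n_boot in (100, 200):
    reps = bootstrap_correlations(age, bwt, n_boot, rng)
    mean_rep = sum(reps) / n_boot
    # Bootstrap standard error = sample standard deviation of the replicates
    se = math.sqrt(sum((r - mean_rep) ** 2 for r in reps) / (n_boot - 1))
    lower, upper = percentile_interval(reps)
    print(f"B = {n_boot}: r = {r_hat:.3f}, SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

One way to approach part (c) with this output is to check whether zero falls inside the interval. With only 100 or 200 replications the interval endpoints vary noticeably from run to run.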

Problem 3 (40 points). Referring to the dataset in Problem 1. Perform the following further analyses:
(a) (7 pts) Fit a simple linear regression model with birth weight (in grams) as the response and smoking status as a predictor. Is your conclusion consistent with (c) in Problem 1?

(b) (7 pts) Fit a simple linear regression model with birth weight as the response and mother’s race as a predictor. Is your conclusion consistent with (d) in Problem 1?
(c) (10 pts) Fit a multiple regression model with birth weight as the response and mother’s age, mother’s weight at last menstrual period, mother’s smoking status, and mother’s race as predictors. Is your conclusion the same as that in (a) and (b)? Briefly interpret and explain your answers in words.

(d) (10 pts) Do model diagnostics for the model in (c). Do the residual plots seem reasonable? Does the normality assumption seem reasonable? Is it possible to find a better model? If yes, explain how you might find a better model. If no, explain why.
(e) (6 pts) Based on all the results from Problems 1 – 3, what is your overall conclusion for the analysis of this dataset (in simple language understandable by non-statisticians)? Can your conclusion be generalized to all mothers in the world? Why or why not?
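
Part (a) of Problem 3 asks how a regression on a binary predictor relates to a two-sample comparison. A minimal sketch of the underlying algebra: when the predictor is coded 0/1, the least-squares intercept equals the mean of the 0 group and the slope equals the difference between the two group means. The data below are synthetic, generated with a fixed seed, and are NOT the assignment dataset; `simple_ols` is an illustrative helper.

```python
import random

def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

rng = random.Random(2015)
# Synthetic stand-in data (assumption): 0/1 smoking indicator and made-up birth weights
smoke = [rng.randint(0, 1) for _ in range(189)]
bwt = [3000 - 250 * s + rng.gauss(0, 700) for s in smoke]

intercept, slope = simple_ols(smoke, bwt)

# Group means computed directly, for comparison with the fitted coefficients
non_smokers = [w for s, w in zip(smoke, bwt) if s == 0]
smokers = [w for s, w in zip(smoke, bwt) if s == 1]
mean_non = sum(non_smokers) / len(non_smokers)
mean_smk = sum(smokers) / len(smokers)

print(f"intercept = {intercept:.2f}, mean of non-smokers = {mean_non:.2f}")
print(f"slope = {slope:.2f}, difference in means = {mean_smk - mean_non:.2f}")
```

The t-test on the slope in this model matches the pooled-variance (equal-variance) two-sample t-test, so results can differ slightly from a Welch-type test.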
