# STA215-Fall 2015: ASSIGNMENT # 3 CASE STUDY: The Maternal Health Study

Consider the Maternal Health Study example from lectures. Here is the background information on the full study.

Warning on cigarette packages: “Smoking by pregnant women may result in fetal injury, premature birth, and low birth weight.”

The Child Health and Development Studies (Yerushalmy, 1964; 1971) examined all pregnancies that occurred between 1960 and 1967 among women in the Kaiser Foundation Health Plan in San Francisco, California. In order to investigate the cigarette package claim, information was collected on the 1236 mothers and babies that participated in the study. The data set has the following columns:

• ID: the identification number of the mother/baby

• Gestation: length of gestation (measured in days)

• Birthweight: the baby’s birth weight (measured in ounces)

• Age: the mother’s age at the time she gave birth to the baby

• Education: the mother’s highest education level (Less than Grade 8, Some High School, High School Graduate, Trade School, Some College, College Graduate)

• SmokePreg: whether the mother smoked during pregnancy (Yes, No).

The full data is attached.

For the purposes of this assignment, pretend that the 1236 mothers/babies in the data set are the target population.

Below is some useful code and hints:

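# This will code the success (smoking=Yes) as 1 and failure (smoking=No) as 0.

# Start from a vector of NAs (an assumed initialization) so that missing values stay NA.

# The comparison is case-sensitive, so match the labels exactly as they appear in your data.

> SmokePreg_yourlastname <- rep(NA, length(SmokePreg))

> SmokePreg_yourlastname[which(SmokePreg=="yes")] = 1

> SmokePreg_yourlastname[which(SmokePreg=="no")] = 0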

# A random sample without replacement of size n from a population of size N

> sample(1:N, n, replace=F)

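# Once you have read in the data from the csv file and attached it:

# This creates a matrix that stores 1000 different random samples of size n

# of your variable – each row is a sample.

# (replicate() is one assumed way to build such a matrix.)

> samplesofsizen <- t(replicate(1000, sample(variablename, n, replace=F)))

# Apply a function (for example, mean) to each row and store the results

> lastname_variablename_samplemeansofsizen <- apply(samplesofsizen, 1, functionname)

# Calculate the standard deviation of each sample (each row) and store it

> lastname_variablename_samplesdofsizen <- apply(samplesofsizen, 1, sd)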
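The hints above can be strung together as follows. This is only a minimal sketch on a made-up population, not the assignment data: `toy_smoke`, its 40% smoking rate, the seed, and every `toy_` name are assumptions chosen for illustration.

```r
# Hypothetical population of N = 1236 yes/no answers (NOT the real data).
set.seed(215)
N <- 1236
toy_smoke <- sample(c("yes", "no"), N, replace = TRUE, prob = c(0.4, 0.6))

# Code success (yes) as 1 and failure (no) as 0; anything else stays NA.
toy_bernoulli <- rep(NA, N)
toy_bernoulli[which(toy_smoke == "yes")] <- 1
toy_bernoulli[which(toy_smoke == "no")] <- 0

# Draw 1000 samples of size n without replacement; each row is one sample.
n <- 10
toy_samples <- t(replicate(1000, sample(toy_bernoulli, n, replace = FALSE)))

# One summary value per row (per sample).
toy_sample_means <- apply(toy_samples, 1, mean)
toy_sample_sds <- apply(toy_samples, 1, sd)

dim(toy_samples)          # 1000 rows, n columns
hist(toy_sample_means)    # distribution of the 1000 sample proportions
```

Because the variable is coded 0/1, the mean of each row is that sample's proportion of successes. Changing `n` to 100 and rerunning gives the second simulation with no other changes.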

Assignment Questions:

First, we are interested in estimating the proportion of mothers who smoke during pregnancy. Before you start, do some exploratory data analysis to understand the data: deal appropriately with missing values, clean the data, etc. Also, make a new Bernoulli variable called ‘SmokePreg_yourlastname’, coding smoking=Yes as the success.

1. What is the variable of interest? Is it categorical or quantitative? State the population size. Present relevant graphs/plots and describe the distribution of this variable. Calculate the population proportion and standard deviation, report these values, and then interpret them in plain English.
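One detail worth checking when the data set is treated as the whole population: R's built-in `sd()` divides by N - 1, which is the sample formula. A minimal sketch of the population version, using a made-up 0/1 vector (`toy_x` is hypothetical, not the assignment data), might look like this:

```r
# Hypothetical 0/1 population (NOT the real data): 3 successes out of 10.
toy_x <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
N <- length(toy_x)

# Population proportion: the mean of a 0/1 variable.
p <- mean(toy_x)

# Population standard deviation: divide by N rather than N - 1.
pop_sd <- sqrt(mean((toy_x - p)^2))

# For a 0/1 variable this equals sqrt(p * (1 - p)).
all.equal(pop_sd, sqrt(p * (1 - p)))

# sd() uses the N - 1 divisor; rescaling it recovers the population value.
all.equal(sd(toy_x) * sqrt((N - 1) / N), pop_sd)
```

With N = 1236 the two divisors give nearly the same number, but it is worth stating which one is being reported.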

2. Generate 1000 random samples of size n = 10 from the appropriate variable. For each sample, calculate the sample proportion of interest and store these sample proportions in a vector appropriately labelled with your last name, for example, “lastname_variablename_samplemeansofsize10”. Make relevant graph(s) of these sample proportions. Comment on the distribution of the sample means – shape, mean, and variability.

3. Generate 1000 random samples of size n = 100 from the appropriate variable. For each sample, calculate the sample proportion of interest and store these sample proportions in a vector appropriately labelled with your last name, for example, “lastname_variablename_samplemeansofsize100”. Make relevant graph(s) of these sample proportions. Comment on the distribution of the sample means – shape, mean, and variability.

4. What do the graphs from Questions 2 & 3 above represent? Summarize the results from the above simulations. Comment specifically on the mean, variability, and shape of the distribution. Which theorem have you exhibited through these simulations?

Now, choose one of the quantitative variables that does not look Normally distributed. Rename this variable with your last name appropriately, for example, “lastname_variablename”. As usual, do exploratory data analysis before answering the questions.

5. What is the variable of interest? Present plots to show that it is not Normally distributed and make relevant commentary. State the population size. Present relevant graphs/plots and describe the distribution of this variable. Calculate the population mean and standard deviation, report these values, and then interpret them in plain English.
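A histogram and a Normal quantile plot are two common ways to judge whether a variable looks Normal. The sketch below uses a simulated right-skewed variable (`toy_y`, drawn from an exponential distribution purely for illustration) in place of a column from the data set; the seed and the rate are arbitrary assumptions.

```r
# Hypothetical right-skewed population (NOT the real data).
set.seed(215)
toy_y <- rexp(1236, rate = 1 / 30)

# Histogram: look for skewness, multiple peaks, or outliers.
hist(toy_y, main = "Histogram of toy_y", xlab = "toy_y")

# Normal quantile plot: points far from the line suggest non-Normality.
qqnorm(toy_y)
qqline(toy_y)

# Numerical summaries; a mean well above the median hints at right skew.
summary(toy_y)
mean(toy_y) > median(toy_y)
```

For a real column, missing values would need to be handled first, for example with `na.omit()`, before plotting or summarizing.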

6. Generate 1000 random samples of size n = 10 from this variable. For each sample, calculate the sample mean and store these sample means in a vector appropriately labelled with your last name, for example, “lastname_variablename_samplemeansofsize10”. Make relevant graph(s) of these sample means. Comment on the distribution of the sample means – shape, mean, and variability.

7. Generate 1000 random samples of size n = 100 from this variable. For each sample, calculate the sample mean and store these sample means in a vector appropriately labelled with your last name, for example, “lastname_variablename_samplemeansofsize100”. Make relevant graph(s) of these sample means. Comment on the distribution of the sample means – shape, mean, and variability.

8. What do the graphs from Questions 6 & 7 above represent? Summarize the results from the above simulations. Comment specifically on the mean, variability, and shape of the distribution. Which theorem have you exhibited through these simulations?

9. You have simulated a total of 4 distributions of sample means (for two different variables using two different sample sizes of n = 10 and n = 100). For each case, should the means of the distributions of sample means equal the value of the parameter? Explain why or why not using correct statistical terminology.
