# Stat 461 Spring 2016 Homework #2

Due Friday, January 29

0. Installing R

You may download R for free from the Comprehensive R Archive Network

(CRAN) at http://cran.r-project.org. It is available for Linux, Mac OS X, and

Windows. Follow the directions for installing the base product on the platform

of your choice.

You may also access R from any computer in a Penn State computer lab (R

will be found under ‘All programs’ → ‘Spreadsheets and statistics’).

1. Sample mean and sample standard deviation for random samples from different

distributions: 30 points

The function rnorm can be used to generate a random sample from a normal

distribution. For example, run the following in R to obtain a sample of size 5
from a normal distribution with mean 10 and standard deviation 2:

```r
rnorm(5, 10, 2)
```

(a) We may generate random samples of the same size several times. The R

code for problem 1(a) does this 1000 times, and then for each of the 1000

samples generated, it computes the sample mean and the sample standard

deviation and stores them in the vectors named xbar and s respectively.

Finally, it plots the sample means versus the standard deviations.

```r
xbar <- NULL; s <- NULL
for (i in 1:1000) {
  a <- rnorm(5, 10, 2)
  xbar <- c(xbar, mean(a)); s <- c(s, sd(a))
}
plot(xbar, s)
cor(xbar, s)
```

(b) Repeat the above, except this time use random samples of size 5 from an
exponential distribution with mean 1.

(c) Repeat one more time, this time using a chi-square distribution with 3
degrees of freedom.
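Only the sampling line of the part (a) code needs to change. A minimal sketch for (b), with the substitution for (c) noted in a comment (the loop and plotting are carried over unchanged from part (a)):

```r
# Sketch: adapt the part (a) loop to the other two populations.
# rexp(5, rate = 1) draws 5 values from an exponential with mean 1/rate = 1;
# rchisq(5, df = 3) draws 5 values from a chi-square with 3 df.
xbar <- NULL; s <- NULL
for (i in 1:1000) {
  a <- rexp(5, rate = 1)   # for part (c): a <- rchisq(5, df = 3)
  xbar <- c(xbar, mean(a)); s <- c(s, sd(a))
}
plot(xbar, s)
cor(xbar, s)
```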

(d) For your solution to problem 1, turn in the three plots from (a) – (c). What

do they suggest about the correlation between the sample mean (x̄) and

sample standard deviation (s) for these three distributions?

2. Normal-related distributions (chi-square, t and F): 30 points

(a) Suppose that Z ∼ N(0, 1), Y ∼ χ²_n, and the two are independent. If we
let X = Z/√(Y/n), then X is said to have a t-distribution with n degrees
of freedom, X ∼ t_n. The t-distribution is symmetric and bell-shaped. It
resembles the standard normal except that it is wider (greater variance)
and has heavier tails.
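This construction can be checked by simulation (a quick sketch, not part of the assignment): draw Z and Y, form X = Z/√(Y/n), and compare empirical quantiles of X with R's built-in qt().

```r
# Sketch (not required by the assignment): numerically verify that
# Z / sqrt(Y / n) has a t-distribution with n degrees of freedom.
set.seed(1)
n <- 4
z <- rnorm(1e5)               # Z ~ N(0, 1)
y <- rchisq(1e5, df = n)      # Y ~ chi-square(n), independent of Z
x <- z / sqrt(y / n)          # should follow a t-distribution with n df
quantile(x, c(.25, .5, .75))  # empirical quartiles of X
qt(c(.25, .5, .75), df = n)   # theoretical t_4 quartiles for comparison
```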

The R code below for this problem generates a plot that compares the
pdfs of t-distributions with different degrees of freedom.

```r
x <- seq(from = -4.5, to = +4.5, by = .01)   # grid of x-values
zdens <- dnorm(x)
tdens.1 <- dt(x, df = 1)
tdens.2 <- dt(x, df = 2)
tdens.4 <- dt(x, df = 4)
tdens.10 <- dt(x, df = 10)
# set up a blank plotting region
plot(x = c(-4.5, 4.5), y = c(0, .4), type = "n",
     main = "Comparison of t densities", xlab = "x", ylab = "density")
# The function lines() adds lines to an existing plot.
# The argument col= changes the line colors.
# Or you could use lty= instead to change the line types
# (solid, dashed, dotted, etc.)
lines(x, zdens, col = 1)
lines(x, tdens.1, col = 2)
lines(x, tdens.2, col = 3)
lines(x, tdens.4, col = 4)
lines(x, tdens.10, col = 5)
# This function adds a legend to the plot.
# The x and y arguments are the position of the legend, which
# you need to choose by trial and error.
# All the plotted lines are solid (lty = 1), but they
# have different colors (col = 1, ..., 5)
legend(x = 2.5, y = .4,
       lty = c(1, 1, 1, 1, 1), col = c(1, 2, 3, 4, 5),
       legend = c("normal", "df = 1", "df = 2", "df = 4", "df = 10"))
```

What do you notice about the t-distribution as the degrees of freedom
increase?

(b) If X₁ ∼ χ²_m, X₂ ∼ χ²_n, and the two are independent, then
Y = (X₁/m)/(X₂/n) is said to have an F-distribution with (m, n) degrees
of freedom, and we write Y ∼ F_{m,n}. We will refer to m and n as the
numerator and denominator degrees of freedom, respectively.

Write R code for this problem to create a table comparing the percentiles
of χ²₁ and F_{1,n} for various values of n.

What do you notice about these values? Can your observation be generalized?
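One way to set up such a table (the 95th percentile and the grid of n values below are illustrative choices, not specified by the assignment):

```r
# Compare the 95th percentile of chi-square(1) with that of F(1, n)
# over an (illustrative) grid of denominator df n.
n <- c(5, 10, 30, 100, 1000)
tab <- data.frame(n = n,
                  F.1.n = qf(0.95, df1 = 1, df2 = n),
                  chisq.1 = qchisq(0.95, df = 1))
print(tab)
```

Other percentiles (e.g. 0.90, 0.99) can be added as extra columns; the question asks what pattern emerges as n grows.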

3. The effects of non-normality on confidence intervals about means and variances:

40 points

Suppose y₁, . . . , yₙ is a random sample from a normal distribution with mean µ
and variance σ². Taking ȳ and S² to be the usual sample mean and variance, the
exact 95% confidence interval for µ is

( ȳ − t_{0.975, n−1} · S/√n , ȳ + t_{0.975, n−1} · S/√n )    (1)

where t_{p, ν} denotes the (100p)th percentile of a t-distribution with ν degrees
of freedom. An exact 95% confidence interval for σ² is

( (n − 1)S² / χ²_{0.975, n−1} , (n − 1)S² / χ²_{0.025, n−1} )    (2)

where χ²_{p, ν} denotes the (100p)th percentile of a χ²-distribution with ν
degrees of freedom.
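As a quick sanity check on interval (1) (a sketch, not part of the assignment), the formula reproduces the 95% confidence interval that R's t.test() reports:

```r
# Sketch: interval (1) computed by hand agrees with t.test()'s
# default 95% confidence interval.
set.seed(1)
y <- rnorm(20, mean = 10, sd = 2)   # arbitrary example data
n <- length(y)
lower <- mean(y) - qt(.975, n - 1) * sd(y) / sqrt(n)
upper <- mean(y) + qt(.975, n - 1) * sd(y) / sqrt(n)
c(lower, upper)
t.test(y)$conf.int
```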

If it turns out that the observed random sample is not from a normal distribution,
then these confidence intervals are no longer exact: they may not cover their
targets (µ and σ², respectively) exactly 95% of the time. We investigate the
actual coverage through a simulation.

(a) Simulation 1: Normal Population

In the first simulation, we will draw 10,000 samples of size n = 319 from
a normal distribution with mean µ = 132 and variance σ² = 361. We
compute ȳ and S² from each sample, and see how many times out of
10,000 the confidence intervals given by (1) and (2) cover the true values
µ = 132 and σ² = 361. The R code below for this problem will accomplish
this.

```r
# Define the parameters of the simulation
mu <- 132
sigma2 <- 361
n <- 319
nrep <- 10000
ybar <- numeric(nrep)         # create a vector to hold the ybar's
s2 <- numeric(nrep)           # another vector to hold the S2's
cover.mu <- logical(nrep)     # logical vector
cover.sigma2 <- logical(nrep) # another one
for (i in 1:nrep) {
  # draw the sample
  y <- rnorm(n, mean = mu, sd = sqrt(sigma2))
  # compute and store the sample mean and variance
  ybar[i] <- mean(y)
  s2[i] <- var(y)
  s <- sqrt(var(y))
  # see whether the interval for mu covers the true value
  cover.mu[i] <- abs(ybar[i] - mu) <= qt(.975, n - 1) * s / sqrt(n)
  # and whether the interval for sigma2 covers the true value
  cover.sigma2[i] <- ((n - 1) * s2[i] / qchisq(.975, n - 1) <= sigma2) &
    (sigma2 <= (n - 1) * s2[i] / qchisq(.025, n - 1))
}
table(cover.mu)
table(cover.sigma2)
hist(ybar)
hist(s2)
```

Turn in the histograms for ybar and s2. In addition, report the percentage
of the simulated intervals for µ and the simulated intervals for σ² which
covered their respective true values. Is this what you expected?

(b) Simulation 2: Lognormal Population

Now run a second simulation to see how the intervals in (1) and (2) perform

when the data are drawn from a lognormal distribution. How much better

or worse did the intervals perform relative to part (a)?
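Only the sampling line in the part (a) code needs to change. The assignment leaves the lognormal parameters unspecified; one natural (assumed) choice is to solve for meanlog and sdlog so that the population mean and variance again equal µ = 132 and σ² = 361:

```r
# Sketch (assumption: lognormal parameters matched to mu = 132 and
# sigma2 = 361; the assignment does not fix them).
mu <- 132; sigma2 <- 361
sdlog2 <- log(1 + sigma2 / mu^2)   # from Var = (e^{s2} - 1) e^{2m + s2}
meanlog <- log(mu) - sdlog2 / 2    # from Mean = e^{m + s2/2}
y <- rlnorm(319, meanlog = meanlog, sdlog = sqrt(sdlog2))
# ...the rest of the part (a) loop (ybar, s2, coverage checks) is unchanged.
```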

(c) Simulation 3: Student’s t Population with 5 degrees of freedom

For a third simulation, see what happens when you use a population with heavier

tails. Compare the intervals from this simulation to the ones obtained

in (a) and (b).
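Again only the sampling line changes. One option (an assumption — the assignment does not say whether to rescale) is to shift and scale t₅ draws so that the population mean and variance again match part (a); a t_ν variable has variance ν/(ν − 2), so Var(t₅) = 5/3:

```r
# Sketch (assumption: t_5 draws rescaled to mean 132 and variance 361;
# Var(t_5) = 5 / (5 - 2) = 5/3, so divide by sqrt(5/3)).
mu <- 132; sigma2 <- 361
y <- mu + sqrt(sigma2 / (5 / 3)) * rt(319, df = 5)
# ...the rest of the part (a) loop is unchanged.
```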
