Stat 461 Spring 2016 Homework #2

Due Friday, January 29
0. Installing R
You may download R for free from the Comprehensive R Archive Network
(CRAN) at http://cran.r-project.org. It is available for Linux, Mac OS X, and
Windows. Follow the directions for installing the base product on the platform
of your choice.
You may also access R from any computer in a Penn State computer lab (R
will be found under ‘All programs’ → ‘Spreadsheets and statistics’).
1. Sample mean and sample standard deviation for random samples from different
distributions: 30 points
The function rnorm can be used to generate a random sample from a normal
distribution. For example, run the following in R to obtain a sample of size 5 from
a normal distribution with mean 10 and standard deviation 2:
rnorm(5, 10, 2)
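If you want your results to be reproducible across runs, you can fix the random
seed before generating data; a minimal sketch (the seed value 461 is arbitrary):
set.seed(461)    # any fixed integer makes the draws reproducible
rnorm(5, 10, 2)  # returns the same five values on every run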
(a) We may generate random samples of the same size several times. The R
code for problem 1(a) does this 1000 times, and then for each of the 1000
samples generated, it computes the sample mean and the sample standard
deviation and stores them in the vectors named xbar and s respectively.
Finally, it plots the sample means versus the standard deviations.
xbar <- NULL; s <- NULL
for (i in 1:1000) {
  a <- rnorm(5, 10, 2)
  xbar <- c(xbar, mean(a)); s <- c(s, sd(a))
}
plot(xbar, s)
cor(xbar, s)
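As an aside, the same 1000 statistics can be collected without growing vectors
inside a loop; a sketch of an equivalent approach using replicate() (not required,
shown only for comparison):
sims <- replicate(1000, {a <- rnorm(5, 10, 2); c(mean(a), sd(a))})
xbar <- sims[1, ]   # first row holds the sample means
s    <- sims[2, ]   # second row holds the sample standard deviations
plot(xbar, s)
cor(xbar, s)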
(b) Repeat the above, except that this time use random samples of size 5 from an
exponential distribution with mean 1.
(c) Repeat one more time, this time using a chi-square distribution with 3 degrees
of freedom.
(d) For your solution to problem 1, turn in the three plots from (a) – (c). What
do they suggest about the correlation between the sample mean ($\bar{x}$) and
sample standard deviation (s) for these three distributions?
2. Normal-related distributions (chi-square, t and F): 30 points
(a) Suppose that $Z \sim N(0, 1)$, $Y \sim \chi^2_n$, and the two are independent. If we
let $X = Z/\sqrt{Y/n}$, then $X$ is said to have a t-distribution with $n$ degrees
of freedom, $X \sim t_n$. The t-distribution is symmetric and bell-shaped. It
resembles the standard normal except that it is wider (greater variance)
and has heavier tails.
The R code below for this problem generates a plot that compares the
pdf’s for t-distributions with different degrees of freedom.
x <- seq(from = -4.5, to = 4.5, by = .01)   # grid of x-values
zdens    <- dnorm(x)
tdens.1  <- dt(x, df = 1)
tdens.2  <- dt(x, df = 2)
tdens.4  <- dt(x, df = 4)
tdens.10 <- dt(x, df = 10)
# set up a blank plotting region
plot(x = c(-4.5, 4.5), y = c(0, .4), type = "n",
     main = "Comparison of t densities", xlab = "x", ylab = "density")
# The function lines() adds lines to an existing plot.
# The argument col= changes the line colors.
# Or you could use lty= instead to change the line types
# (solid, dashed, dotted, etc.)
lines(x, zdens,    col = 1)
lines(x, tdens.1,  col = 2)
lines(x, tdens.2,  col = 3)
lines(x, tdens.4,  col = 4)
lines(x, tdens.10, col = 5)
# This function adds a legend to the plot.
# The x and y arguments are the position of the legend, which
# you need to choose by trial and error.
# All the plotted lines are solid (lty = 1), but they
# have different colors (col = 1, ..., 5)
legend(x = 2.5, y = .4,
       lty = c(1, 1, 1, 1, 1), col = c(1, 2, 3, 4, 5),
       legend = c("normal", "df = 1", "df = 2", "df = 4", "df = 10"))
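If you want a numerical companion to the plot, you can compare a single quantile
across the same degrees of freedom; a minimal sketch (the 0.975 level is an
arbitrary choice):
qt(0.975, df = c(1, 2, 4, 10))  # 97.5th percentile of t for several df
qnorm(0.975)                    # the corresponding standard normal percentile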
What do you notice about the t-distribution as the degrees of freedom
increase?
(b) If $X_1 \sim \chi^2_m$, $X_2 \sim \chi^2_n$, and the two are independent, then
$Y = \frac{X_1/m}{X_2/n}$ is said
to have an F-distribution with $(m, n)$ degrees of freedom, and we write
$Y \sim F_{m,n}$. We will refer to $m$ and $n$ as the numerator and denominator
degrees of freedom, respectively.
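In R, percentiles of these distributions are available through qchisq() and qf();
a minimal sketch of a single comparison (the 0.95 level and n = 10 are arbitrary
choices):
qchisq(0.95, df = 1)          # 95th percentile of chi-square with 1 df
qf(0.95, df1 = 1, df2 = 10)   # 95th percentile of F with (1, 10) df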
Write R code for this problem to create a table comparing the percentiles
of $\chi^2_1$ and $F_{1,n}$ for various values of $n$.
What do you notice about these values? Can your observation be generalized?
3. The effects of non-normality on confidence intervals about means and variances:
40 points
Suppose $y_1, \ldots, y_n$ is a random sample from a normal distribution with mean $\mu$
and variance $\sigma^2$. Taking $\bar{y}$ and $S^2$ to be the usual sample mean and variance,
the exact 95% confidence interval for $\mu$ is
$$\left( \bar{y} - t_{0.975,n-1} \frac{S}{\sqrt{n}},\; \bar{y} + t_{0.975,n-1} \frac{S}{\sqrt{n}} \right) \quad (1)$$
where $t_{p,\nu}$ denotes the $(100p)$th percentile of a t-distribution with $\nu$ degrees of
freedom. And an exact 95% confidence interval for $\sigma^2$ is
$$\left( \frac{(n-1)S^2}{\chi^2_{0.975,n-1}},\; \frac{(n-1)S^2}{\chi^2_{0.025,n-1}} \right) \quad (2)$$
where $\chi^2_{p,\nu}$ denotes the $(100p)$th percentile of a $\chi^2$-distribution with $\nu$ degrees
of freedom.
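To make formulas (1) and (2) concrete, here is a minimal sketch that computes both
intervals from a single simulated sample (the sample size 25 and the parameter
values are arbitrary; the simulation below automates this over many samples):
y <- rnorm(25, mean = 132, sd = sqrt(361))   # one illustrative sample
n <- length(y); ybar <- mean(y); s2 <- var(y)
# interval (1) for mu
ybar + c(-1, 1) * qt(0.975, n - 1) * sqrt(s2) / sqrt(n)
# interval (2) for sigma^2
c((n - 1) * s2 / qchisq(0.975, n - 1),
  (n - 1) * s2 / qchisq(0.025, n - 1))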
If it turns out that the observed random sample is not from a normal distribution,
then these confidence intervals are no longer exact: they may not cover their
targets ($\mu$ and $\sigma^2$, respectively) exactly 95% of the time.
We investigate the actual coverage through a simulation.
(a) Simulation 1: Normal Population
In the first simulation, we will draw 10,000 samples of size $n = 319$ from
a normal distribution with mean $\mu = 132$ and variance $\sigma^2 = 361$. We
compute $\bar{y}$ and $S^2$ from each sample, and see how many times out of
10,000 the confidence intervals given by (1) and (2) cover the true values
$\mu = 132$ and $\sigma^2 = 361$. The R code below for this problem will
accomplish this.
# Define the parameters of the simulation
mu <- 132
sigma2 <- 361
n <- 319
nrep <- 10000
ybar <- numeric(nrep)          # create a vector to hold the ybar's
s2 <- numeric(nrep)            # another vector to hold the S2's
cover.mu <- logical(nrep)      # logical vector
cover.sigma2 <- logical(nrep)  # another one
for (i in 1:nrep) {
  # draw the sample
  y <- rnorm(n, mean = mu, sd = sqrt(sigma2))
  # compute and store the sample mean and variance
  ybar[i] <- mean(y)
  s2[i] <- var(y)
  s <- sqrt(var(y))
  # see whether the interval for mu covers the true value
  cover.mu[i] <- abs(ybar[i] - mu) <=
    qt(.975, n - 1) * s / sqrt(n)
  # and whether the interval for sigma2 covers the true value
  cover.sigma2[i] <- ((n - 1) * s2[i] / qchisq(.975, n - 1) <= sigma2) &
    (sigma2 <= (n - 1) * s2[i] / qchisq(.025, n - 1))
}
table(cover.mu)
table(cover.sigma2)
hist(ybar)
hist(s2)
Turn in the histograms for ybar and s2. In addition, report the percentage
of the simulated intervals for $\mu$ and the simulated intervals for $\sigma^2$ which
covered their respective true values. Is this what you expected?
(b) Simulation 2: Lognormal Population
Now run a second simulation to see how the intervals in (1) and (2) perform
when the data are drawn from a lognormal distribution. How much better
or worse did the intervals perform relative to part (a)?
(c) Simulation 3: Student’s t Population with 5 degrees of freedom
For a third simulation, see what happens when you use a population with heavier
tails. Compare the intervals from this simulation to the ones obtained
in (a) and (b).
