AHCP 5309 WK 11 P2 Assignment 2015

August 30, 2017

1. The dataset, acath.csv, contains real-world data regarding cardiac catheterization procedures. If you are unfamiliar with this procedure, it is used to diagnose and treat heart problems. A long, thin tube is inserted into an artery in your groin, neck, or arm and threaded to your heart. Variables in the dataset are defined below.




gender: Gender, {0=female, 1=male}

age: Age in years, {0, 1, 2, ...}

duration: Duration of symptoms of coronary artery disease, in months

chol: Cholesterol in mg/dl

sigdz: Significant coronary disease by cardiac cath, {0=not significant, 1=significant}

tvdlm: Three-vessel or left main disease by cardiac cath, {0=absent, 1=present}

a. State the levels of measurement for all variables. Justify.

> data1 <- read.csv("acath.csv")

> data1

   gender age duration chol sigdz tvdlm
1       0  17        1  120     0     0
2       0  18        0   NA     0     0
3       0  19        1  150     0     0
4       0  20        7  170     0     0
5       1  20       11  178     0     0
6       1  22        4  188     0     0
7       0  22        7   NA     0     0
8       0  23        6  251     0     0
9       0  24       31  190     0     0
10      0  24       41  157     0     0
11      1  25        4  188     1     1
12      1  25        6  310     1     0
13      0  25        2  202     0     0
14      1  25        2  153     0     0
15      0  26        2  230     0     0
16      1  26        3  186     0     0
17      0  27        1   NA     0     0
18      0  27        6  192     0     0
19      0  27        8  230     0     0
20      1  27        1  172     0     0
…

> names(data1)

[1] "gender"   "age"      "duration" "chol"     "sigdz"    "tvdlm"

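> levels(factor(data1$gender))   # levels("gender") returns NULL: a string has no levels

> levels(factor(data1$age))

> levels(factor(data1$duration))

> levels(factor(data1$chol))

> levels(factor(data1$sigdz))

> levels(factor(data1$tvdlm))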


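Nominal measurements: data in the form of names, categories, or labels, with no inherent order

Interval level of measurement: numeric data with equal spacing between values but no true zero, e.g. temperature in degrees Fahrenheit

Quantitative measurements: data in numerical form

Qualitative measurements: categorical data; any numeric codes (such as 0/1 for gender) are labels, not quantities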
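As a rough illustration of how these definitions map onto the dataset, the sketch below re-enters the first five printed rows by hand (the data frame name d is hypothetical) and declares the 0/1 coded variables as factors. It is one possible way to make the nominal variables explicit in R, not necessarily the approach expected for the assignment.

```r
# First five rows of the printout above, re-entered by hand for illustration.
d <- data.frame(
  gender   = c(0, 0, 0, 0, 1),
  age      = c(17, 18, 19, 20, 20),
  duration = c(1, 0, 1, 7, 11),
  chol     = c(120, NA, 150, 170, 178),
  sigdz    = c(0, 0, 0, 0, 0),
  tvdlm    = c(0, 0, 0, 0, 0)
)

# A quoted column name is just a character string, so it has no levels.
is.null(levels("gender"))

# Declare the 0/1 codes as categories so R treats them as nominal data.
d$gender <- factor(d$gender, levels = c(0, 1), labels = c("female", "male"))
d$sigdz  <- factor(d$sigdz,  levels = c(0, 1), labels = c("not significant", "significant"))
d$tvdlm  <- factor(d$tvdlm,  levels = c(0, 1), labels = c("absent", "present"))

levels(d$gender)
sapply(d, class)   # "factor" for the coded variables, "numeric" for the rest
```

Age, duration, and cholesterol stay numeric here because differences and ratios between their values are meaningful.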


Statistics Solutions. (2015). https://www.statisticssolutions.com/data-levels-of-measurement/

Lane, D. M., & Osherson, D. (2013). Levels of Measurement. In D. M. Lane, Introduction to Statistics (pp. 34-39). Houston.

b. Provide descriptive statistics as well as appropriate graphs for both age and tvdlm. Interpret. Provide side-by-side boxplots for tvdlm and age. Interpret.

c. Test the hypothesis that gender and the presence of significant disease are independent. Use alpha = .01. Provide a 99% confidence interval. What are the implications of your results?

d. Build a 99% confidence interval for age. Test whether the mean age is 51.84 years using alpha = .01.

e. Test whether log(duration+.5) is a function of age. Provide separate boxplots for duration and for log(duration+.5). (Do not do side-by-side.) Explain why the log was used and why .5 was added. Plot the relationship and fit a line to it. Provide and interpret the ANOVA table. Interpret the R² and standard error. Forecast log(duration+.5) for age = 80, and convert back to the untransformed variable, duration. Forecast log(duration+.5) for age = 10.
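The mechanics of parts d and e can be sketched on the 20 rows printed earlier. These are only the youngest patients in the file, so the numbers it produces are not the assignment's answers; the object names (log_dur, fit, pred_log, pred_dur) are hypothetical, and the sketch only shows the transform, the fit, and the back-conversion under those assumptions.

```r
# Age and duration from the 20 printed rows (not the full acath.csv).
age      <- c(17, 18, 19, 20, 20, 22, 22, 23, 24, 24,
              25, 25, 25, 25, 26, 26, 27, 27, 27, 27)
duration <- c( 1,  0,  1,  7, 11,  4,  7,  6, 31, 41,
               4,  6,  2,  2,  2,  3,  1,  6,  8,  1)

# log(0) is -Inf, so a duration of 0 months cannot be logged directly;
# adding 0.5 keeps every transformed value finite.
log_dur <- log(duration + 0.5)

fit <- lm(log_dur ~ age)
anova(fit)     # ANOVA table for the regression
summary(fit)   # reports R-squared and the residual standard error

# Forecast on the log scale, then undo the transform:
# duration = exp(prediction) - 0.5.
pred_log <- predict(fit, newdata = data.frame(age = 80))
pred_dur <- exp(pred_log) - 0.5

# Part d: 99% confidence interval and t-test of mean age against 51.84.
t.test(age, mu = 51.84, conf.level = 0.99)
```

Age 80 lies far outside the ages in this subset, so the forecast here is an extrapolation and should be read only as a demonstration of the back-conversion step.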

2. This problem requires the use of the Challenger dataset. If you recall, the variables in the dataset are defined as follows:

launch: this numbers the temperature-sorted observations from 1 to 23.

temp: temperature in degrees Fahrenheit at the time of launch

incident: If there was an incident with an O-Ring, then it is coded “Yes.”

o_ring_probs: counts the number of O-ring partial failures experienced on the flight.

a. In the already temperature-sorted dataset, find on which observation the first successful launch occurred (one with no incident). _____ Test the hypothesis that the first failure would come on or after this observation. Use alpha = .10.

b. How many failures occurred above 65 degrees F? _____ Test the hypothesis that you would see this many or fewer failures given a fixed population of 23 launches.

c. What was the overall failure probability? _____ How many failures occurred below 65 degrees F? _____ Assume that the launches represent a sequential sample (independent events). Test the hypothesis that you would see this many failures or more.

d. How many o-ring problems (not incidents, o_ring_probs) occurred in the 23 launches? _____ How many of these failures occurred in the first four temperature-sorted launches? _____ Test the hypothesis of obtaining that number or a more extreme number of failures if the failure rate was constant.

e. Provide a 90% confidence interval for incidents.
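One possible reading of parts a through e is that each wording points to a different discrete distribution: a waiting time (geometric), a fixed population (hypergeometric), independent trials (binomial), and a constant rate (Poisson). The sketch below shows the corresponding base R tail-probability calls under that assumption. Every count and probability in it is a hypothetical placeholder rather than a value from the Challenger data; only the 23 launches come from the problem statement.

```r
# All counts below are hypothetical placeholders, NOT values from the data.
p_fail <- 0.30   # assumed overall probability of an incident
n_all  <- 23     # launches in the dataset (from the problem statement)

# (a) Geometric: chance the first event arrives on trial k or later,
# which equals (1 - p)^(k - 1). pgeom() counts the non-events before
# the first event, hence the k - 2.
k <- 5
p_geom <- pgeom(k - 2, prob = p_fail, lower.tail = FALSE)

# (b) Hypergeometric: x or fewer incidents among the warm launches when
# the incidents fall at random within a fixed population of 23.
n_incident <- 7    # hypothetical total incidents
n_warm     <- 19   # hypothetical launches above 65 F
x_warm     <- 3    # hypothetical incidents above 65 F
p_hyper <- phyper(x_warm, m = n_incident, n = n_all - n_incident, k = n_warm)

# (c) Binomial: x or more incidents in n independent launches.
n_cold <- 4        # hypothetical launches below 65 F
x_cold <- 3        # hypothetical incidents below 65 F
p_binom <- pbinom(x_cold - 1, size = n_cold, prob = p_fail, lower.tail = FALSE)

# (d) Poisson: x or more O-ring problems in four launches at a constant rate.
total_probs <- 9   # hypothetical total O-ring problems in 23 launches
x_first4    <- 5   # hypothetical problems in the first four launches
lambda4 <- total_probs / n_all * 4
p_pois <- ppois(x_first4 - 1, lambda = lambda4, lower.tail = FALSE)

# (e) Exact (Clopper-Pearson) 90% interval for the incident proportion.
ci_90 <- binom.test(n_incident, n_all, conf.level = 0.90)$conf.int
```

Each upper-tail call subtracts 1 from the observed count because lower.tail = FALSE returns P(X > q), and "this many or more" means P(X >= x).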

3. The accumrates.csv dataset contains information on the Thrift Savings Plan (TSP) funds discussed in class. The TSP is a retirement contribution plan for federal civilians and uniformed military personnel of the United States, similar to a traditional 401(k) plan. Participants are offered five funds: the Government Security Fund (G Fund), the Fixed Income Index Investment Fund (F Fund), the Common Stock Index Fund (C Fund), the Small Capitalization Stock Index Fund (S Fund), and the International Stock Index Investment Fund (I Fund). The G Fund is invested in short-term government securities issued by the Treasury, and its investment risk is near zero. The F Fund's objective is to match the Barclays Capital U.S. Aggregate Index, which provides a broad representation of the U.S. bond market. The C Fund's objective is to track the Standard and Poor’s 500 (S&P 500), which represents medium and large capitalization stocks. The S Fund's objective is to match the Dow Jones U.S. Completion Total Stock Market Index. Finally, the I Fund attempts to match the performance of the Morgan Stanley Capital International EAFE (Europe, Australasia, Far East) Index (Thrift Savings Plan, 2014).

a. Provide appropriate descriptive statistics / graphs for the CFund accumulation rates. If you invested $100,000 in the CFund at the beginning of January 2003, what would it have been worth at the end of December, 2014? What would this same $100,000 be worth if you invested it all in the SFund? The IFund?

b. The Coefficient of Variation (CV) is a measure of risk. It is defined as (standard deviation / mean) × 100%. The higher the coefficient of variation, the higher the risk associated with an investment. Calculate the CVs for all funds. Which fund is the riskiest? Which fund is least risky? Does this make sense based on the investment strategies?

c. Test the hypothesis that the CFund accumulation rates are related to the month. Use alpha = .10.

d. Test the hypothesis that the CFund accumulation rates are related to the year. Be sure to make year a factor (i.e., factor(Year)), so that you can see any non-constant effects. Use alpha = .10.

e. Plot the relationship between the CFund and the SFund. Next, plot the relationship between the CFund at time t and the SFund at time t-1. What is the implication? Run a regression of both relationships (CFund vs. SFund, and CFund(t) vs. SFund(t-1)). Interpret the results. Are you surprised?

f. Provide a 99% confidence interval for the GFund. Test whether this is truly a no risk fund (e.g., accumulation rates greater than 1) at that level.
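The calculations in parts a, b, e, and f can be sketched as follows. The sketch assumes that an accumulation rate is 1 plus the monthly return (consistent with the "greater than 1" wording in part f) and that rows are in time order. The six monthly values are made up for illustration and are not taken from accumrates.csv; the column names CFund, SFund, and GFund follow the assignment text, and the other object names are hypothetical.

```r
# Hypothetical monthly accumulation rates (1 + monthly return).
funds <- data.frame(
  CFund = c(1.02, 0.97, 1.04, 1.01, 0.99, 1.03),
  SFund = c(1.03, 0.95, 1.06, 1.02, 0.98, 1.04),
  GFund = c(1.002, 1.003, 1.002, 1.001, 1.002, 1.003)
)

# (a) Rates compound multiplicatively, so the ending value is the
# starting balance times the product of the rates.
end_value <- 100000 * prod(funds$CFund)

# (b) Coefficient of variation in percent, applied to every fund.
cv  <- function(x) sd(x) / mean(x) * 100
cvs <- sapply(funds, cv)

# (e) Pair CFund at time t with SFund at time t - 1 by dropping the
# first CFund value and the last SFund value.
c_t     <- funds$CFund[-1]
s_lag1  <- funds$SFund[-nrow(funds)]
lag_fit <- lm(c_t ~ s_lag1)

# (f) Two-sided 99% interval for the GFund mean, plus a one-sided
# test of whether the mean accumulation rate exceeds 1.
g_ci   <- t.test(funds$GFund, conf.level = 0.99)$conf.int
g_test <- t.test(funds$GFund, mu = 1, alternative = "greater")
```

With the real file, the same calls would apply after reading it with read.csv and restricting the rows to the months of interest.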
