# MHA 610 Introduction to BioStatistics FULL COURSE (ASSIGNMENTS AND DISCUSSION QUESTIONS)

August 30, 2017

Question
WEEK 1

Discussion
To participate in the following discussion, go to this week’s Discussion link in the left navigation.

Hospital Data

Prior to beginning this discussion, please make sure to watch Screencast Part 1 (.dropbox.com/s/yy5it543fq6eizw/MHA610_week1_discussion_part1.mp4?dl=0) and Screencast Part 2 (.dropbox.com/s/4bo6vetiazpvikl/MHA610_week1_discussion_part2.mp4?dl=0). The files MHA610_Week 1_Discussion_Hospital data (Excel) (.next.ecollege.com/pub/content/1ba07de0-0a88-41e0-a748-b57d1f6ce579/MHA610_Week_1_Discussion_hospital_data_022015.xls) and MHA610_Week 1_Discussion_Hospital Data (Statdisk) (.next.ecollege.com/pub/content/7e428655-6471-4ef8-835b-b0bafc2cd59a/MHA610_Week_1_Discussion_hospital_data_022015.csv) contain basic demographic information on 250 patients admitted to a community hospital over a two-week period. The first row of the worksheet gives the variable names:

Gender: Male (M) or female (F)

Ethnicity

SevIllnessCode, SevIllnessDescr: All Patient Refined Diagnosis Related Groups (APR-DRG) categories of severity of illness, ranging from mild (Category 1) to extreme (Category 4)

Age: In years

Wt: Patient weight, in kilograms

Ht: Patient height, in centimeters

BMI: Patient body mass index, where BMI = wt/ht^2, with weight in kilograms and height in meters

APR-DRG: All Patient Refined Diagnosis Related Group, a widely used inpatient classification system
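As a quick check on the BMI column, the definition can be sketched in a few lines of Python. This is only an illustration: the function name `bmi` and the example patient are assumptions, not part of the course files. Because the worksheet records height in centimeters while the BMI definition uses meters, the sketch converts before squaring.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index from weight in kilograms and height in centimeters."""
    height_m = height_cm / 100.0  # the worksheet stores height in centimeters
    return weight_kg / height_m ** 2


# Hypothetical patient, not taken from the hospital dataset.
example_bmi = bmi(70.0, 175.0)
print(round(example_bmi, 1))
```

Comparing a few recomputed values against the BMI column is one way to confirm that the units in the worksheet are what you expect.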

For this discussion, describe and summarize the demographic information on these patients. You may use tables or graphs (or both) for this purpose. Your goal is to convey to the reader an accurate snapshot of these patients. Support your response with correct scholarly sources. Your initial post must be at least 250-500 words.

Guided Response: Respond to at least two of your peers by Day 7, 11:59PM. Review your colleague’s summary of the data. Did the method of presentation provide you with any new insights? If so, what are they? If not, what suggestions might you make to your colleague that could improve his or her representation of the data? All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.

Assignment
To complete the following assignment, go to this week’s Assignment link in the left navigation.

U.S. Mortality Rates

Examine the burden of disease in the United States to provide important information on which to base decisions about public health priorities.

To do this, we will utilize mortality data for the United States. In the first part of this assignment, you will download and examine mortality data for your home state.

Go to USA Causes of Death by Age and Gender (.worldlifeexpectancy.com/usa-cause-of-death-by-age-and-gender).

· Choose your home state under the Choose State option (panel on left hand side)

· Select BOTH under the Choose gender option in the middle panel.

· Scroll down to the bottom of the page, and read the fine print to learn for which year the mortality data have been tabulated.

· Copy and paste the relevant mortality data into Excel.

o Drag your mouse over all of the Cause of Death rows (50 rows), right click, and select Copy.

o Open Excel and paste your selection into Excel. You should have a spreadsheet with 50 rows and 19 columns (Columns A-S).

For the first part of the assignment, you will prepare a histogram of the leading causes of death (regardless of age) in your state. Follow the steps below in order to prepare your histogram:

· Sort the Data

o The numbers of deaths, all ages, are given in Column C.

o Select all the columns.

o Then, select Data>Sort>Sort by Column C, Values, largest to smallest. (Make sure that the "My data has headers" option is not selected.)

o You now have the leading causes of death in your state in Column A (cause) and Column C (frequency).

· If you already know how to draw a histogram in Excel, proceed to do so with Columns A and C, making sure to truncate the data to the 30 leading causes.

· If you do not know how to draw a histogram in Excel, here’s one method:

o Choose the Chart Wizard, chart type column, chart sub-type clustered column (Step 1).

o Click Next for Step 2.

o At Step 2, Click the Series button, which will open a new box.

§ Enter Causes of Death in the Name box;

§ Clear the Values box, then

§ Drag your mouse over the 30 largest frequencies for the Values; and,

§ Drag your mouse over the first 30 causes of death (Column A) for the Category (X) axis labels box.

· Click Next, and you’ll be brought to the Chart Options box.

· Label the Y axis Frequency.

· Click Next, and place the histogram in a new worksheet.

You now have a histogram with the leading causes of death for your state. This presents one picture of the burden of disease in your state, but it isn’t the only picture. We shall now look at a different metric: years of life lost due to each cause.

To do this, we will assume that the average life span is 80 years, and we will calculate how many years of life are lost for each cause of death, according to the age at death.

Please note that the ages are in categories (0 – 14, 15 – 24, 25 – 34, …, 65 – 74, and 75+). For this exercise, we will assume that the average age at death is the midpoint of each of these intervals (e.g., 7.5, 19.5, 29.5, …, 69.5, and 80 for the last age category, respectively). For example, each death in the 15 – 24 age group (assumed age 19.5) incurs 60.5 years of life lost (80 – 19.5 = 60.5).

To make this histogram, we will compute a new column of values, years of life lost for each cause of death. (This entails writing a simple formula in Excel for the calculation corresponding to the first row of data, then dragging the formula down that column. If you have never done this calculation before in Excel, consult the screencast for detailed instructions.)

· Go back to the original Excel spreadsheet that contained your data.

· Using the formula above, create a column that calculates the years of life lost.

· Now, sort the data by the years of life lost column, in descending order, before drawing a histogram of the results.

· Finally, create a histogram of the 30 leading causes of death, in decreasing order of years of life lost.

o Do not forget to label the y-axis and provide a title for the chart.
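The years-of-life-lost calculation described above can also be sketched outside Excel. The Python below is a minimal illustration under stated assumptions: the cause names and death counts are hypothetical, the function name `years_of_life_lost` is made up for this sketch, and only the assumed ages at death that the assignment states explicitly (7.5, 19.5, 29.5, 69.5, and 80) are used.

```python
from typing import Mapping

LIFE_SPAN = 80.0  # assumed average life span, as stated in the assignment


def years_of_life_lost(deaths_by_age: Mapping[float, int]) -> float:
    """Sum (LIFE_SPAN - assumed age at death) * deaths over the age categories."""
    return sum(max(LIFE_SPAN - age, 0.0) * n for age, n in deaths_by_age.items())


# Hypothetical death counts, keyed by the assumed age at death for each category.
causes = {
    "Cause X": {7.5: 2, 19.5: 10, 69.5: 100, 80.0: 500},
    "Cause Y": {19.5: 40, 29.5: 60, 69.5: 20, 80.0: 10},
}

yll = {cause: years_of_life_lost(counts) for cause, counts in causes.items()}

# Sort in descending order of years of life lost, as in the second histogram.
ranked = sorted(yll.items(), key=lambda item: item[1], reverse=True)
for cause, value in ranked:
    print(f"{cause}: {value:.1f}")
```

In this made-up example, Cause X has far more deaths (612 versus 130) but fewer years of life lost (1,800 versus 5,660), because most of its deaths fall in the oldest category. This is the kind of reordering the second histogram is meant to reveal.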

You now have two histograms representing the burden of disease in your state. The first histogram orders the causes of death in terms of overall mortality, and the second orders causes of death in terms of years of life lost.

Create a report of your findings that contains both of the histograms. The report should be at least 250-500 words supported by scholarly resources and in APA format. Assume that your task is to assess and prioritize public health needs in your state, and you need to inform and persuade policy makers for improving the well-being of your state’s constituents. Describe which findings are most relevant for this.

You should also explain any methodological or data limitations that exist in either histogram. In particular, describe how your conclusions would be altered if you were to refine your findings by reanalyzing mortality rates based on gender and race in addition to age. The assignment should be at least 500 words in APA format supported by scholarly sources.

Carefully review the Grading Rubric (.waypointoutcomes.com/assessment/4693/preview) for the criteria that will be used to evaluate your assignment.

WEEK 2

Discussion

To participate in the following discussion, go to this week’s Discussion link in the left navigation.

Game of Chance

For this discussion, select a game of chance, explain it briefly if it is likely to be unfamiliar to your classmates, then calculate probabilities of various outcomes like winning or losing in this game. For example, you might choose your state lottery, scratch card game, a card game like poker, or a dice game like Craps or Yahtzee, as your game of chance.
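As one illustration of the kind of calculation involved, the sketch below enumerates the 36 equally likely outcomes of two fair dice and computes exact probabilities for the first (come-out) roll in Craps, where a pass-line bet wins on a total of 7 or 11 and loses on 2, 3, or 12. The helper name `probability` is an assumption made for this example; any other game would need its own enumeration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))


def probability(event) -> Fraction:
    """Exact probability that the total of two dice satisfies `event`."""
    favorable = sum(1 for a, b in rolls if event(a + b))
    return Fraction(favorable, len(rolls))


p_win_first_roll = probability(lambda total: total in (7, 11))
p_lose_first_roll = probability(lambda total: total in (2, 3, 12))
print(p_win_first_roll, p_lose_first_roll)
```

Using `Fraction` keeps the results exact, which makes it easier to check them by hand against a table of dice totals.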

Guided Response: Respond to at least two of your classmates who chose a different game of chance than you by Day 7 at 11:59PM. Did your colleague provide enough explanation of the game to allow you to understand the analysis? Was the analysis provided by your classmate correct? If so, what optimal strategy for playing that particular game was described? If not, what suggestions would you make to your colleague to amend any issues?

Assignment
To complete the following assignment, go to this week’s Assignment link in the left navigation.

Sex Ratio

The normal male to female live birth sex ratio ranges from about 1.03 to 1.07. The sex ratio is defined as the ratio of male births to female births. You might expect boy and girl births to be equally likely, but in fact, baby boys are somewhat more common than baby girls.
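The definition above can be sketched as a small calculation. The birth counts here are hypothetical and the function name `birth_ratios` is an assumption for illustration; the sketch also reports each sex as a share of total births, which is a different quantity from the sex ratio itself.

```python
def birth_ratios(male: int, female: int) -> dict:
    """Return the female share, male share, and male-to-female sex ratio."""
    total = male + female
    return {
        "female_share": female / total,  # female births / total births
        "male_share": male / total,      # male births / total births
        "sex_ratio": male / female,      # male births / female births
    }


# Hypothetical counts, not taken from CDC WONDER.
ratios = birth_ratios(male=1050, female=1000)
print(ratios)
```

Note that a sex ratio of 1.05 corresponds to a male share of about 0.512 of all births, so the two measures should not be compared directly.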

Higher sex ratios are thought to reflect prenatal sex selection, especially among cultures where sons are prized more heavily than daughters. We will review sex ratios in the United States as a whole, as well as in individual states, to determine whether sex ratios vary significantly among various ethnic and racial groups.

To do this analysis, we will utilize natality data for the United States, provided by the Centers for Disease Control.

In the first part of the assignment, we will look at sex ratios for your home state, over the time period 1995 to 2002, by race. To obtain this information:

· Go to the CDC WONDER website (.cdc.gov/),

· Click on Births under the WONDER Online Databases to bring you to the Natality Information screen

· On this screen, click Natality for 1995-2002.

· On the following screen, click I Agree in order to agree to abide by the government rules for data use (primarily, concerning confidentiality).

o This will bring us to the Natality, 1995-2002 Request screen.

o In the block 1. Organize table layout, group results by year, followed by race, and then gender.

o In the block 2. Select maternal residence, choose your state.

o You can leave blocks 3 through 6 at their default values (i.e., All).

o Click Send.

· A new screen will open, with data (births) tabulated by Year, Race, and Gender.

· Click Export, click Save, and a text file named Natality, _1995-2002 .txt or something similar will be downloaded onto your computer.

· Load the text file into Excel. This will probably open the Text Import Wizard.

o Accept the defaults, and you should have a spreadsheet with the natality data entered.

· We will need to edit the data slightly before calculating sex ratios and drawing graphs of the sex ratios. To do this:

o Scroll down to the end of the spreadsheet, and delete the rows with the extraneous information about the dataset. (This starts on or about row 203.)

o You may also delete the columns with headings Year Code, Race Code, and Gender Code, since we will not be using them; however, this is not necessary.

o Next, sort the data, in order to delete some extraneous rows. Select the remaining columns, choose Data > Sort, then sort by Race in ascending order.

o Scroll down to the end of the worksheet, and delete all rows with blanks for Race.

o We will now add a new column to the worksheet for ratios.

§ Go to the first blank column in the worksheet: this column should be immediately to the right of a column labeled Births.

§ In the first row of this column, type Ratios.

o Now, we will calculate different proportions of births, using formulas in Excel. It is important to use Excel to do the calculation, because it will allow you to quickly complete all of the ratios.

§ First, calculate the ratio of female births to total births for the American Indian race (female births/total births).

§ Next, calculate the ratio of male births to total births for the American Indian race (male births/total births).

§ Finally, calculate the ratio of male births to female births (male births/female births).

§ If you don’t know how to do this calculation easily in Excel, please check out the screencast, which reviews this.

§ Once you have completed the first three cells in the ratio column, you can select them and copy them.

§ Select the remaining cells in the column and paste.

§ You have now completed calculating all of the ratios; however, you may wish to double check to ensure that the formulas have adjusted for each cell.

o Once you have the Ratio column filled out, select that column, then Copy.

o With the column still selected, click Paste Special and then Values. This will convert the formulas you entered to numbers, so they do not change when you do the next sort.

· Select all the columns, then Data>Sort>Notes in ascending order. We will be graphing the sex ratios for the years 1995 to 2002, by race.

o Feel free to drop the two to four races that have the fewest numbers of births in your state.

· Draw a line chart with markers with the year along the X-axis (we are looking at 1995 through 2002) and sex ratio along the Y-axis (with sex ratios typically between 1 and 1.1, though this may vary in your state).

o If your version of Excel has the Chart Wizard:

§ In step two of the Chart Wizard, choose the Series tab; in this window you’ll be adding all the information for the various plots.

§ Under category (X) axis labels, drag your mouse over the cells 1995, 1996… 2002.

§ For values, drag your mouse over the eight successive sex ratios (one per year, 1995 through 2002) for the particular racial group you chose; in the name box, enter the racial group; do this for each of the groups you want to display.

§ Select Next when you have finished with all the racial groups, and you will be brought to the Chart Options screen.

§ Here, you can customize your graph, with a title and X and Y axis labels (i.e., your state births, year, and sex ratio respectively).

§ Continue with Next, and finish the graph.

o If your version of Excel does not have the Chart Wizard, you will need to do some reformatting of your data before you can create a line chart. It is good practice to create a new worksheet in order to preserve your original data.

§ Your data should mimic the way you want your line chart to look. In this case, you want to create horizontal labels for each of the years (1995 through 2002) and vertical labels for each of the races. It should follow this format:

| | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| Race A | Ratio for Race A in Year 1 | Ratio for Race A in Year 2 | Ratio for Race A in Year 3 |
| Race B | Ratio for Race B in Year 1 | Ratio for Race B in Year 2 | Ratio for Race B in Year 3 |

§ After you have reformatted your data, select all of the data, then select Insert, then Line, then Line with Markers.

§ You should now have a line chart with each race having its own line, the ratios on the Y-axis, and the years on the X-axis.

§ You may wish to modify the Y-axis by right-clicking on it. Your upper and lower values on the axis should be just above and below your highest and lowest ratio values.

· In a Word document, paste the graph you created (or, alternatively, submit your Excel workbook along with the Word document) and describe your findings, making sure to:

o Summarize the sex ratios for each of the racial groups.

o Explain whether the sex ratios are relatively constant through the 1995 to 2002 period for all of the racial groups or whether there are trends.

o Explain any racial groups that have noticeably higher or lower sex ratios than other groups.

o Explain the conclusions you are drawing from your graph.

In the second part of this assignment, you will undertake some formal statistical procedures with the natality data. We will repeat the previous steps, with some slight modifications.

· Click on Births under the WONDER Online Databases to get to the Natality Information screen.

· Select Natality for 2007 – 2012.

· On the next screen, click I Agree in order to agree to abide by the government rules for data use (primarily, concerning confidentiality).

· This will bring us to the Natality, 2007-2012 Request screen.

o In block 1. Organize table layout, group results by race and then gender (not year).

o In block 2. Select maternal residence, choose your state.

o You can leave block 3 at its default values (typically, All).

o In block 4. Select birth characteristics; select All Years under Year, and 1st child born alive to mother under Live Birth Order.

o Blocks 5 and 6 can be left at their default values.

· Click Send. A new screen will open, with data (births) tabulated by race and gender.

· Click Export, click Save, and a text file named Natality 2007-2012.txt (or something similar) will be downloaded onto your computer.

We have only four racial groups in this dataset: American Indians or Alaska Natives, Asian or Pacific Islanders, Black or African Americans, and Whites.

Using the normal approximation to the binomial distribution (without continuity correction), calculate z statistics for assessing whether the proportion of boys is .51 in each of the 4 racial groups, where n is the total number of births in a particular cohort, p = .51, q = 1 – p = .49, and x is the number of boy births; z = ((x – np) / sqrt(npq) ).

Under the null hypothesis that the proportion of boys should be 0.51, and under the normal approximation to the binomial distribution, the z statistics should have (approximately) standard normal distributions, (mean 0, standard deviation 1). Do any of the z statistics suggest that the proportion of boy births in any particular racial group differs significantly from .51?
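The z statistic defined above can be sketched as follows. The birth counts are hypothetical and the function names are assumptions for this illustration; the two-sided p-value uses the standard normal distribution, computed with the complementary error function from Python's standard library.

```python
import math


def z_statistic(x: int, n: int, p: float = 0.51) -> float:
    """z = (x - n*p) / sqrt(n*p*q), without continuity correction."""
    q = 1.0 - p
    return (x - n * p) / math.sqrt(n * p * q)


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical cohort: 5,200 boys among 10,000 first births.
z = z_statistic(x=5200, n=10000)
p_value = two_sided_p_value(z)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

A common convention is to call a result significant at the 5% level when |z| exceeds about 1.96. With four racial groups you are making four such comparisons, which is worth keeping in mind when interpreting a borderline result.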

Comment on your findings in your written report. Describe whether you think your results would change if we hadn’t limited consideration to the first-born. Assignment should be at least 250-500 words in APA format supported by scholarly sources.

Carefully review the Grading Rubric (.waypointoutcomes.com/assessment/7603/preview) for the criteria that will be used to evaluate your assignment.

WEEK 3

Discussion
To participate in the following discussion, go to this week’s Discussion link in the left navigation.

Confidence Intervals

In this discussion, we will investigate confidence intervals for binomial probabilities. The discussion is in two parts.
Return to the data you had generated in the second part of the Week Two assignment. You should have total numbers of first-born boys and girls in your state between the years 2007 and 2012 separately by racial group: American Indians or Alaska Natives, Asian or Pacific Islanders, Black or African Americans, and Whites. For the first part of this discussion, construct and report the 95% confidence intervals for the proportions of first-born boys, separately for each racial group. (Use the normal approximation to the binomial distribution.) Comment on the confidence intervals: can you infer from the confidence intervals that the proportions of first-born boys differ among the racial groups? Explain what the widths of the confidence intervals tell you.
Leading up to elections, you often hear results of polls of voters’ preferences, with statements such as: “This poll was taken from a random sample of 600 potential voters, and has an accuracy exceeding 96%.” You may want to interpret the accuracy statement in terms of “margin of error”, as explained in the text, Section 6-2. Remember, the width of a confidence interval is a measure of the precision of the estimate.
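Both parts of this discussion rest on the same normal-approximation formula, sketched below. The counts are hypothetical, the function names are assumptions for this example, and 1.96 is the usual approximate multiplier for a 95% interval. The second function gives the largest possible margin of error for a sample of size n, which occurs when the estimated proportion is 0.5.

```python
import math

Z_95 = 1.96  # approximate 97.5th percentile of the standard normal distribution


def proportion_ci(x: int, n: int, z: float = Z_95) -> tuple:
    """Normal-approximation confidence interval for a binomial proportion."""
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def max_margin_of_error(n: int, z: float = Z_95) -> float:
    """Worst-case margin of error, attained when the proportion is 0.5."""
    return z * 0.5 / math.sqrt(n)


# Hypothetical cohort: 5,200 boys among 10,000 first births.
low, high = proportion_ci(5200, 10000)
print(f"95% CI: ({low:.4f}, {high:.4f})")

# Poll of 600 potential voters, as in the quoted statement.
print(f"margin of error: {max_margin_of_error(600):.4f}")
```

For n = 600 the worst-case margin of error comes out at about 0.04, so one possible reading of the "accuracy" claim in the polling statement is a margin of error of roughly four percentage points at the 95% confidence level. Whether that is what the pollster meant is for you to argue in your post.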

Guided Response: Respond to at least two of your peers by Day 7, 11:59PM. Consider the 95% confidence intervals your colleague presented. Do all the intervals overlap with those you presented in your initial post? Did the inferences presented by your colleague match with yours? Compare the proportion of boy births in his or her state with those in your state. What statistically significant differences can you note? Do you concur with your colleague’s interpretation of the polling statement? What suggestions might you make to aid your colleague in evaluating this type of polling result? All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.

Quiz
To complete the following quiz, go to this week’s Quiz link in the left navigation.

Week Three Quiz

Complete the 10-question quiz on the readings from Weeks One through Three. You may wish to review all of the odd-numbered questions from the text that you have completed in Weeks One, Two, and Three. There is no time limit to this quiz. You will have two attempts to take the quiz. If multiple attempts are made, eCollege will take the last grade earned, not the highest grade earned.

Assignment
To complete the following assignment, go to this week’s Assignment link in the left navigation.

Immune Responses

Abnormal immune responses can trigger a range of autoimmune diseases, in which an individual’s immune system is attacking normal tissues in the body. Well-known examples of autoimmune diseases are type 1 diabetes mellitus, lupus, and multiple sclerosis.

Ideally, one would like to harness the immune system to attack abnormal substances or tissues like cancer, while sparing the normal (unaffected) tissue. Many tumor cells produce antigens (proteins) that theoretically ought to trigger an immune response: that is, one’s immune system ought to recognize cancer cells as somehow foreign or abnormal, and thereafter eliminate these cells from the body. The field of cancer immunotherapy is actively pursuing this study.

Tumor antigens may also be useful for diagnostic tests; high levels of tumor antigens could be taken as markers or indicators of cancer. In this assignment, you will be examining levels of tumor-associated antigens (TAAs) as determined from immunoassays (i.e., biochemical tests that measure the concentrations of the tumor-associated antigens in serum samples).

· The spreadsheet contains data on 250 individuals: 90 normal individuals from San Diego (the controls), and 160 individuals from Korea and China, all of whom were diagnosed with hepatocellular carcinoma (HCC).

o Serum samples were taken from the controls and from the cases at time of diagnosis of HCC. Levels of a panel of 12 tumor-associated antigens (TAAs) were assessed via immunoassays in all individuals;

§ The levels are given in the columns with headings Ab14, HCC1, IMP1, KOC, MDM2, NPM1, P16, P53, P90, RaIA, and Survivin. (These are the designations of the 12 TAAs, all of which were thought to be potentially predictive of cancer.)

· The underlying question is whether we can effectively discriminate between the cases and controls on the basis of the levels of these TAAs. This is sometimes termed a classification problem in the statistics and biostatistics literature: we wish to classify individuals as normal or cancer patients on the basis of their TAA levels.

· We will examine these data in Statdisk. Use the MHA610_Week 3_Assignment_Data.CSV file (.next.ecollege.com/pub/content/8a6a437b-aa08-4c0d-a2f6-6eee062c7306/MHA610_Week_3_Assignment_Data.csv) to upload this information into Statdisk.

o If you choose the latter option, Start Statdisk, then choose File>Open and select the .csv file you created (unless you changed the name, it ought to be MHA610_assignment_3_data.csv)

o Check the box that specifies the data contains column titles or headers, select Comma separated for how the data are delimited, click finish, and the dataset will have been successfully imported into Statdisk.

o NOTE: you may want to read through the remainder of the assignment first, before proceeding with this step. This may save you some work afterwards!

· Note that Statdisk operates on columns of data, and that both cases and controls are contained in each column of TAA levels. It will be necessary to separate the cases and controls for further analyses. This can be accomplished either by copying within Statdisk or by reverting to the original Excel workbook, copying in Excel, exporting as a .csv file, and then importing into Statdisk. (Don’t say you weren’t warned!)

· Explain if you would characterize any or all of the TAA levels as approximately normally distributed for the controls and for the cases.

o Provide plots and statistics in support of your conclusions.

· Explain if any of the TAAs are useful for discriminating between the cases and controls.

o Provide plots and statistics in support of your conclusions.

· All writing assignments should be at least 250-500 words in APA format supported by scholarly sources.

BONUS. In the above, we pooled all cases together. Summarize whether you think this is legitimate or whether the levels of any of the TAAs appear to differ significantly between the cases from China and the cases from Korea. Provide evidence in support of your conclusion.

Carefully review the Grading Rubric (.waypointoutcomes.com/assessment/7601/preview) for the criteria that will be used to evaluate your assignment.

WEEK 4

Discussion
To participate in the following discussion, go to this week’s Discussion link in the left navigation.

Exploring t Tests and Confidence Intervals for Continuous Data

In this discussion, we will investigate confidence intervals and t tests for continuous data. To do this, we will revisit the TAA data that you studied in the Week Three assignment.

You may recall from the Week Three assignment that you have available data on 12 tumor-associated antigens (TAAs), from 90 normal individuals (controls) and 160 hepatocellular carcinoma patients (cases). These data are in the Excel file MHA610_Week 3_Assignment_data.xls (.next.ecollege.com/pub/content/6ea5b6d2-3362-49a9-9daf-97890bb1b60a/MHA610_assignment_3_data.xls); the levels of the 12 TAAs are given in the columns with headings Ab14, HCC1, IMP1, KOC, MDM2, NPM1, P16, P53, P90, RaIA, and Survivin.
First, randomly select three of the 12 TAAs for further study.
Next, perform two-sample t tests comparing the levels of each of your three TAAs between the cases and the controls.
Then, use the t tests to order the TAAs in terms of relative ability to discriminate between the cases and controls, from best to worst discriminator. Is this ordering helpful if you want to select a subset of TAAs to discriminate between cases and controls? Assume for now that you can judge the relative merits of your three TAAs by the magnitudes of their respective two-sided p-values from the two-sample t tests, so that your best discriminator is the TAA with the smallest p-value.
Lastly, construct and report 95% confidence intervals for the mean level of your best TAA discriminator in the controls, the mean level of your best TAA discriminator in the cases, and the difference in mean levels (cases – controls). Discuss whether your confidence intervals are concordant with the t tests.
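For the single-group intervals in the last step, a t-based confidence interval for a mean can be sketched as below. This is an illustration only: the TAA levels are hypothetical, the function name `mean_ci` is an assumption, and the critical value must be supplied by you (from a t table or from Statdisk) for n – 1 degrees of freedom, since Python's standard library does not provide the t distribution.

```python
import math
import statistics


def mean_ci(values, t_crit: float) -> tuple:
    """Confidence interval for a mean: mean +/- t_crit * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean - t_crit * standard_error, mean + t_crit * standard_error


# Hypothetical TAA levels for five individuals (not from the course data).
levels = [0.10, 0.20, 0.30, 0.40, 0.50]

# 2.776 is the two-sided 95% critical value of the t distribution with 4 df.
low, high = mean_ci(levels, t_crit=2.776)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

With the real data the groups are much larger (90 controls and 160 cases), so the critical value will be close to 1.96 and the intervals correspondingly narrower.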

Guided Response: Respond to at least two of your peers by Day 7, 11:59PM. Do your t tests and ordering coincide with those of your colleague? If not, why? Do you agree with your colleague’s assessment of the usefulness of the ordering to discriminate between cases and controls? Why? Did your best TAA discriminator agree with that of your colleague? If not, why not? Are your confidence intervals identical to those of your colleague? If not, can you determine where a mistake was made? All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.

Assignment
To complete the following assignment, go to this week’s Assignment link in the left navigation.

A Crossover Clinical Trial

Background: Randomized controlled trials are the gold standard for clinical research. Biostatisticians are heavily involved in such trials, from the planning stage (e.g., sample size and power considerations) through the analysis of findings (e.g., estimation of treatment effects). In this assignment, we will examine treatment outcomes in a two treatment, two period (two-by-two) crossover design.

In the two-by-two crossover design, subjects are randomly assigned to one of two groups. The first group initially receives treatment A in the first period of the trial followed by treatment B in the second period of the trial, and the other group initially receives treatment B in the first period of the trial followed by treatment A in the second period. The response, or primary endpoint of the trial, is measured at least twice in each patient, at the end of the first period and again at the end of the second period. Each patient is his or her own control for comparison of treatment A and treatment B.

Crossover designs are used when the treatments alleviate a condition, rather than effect a cure. After the response to the treatment administered in the first period is measured, there is a washout period in which any lingering effect of the treatment administered in the first period dissipates, and then the response to the second treatment is measured.

An advantage of a crossover design is increased precision afforded by comparison of both treatments on the same subject, compared to a parallel group clinical trial (in which patients are randomized onto different treatment arms). Disadvantages of crossover trials are complex statistical analyses of findings (typically, by complex analyses of variance), potential difficulties in separating the treatment effects from the time effect (patients may respond differently in the first period and the second period), and the carryover effect (the effect of the treatment given in the first period may not totally wash out, but may carry over onto the second period).

We will give a simple example of a two-by-two crossover trial, and undertake analyses of the trial results via t tests. The trial was meant to assess the efficacy of a new experimental therapy for interstitial cystitis (IC). Interstitial cystitis is a chronic bladder condition affecting primarily women; symptoms include bladder pressure and pain, urgency, and occasionally pelvic pain. The new experimental therapy was meant to reduce pain and urgency relative to standard therapy. A total of 24 patients were enrolled in the trial; trial results are given in the Excel workbook titled MHA610_Week 4_Assignment_Crossover_Trial_Data.xls (.next.ecollege.com/pub/content/a3523ba2-0b12-475c-be55-3dfe7c150fff/MHA610_Week_4_Assignment_Crossover_Trial_Data.xls).

Open the workbook, and examine the worksheet. The first row contains column headings, and the next 24 rows represent the 24 patients entered into the trial. The group one patients received experimental therapy in the first period of the trial followed by standard therapy in the second period of the trial. The group two patients received standard therapy in the first period of the trial followed by experimental therapy in the second period.

The primary outcome of the trial was an area under the curve (AUC) calculation of relative pain and urgency the patient experienced following therapy: the smaller the AUC, the less severe the patient’s pain and urgency. AUC_period1 denotes each patient’s AUC during the first period of the trial, and AUC_period2 denotes the patient’s AUC during the second period of the trial. The column headed Rx denotes the treatment each patient received during the first period of the trial.

· We will first test for carryover effects.

o The t test formulation for the test for carryover proceeds as follows: calculate the total (sum) of the AUC_period1 and AUC_period2 values for each patient in group one (12 patients) and separately for each patient in group two (12 patients).

o The test for carryover is the two sample t test for assessing whether these AUC totals differ significantly between group one and group two under the assumption that the variances of the AUC totals in the two groups are identical.

o Calculate the sample means and standard deviations for the AUC totals for each group, and perform the two sample t tests. Analyze whether there is a significant carryover effect in this clinical trial.

· We will next test for treatment effects.

o The t test formulation for assessing treatment effects proceeds as follows:

§ Calculate the difference of the AUC values for each patient in group one, that is, the 12 individual AUC_period1 – AUC_period2 values, and similarly calculate for each patient in group two.

§ If there is no treatment effect, one would expect the AUC_period1 and AUC_period2 values to be similar, except perhaps for an offset due to period effects; we need to account for potential period effects when we compare the group one and group two AUC differences.

§ It turns out that the t test for a treatment effect is the two sample t test for assessing whether these AUC_period1 – AUC_period2 differences differ significantly between group one and group two, under the assumption that the variances of the AUC differences are the same in the two groups.

o Calculate the sample means and standard deviations for the AUC differences as defined above in each group, and perform the two sample t test. Analyze whether there is a significant treatment effect in this clinical trial.
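Both tests described above use the same pooled (equal-variance) two sample t statistic, applied first to the per-patient totals and then to the per-patient differences. The following is a minimal sketch using only the Python standard library; the function name `pooled_t_test` and the AUC values are hypothetical stand-ins for illustration, not the trial data.

```python
import math
from statistics import mean, stdev

def pooled_t_test(a, b):
    """Two-sample t test assuming equal variances; returns (t, df)."""
    n1, n2 = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    sp2 = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

# Hypothetical (AUC_period1, AUC_period2) pairs, NOT the trial data.
group1 = [(10.0, 14.0), (12.0, 15.0), (9.0, 13.0), (11.0, 16.0)]
group2 = [(15.0, 11.0), (14.0, 10.0), (16.0, 13.0), (13.0, 9.0)]

# Carryover test uses per-patient totals; treatment test uses differences.
totals1 = [p1 + p2 for p1, p2 in group1]
totals2 = [p1 + p2 for p1, p2 in group2]
diffs1 = [p1 - p2 for p1, p2 in group1]
diffs2 = [p1 - p2 for p1, p2 in group2]

t_carry, df = pooled_t_test(totals1, totals2)
t_treat, _ = pooled_t_test(diffs1, diffs2)
print(f"carryover: t = {t_carry:.3f}, df = {df}")
print(f"treatment: t = {t_treat:.3f}, df = {df}")
```

With 12 patients in each group the test has 22 degrees of freedom, for which the two-sided 5% critical value of the t distribution is about 2.074.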

Here’s an informal explanation of this t test. Consider the following schematic representation of the two-by-two crossover trial.

| Group | Period One | Period Two |
| --- | --- | --- |
| 1. AB Sequence | Treatment A + Period One | Treatment B + Period Two |
| 2. BA Sequence | Treatment B + Period One | Treatment A + Period Two |

In this representation, Treatment A is the direct effect of treatment A on each patient’s response (AUC value) and similarly for Treatment B; Period One is the effect of period one on each patient’s response and similarly for Period Two. (We are assuming there are no carryover effects.)

Now, consider first the individuals in group one. During Period One, their responses (i.e., AUC_period1 values) are estimating effects due to treatment A and period one. During Period Two, their responses (i.e., AUC_period2 values) are estimating effects due to treatment B and period two. So when we take the average of the group one AUC_period1 – AUC_period2 values (let’s call this average x̄), we have a combined estimate of the effects (Treatment A – Treatment B) + (Period 1 – Period 2).

Next, consider the individuals in group two. When we take the average of the group two AUC_period1 – AUC_period2 values (let’s call this average y), we have a combined estimate of the effects (Treatment B – Treatment A) + (Period 1 – Period 2).

Lastly, consider the random variable Z = x̄ – y. This random variable estimates solely twice the quantity (Treatment A – Treatment B); the period effects (Period 1 – Period 2) cancel out. Under the null hypothesis of no treatment effects, (Treatment A – Treatment B) = 0, so the mean of Z should be zero. The two sample t test for treatment effects outlined above is equivalent to the t test of whether the mean of Z equals zero. Note that since we have equal numbers of patients in group one and group two, there was no need to take sample means when we constructed our t test; but in general, with unequal sample sizes, you should work with sample means when performing the t tests.
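The cancellation of the period effects can be written out in expectations. In the sketch below, the symbols \tau_A and \tau_B (direct treatment effects) and \pi_1 and \pi_2 (period effects) are notation introduced here for illustration only; \bar{x} and y are the group one and group two average differences defined above, and carryover is assumed absent.

```latex
\begin{align*}
E[\bar{x}] &= (\tau_A + \pi_1) - (\tau_B + \pi_2) = (\tau_A - \tau_B) + (\pi_1 - \pi_2), \\
E[y]       &= (\tau_B + \pi_1) - (\tau_A + \pi_2) = (\tau_B - \tau_A) + (\pi_1 - \pi_2), \\
E[\bar{x} - y] &= 2\,(\tau_A - \tau_B).
\end{align*}
```

The period terms drop out of the last line, and it equals zero exactly when \tau_A = \tau_B, which is the null hypothesis examined by the t test.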

Briefly summarize your findings from this trial. Explain whether the new treatment appears promising in 500 words in APA format supported by scholarly sources.

BONUS. Graphical representations of the findings can be quite illuminating. As a bonus, you are asked to prepare graphical representation(s) of the data. For example, you might prepare a simple plot of mean responses (mean AUC values) for each treatment arm and for each period. Or, you could give patient profile plots of individual AUC values by period and treatment. Describe whether histograms, boxplots, or scatter plots would work with these data. If you assume that there are no significant carryover or period effects in this trial, explain how you would display the treatment effects in 250 words in APA format supported by scholarly sources.

Carefully review the Grading Rubric for the criteria that will be used to evaluate your assignment.

WEEK 5

Discussion
To participate in the following discussion, go to this week’s Discussion link in the left navigation.

Graphs

It is important to look at data in a graphical form. Patterns are the essence of data exploration, and the eye’s ability to discern forms and patterns makes visual display integral to the process. The visual display of quantitative information can help us see connections and relationships in the data, which are oftentimes difficult to detect in tables of numbers. We should look at data in a graphical form, and not rely solely on computational or statistical metrics.

In this discussion, we will explore graphs in linear regression. Our data are taken from a 1973 article by Frank Anscombe in The American Statistician, which discusses scatterplots in relation to regression analyses.

First, download the dataset MHA610_Week 5_Discussion_Regression_Data.xls. This is a simple Excel workbook, with data on one sheet. There are eight columns of data, with headings X1, Y1, X2, Y2, X3, Y3, X4, Y4. Import the data into Statdisk using the MHA610_Week 5_Discussion_Regression_Data.CSV file, and perform the following analyses.

1. Calculate the regressions of Y1 on X1, Y2 on X2, Y3 on X3, and Y4 on X4, and compare the results (summary statistics). Explain what, if anything, you find unusual about these results.
2. Plot each set of data, along with the fitted regression line. Describe what the graphs tell you about the relationships between the X’s and the Y’s.
3. Explain what lessons you draw from this exercise.
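For readers working outside Statdisk, the regression summary statistics can be computed directly from sums of squares. The sketch below is one possible approach using only the Python standard library; `fit_line` is a hypothetical helper name, and the values shown are the first data set as published by Anscombe (1973), which is assumed (not confirmed) to match columns X1 and Y1 of the workbook.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

# First data set from Anscombe (1973); assumed to correspond to X1, Y1.
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

slope, intercept, r = fit_line(x1, y1)
print(f"Y1 on X1: slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.3f}")
```

Calling the same function on the other three column pairs gives the summary statistics to compare; the plots still have to be examined separately, since the function reports nothing about the shape of the scatter.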

Place the summary statistics and the plots in a separate Word document and attach that document to your initial post. Address the questions in the body of your initial discussion post.

Guided Response: Respond to at least two of your peers by Day 7, 11:59PM. Do your summary statistics and plots agree with those of your colleague? If not, how and why do they differ? Did your colleague’s conclusions broaden your perspective on linear regression? All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.

Assignment
To complete the following assignment, go to this week’s Assignment link in the left navigation.

Brain Size and Intelligence

Background: Is brain size a measure of intelligence? Brain size tends to vary with body size: for example, sperm whales and elephants have brains up to five times as massive as human brains. So across species, brain size is not a perfect measure of intelligence. And within species, the underlying organization (complexity of connections) and molecular activity of the brain are likely to be more directly associated with intelligence than mere size.

In this assignment, we will investigate relationships between physiological measures of the brain, and intelligence. Download and open the Excel workbook, MHA610_Week 5_Assignment_Brain_Data.xls. The workbook contains data on 20 youths, in rows two through 21. Eight variables (the columns) were recorded on each individual; the column headings are given in row one. The column headings are as follows:

| Variable | Description |
| --- | --- |
| IQ | the individual’s IQ |
| Order | the birth order (1 = firstborn, 2 = not firstborn) |
| Pair | marker for genotype |
| Sex | gender, 1 = male, 2 = female |
| CCSA | corpus callosum surface area (in cm^2) |
| HC | |
| TOTSA | total brain surface area (in cm^2) |
| TOTVOL | total brain volume (in cm^3) |
| WEIGHT | body weight (in kg) |

The neuroanatomical measures CCSA, TOTSA, and TOTVOL were determined from magnetic resonance imaging (MRI) of the brains, followed by automated image analyses of the scans. The corpus callosum is a bundle of neural fibers beneath the cortex, connecting the left and right cerebral hemispheres of the brain; it is the communication highway between the two hemispheres. (The more lanes to the highway, the faster the traffic ought to flow.)

The following questions can be answered in Excel, Statdisk, or other statistics software you may have available.

· Examine all of the pairwise correlations among the physiological measures CCSA, HC, TOTSA, TOTVOL, and WEIGHT. Which two variables have the strongest correlation? Report the correlation, and plot the scattergram for these two variables. Also, report the correlation and plot the scattergram for the two variables that have the weakest correlation.

· Determine whether the physiological parameters CCSA, HC, TOTSA, TOTVOL, and WEIGHT are significant predictors of IQ. That is, run a sequence of univariate regressions, with IQ as the dependent variable, and the physiological parameters as the independent variables. Report the best univariate regression with statistics and a graph of the regression. Describe whether IQ can be accurately predicted from any of these brain measures individually or in combination.
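The pairwise correlations asked for above can be tabulated with a short loop over all pairs of variables. This is a rough sketch using only the Python standard library; `pearson_r` is a hypothetical helper, only three of the five measures are shown, and the numbers are invented placeholders rather than the workbook data.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for three of the measures, NOT the workbook data.
data = {
    "CCSA":   [6.1, 7.4, 6.8, 7.9, 6.5, 7.1],
    "TOTVOL": [1005, 1180, 1090, 1210, 1040, 1150],
    "WEIGHT": [57.0, 72.5, 61.2, 58.9, 80.3, 66.4],
}

# One correlation per unordered pair of variables.
pairs = {(a, b): pearson_r(data[a], data[b]) for a, b in combinations(data, 2)}
strongest = max(pairs, key=lambda p: abs(pairs[p]))
weakest = min(pairs, key=lambda p: abs(pairs[p]))

for (a, b), r in pairs.items():
    print(f"r({a}, {b}) = {r:.3f}")
print("strongest:", strongest, "weakest:", weakest)
```

Strength is judged here by the absolute value of r, so a strongly negative correlation would count as strong; with all five measures the loop produces ten pairs.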

BONUS. Power law distributions, that is, functional relationships between two variables in which one variable is roughly a power of the other, are often used to model physiological data. One of the oldest power laws, the square-cube law, was introduced by Galileo in the 1600s: empirically, the square-cube law states that as a shape grows in size, its volume grows faster than its surface area. We shall investigate the square-cube law with two variables from our dataset, CCSA and TOTVOL. If CCSA varies with some power of TOTVOL, for example, CCSA = k * (TOTVOL)^α (k is an unknown constant here), then a simple way of estimating the exponent α is via linear regression: take log(CCSA) as the dependent variable and log(TOTVOL) as the independent variable; the fitted regression coefficient (slope) is an estimate of the exponent. (Do you see why this is true?) Perform this linear regression, and report your results. Describe whether the regression coefficient is significantly different from 2/3. (The 2/3rd power law occurs often in nature.)
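Taking logarithms of both sides of the power law gives log(CCSA) = log(k) + exponent * log(TOTVOL), which is a straight line whose slope is the exponent. A minimal sketch of the resulting regression follows, using only the Python standard library; `loglog_fit` is a hypothetical helper, and the TOTVOL and CCSA values are invented for illustration, not taken from the workbook.

```python
import math

def loglog_fit(x, y):
    """Regress log(y) on log(x); returns (slope, intercept, slope_se)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares, on n - 2 degrees of freedom.
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, ly))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope_se

# Hypothetical TOTVOL (cm^3) and CCSA (cm^2) values, NOT the workbook data.
totvol = [1005, 1040, 1090, 1150, 1180, 1210]
ccsa = [6.1, 6.5, 6.8, 7.1, 7.4, 7.9]

slope, intercept, se = loglog_fit(totvol, ccsa)
# Test statistic for H0: exponent = 2/3; refer to a t distribution on n - 2 df.
t_stat = (slope - 2 / 3) / se
print(f"exponent estimate = {slope:.3f}, SE = {se:.3f}, t vs 2/3 = {t_stat:.2f}")
```

The t statistic is compared with a t distribution on n – 2 degrees of freedom (18 for the 20 youths in the workbook).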

Carefully review the Grading Rubric for the criteria that will be used to evaluate your assignment.

WEEK 6

Discussion
To participate in the following discussion, go to this week’s Discussion link in the left navigation.

Health and Nutritional Status

Since 1971, the National Center for Health Statistics has been assessing the health and nutritional status of both children and adults in the United States through the periodic National Health and Nutrition Examination Survey (NHANES). These surveys are an invaluable resource to epidemiological and public health research; the surveys can be used to determine the prevalence of major diseases and risk factors, to assess nutrition and health promotion, and to guide public health policy.
All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.

In 2012, the NHANES National Youth Fitness Survey (NNYFS) was conducted in conjunction with NHANES to obtain physical activity and fitness levels of U.S. youths aged 3 through 15. Initial data from the NNYFS were released in 2013 and serve as the basis for this discussion problem.

Begin by downloading the Excel file MHA610_Week 6_Discussion_NNYFS_workingdata.xls. This workbook was created by merging two datasets from the NNYFS: the demographic variables dataset, and the body measures dataset. For the purposes of this discussion, many variables were eliminated from the original datasets, as well as observations with missing data on height and weight. The Excel workbook thus consists of one worksheet, with 1576 rows (the first row contains headers, and the next 1575 rows are observed values for the participants), and 11 columns of variables. The columns in the Excel file are the following:

| Variable | Description |
| --- | --- |
| SEQN | the respondent sequence number (index for all the files) |
| RIAGENDR | gender of the participant, 1 = male, 2 = female |
| RIDRETH1 | race/Hispanic origin: 1 = Mexican American, 2 = other Hispanic, 3 = non-Hispanic white, 4 = non-Hispanic black, 5 = other |
| RIDEXAGY | age in years at time of physical exam |
| INDHHIN2 | annual household income, categorized |
| INDFMIN2 | annual family income, categorized |
| INDFMPIR | ratio of family income to poverty, 0 to 5 |
| BMXWT | weight, in kg |
| BMXHT | height, in cm |
| BMXBMI | body mass index (kg/m^2) |
| BMDBMIC | BMI category: 1 = underweight, 2 = normal weight, 3 = overweight, 4 = obese, . = missing |

More detailed descriptions of these variables are given at the data documentation web pages for the NNYFS, at http://www.cdc.gov/nchs/nnyfs/Y_DEMO.htm and at http://www.cdc.gov/nchs/nnyfs/Y_BMX.htm.

For purposes of this discussion, you are asked to answer the following three questions:

1. Does BMI vary significantly between boys and girls?
2. Does BMI vary significantly among the racial/ethnic groups?
3. Is there any trend to BMI with age?

There are several ways to address these questions. For example, you might take BMXBMI as your outcome variable of interest: it is continuous, so you could then perform a two-sample t test for (1), a one way analysis of variance for (2), and a simple regression analysis (with age as the predictor variable) for (3).
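Of the three continuous-outcome analyses just mentioned, the one way analysis of variance is the least familiar to compute by hand. The sketch below shows one way to obtain the F statistic using only the Python standard library; `one_way_anova` is a hypothetical helper, and the BMI values are invented placeholders, not NNYFS observations.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, k - 1, n - k

# Hypothetical BMXBMI values keyed by RIDRETH1 code, NOT the NNYFS data.
bmi_by_group = {
    1: [18.2, 21.5, 19.9, 23.1],
    3: [17.4, 18.8, 20.2, 19.1],
    4: [19.6, 22.3, 18.7, 21.0],
}

f_stat, df1, df2 = one_way_anova(list(bmi_by_group.values()))
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

The F statistic is then referred to an F distribution with the two reported degrees of freedom; with all five racial/ethnic groups and 1575 participants these would be 4 and 1570.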

Alternatively, you might reduce the problem to consideration of binomial probabilities: for example, you could classify everyone as obese or not obese (or maybe, overweight/obese vs underweight/normal), then compare binomial outcomes for (1) and (2) (z tests with the normal approximation or contingency tables), and conduct a t test on ages for (3).
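For the binomial approach, a comparison of two proportions (for example, the proportion obese among boys versus girls) can be sketched as a pooled z test with the normal approximation. The code below uses only the Python standard library; `two_proportion_z` is a hypothetical helper, and the counts are invented for illustration, not tabulated from the NNYFS file.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z test for proportions; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided standard normal tail area via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts of obese (BMDBMIC = 4) boys and girls, NOT the NNYFS data.
z, p = two_proportion_z(x1=150, n1=800, x2=120, n2=775)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

For a two-by-two table this z test and the chi-square contingency table test give the same p-value, since the chi-square statistic is the square of z.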

Neither approach is wrong—the key is interpreting your findings!

If you prefer to do the analyses in Statdisk, there is a file, NNYFS_workingdata.csv, ready to be read into Statdisk. (It’s the original Excel workbook, saved as csv.) No need to go through any additional steps, unless you wish to restructure the data in Excel.

Incidentally, the income variables are not needed for these questions, but as a bonus, you might want to investigate whether obesity is related to socioeconomic status (as reflected by family income).

Guided Response: Respond to at least two of your peers who chose a different method of analysis than you by Day 7, 11:59PM. Did you arrive at the same conclusions as your colleague even though you chose different methods? If so, which method do you think is preferable and why? If not, which method do you believe produces more credible results and why? (You might consult the text to support your argument.) All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.

Quiz
To complete the following quiz, go to this week’s Quiz link in the left navigation.

Week Six Quiz

Complete this quiz on the readings from Weeks Four through Six. It may be helpful to review the odd numbered questions from your text that you completed in Weeks Four, Five, and Six. There is no time limit to this quiz. You will have two attempts to take the quiz. If multiple attempts are made, eCollege will take the last grade earned, not the highest grade earned.

Final Project
To complete the following final project, go to this week’s Final Project link in the left navigation.

Final Project

In this final assignment, we will revisit datasets that we have utilized in previous assignments, but with new objectives.

· In the Week One assignment, you looked at mortality in your particular state, with two different metrics: the first was numbers of deaths, and the second was years of life lost. For this question, return to the original dataset, but this time first pool all cancer causes of death together, so that cancer constitutes the only category for cause of death. Then, repeat your analyses from Week One. How do your conclusions change?

· In the Week Two assignment, you looked at sex ratios for births in your state.

o Take the data you have assembled from the second part of your Week Two assignment, namely, numbers of first-born boy and girl births in your state between 2007 and 2012, separately by racial group (i.e., American Indians, Asians, Blacks, and Whites). Form a two-by-four contingency table from these data: the two row categories are female (girl) and male (boy), and the four column categories are the four racial groups. Calculate the chi-square statistic from this contingency table, and interpret the result.

o Return to the CDC Wonder website, and obtain the numbers of births in your state between 2007 and 2012, by month. (Disregard gender, race, and birth order; you want all births.) Calculate a chi-square statistic to assess whether there is any seasonality to births. (Your null hypothesis is that births should be equally likely to occur in any of the 12 months. We are ignoring the varying lengths of the months to simplify calculations.) How would you interpret your findings? Explain in 500 words in APA format supported by scholarly sources.

BONUS: Give a graphical representation of your findings for this portion highlighting what you consider significant.
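The two chi-square calculations in this portion (the two-by-four contingency table and the test for seasonality against equal monthly counts) can be sketched as follows, using only the Python standard library. The function names are hypothetical and every count is an invented placeholder, not CDC Wonder data.

```python
def chi_square_independence(table):
    """Pearson chi-square for an r x c table of counts; returns (stat, df)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            exp = row_tot[i] * col_tot[j] / total
            stat += (obs - exp) ** 2 / exp
    return stat, (len(table) - 1) * (len(col_tot) - 1)

def chi_square_uniform(counts):
    """Goodness of fit against equal expected counts; returns (stat, df)."""
    exp = sum(counts) / len(counts)
    return sum((c - exp) ** 2 / exp for c in counts), len(counts) - 1

# Hypothetical counts, NOT CDC Wonder data.
births = [[480, 510, 495, 4900],   # girls, by racial group
          [505, 540, 520, 5150]]   # boys, by racial group
by_month = [810, 760, 820, 800, 830, 825, 860, 870, 850, 830, 790, 805]

print(chi_square_independence(births))  # compare with 7.815 (df = 3, 5% level)
print(chi_square_uniform(by_month))     # compare with 19.675 (df = 11, 5% level)
```

A statistic larger than the quoted 5% critical value would lead to rejecting the corresponding null hypothesis at that level.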

· In the Week Three assignment, you were given levels of tumor-associated antigens in a sample of 90 normal (non-cancer) individuals, and 160 hepatocellular carcinoma (HCC) patients. Here is a proposed diagnostic test for HCC:

o For each individual, calculate a numerical score:

§ score = -3.95 + 10.7 * HCC1 – 4.14 * P16 + 13.95 * P53 + 28.92 * P90 + 6.48 * survivin

§ (This equation was derived from logistic regression.)

o If this score is positive (i.e., > 0), diagnose this individual as an HCC patient; if this score is negative (i.e., < 0), …

Carefully review the Grading Rubric for the criteria that will be used to evaluate your assignment.
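The proposed scoring rule for the HCC diagnostic test can be expressed as a small function. This is only a sketch: the function names and the antigen levels are hypothetical, and because the source does not fully state what to do with a non-positive score, treating every such score as "not HCC" is an assumption made here.

```python
def hcc_score(hcc1, p16, p53, p90, survivin):
    """Linear score from the equation given in the assignment."""
    return (-3.95 + 10.7 * hcc1 - 4.14 * p16 + 13.95 * p53
            + 28.92 * p90 + 6.48 * survivin)

def classify(score):
    # A positive score means HCC, per the assignment. Labelling every
    # other score "not HCC" is an assumption made for this sketch.
    return "HCC" if score > 0 else "not HCC"

# Hypothetical antigen levels for one individual, NOT from the dataset.
example = hcc_score(hcc1=0.20, p16=0.10, p53=0.15, p90=0.05, survivin=0.10)
print(f"score = {example:.4f}, diagnosis = {classify(example)}")
```

Applying `classify` to every individual and comparing the result with the known group (normal or HCC) gives the counts needed to judge how well the test performs.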
