# Statistics and Probability Question Assignment 2016

August 31, 2017

Question

General Information

· In this assignment, you will analyse data relating to educational attainment of a random sample of Australian Year 12 students.

· You are required to use the most appropriate technique or method to evaluate or present data. DO NOT use every technique you can think of, as this shows that you do not understand what is required. Use the techniques that you think are most appropriate. Remember that different techniques/tests/graphs may provide you with different types of information. Use your judgement carefully.

· Your explanations must be clear, concise and complete.

· It is recommended that you use a word processor and Excel (or alternative software as appropriate) to produce your assignment. A handwritten assignment will be accepted only if it is legible and easy to follow. However, all tables/graphs/estimation results must be printed (not handwritten or hand-drawn).

· All tables and graphs should be accurately referenced and referred to in the text. All graphs should be fully labelled.

· Hypothesis tests should be fully explained (e.g. see structure outlined in lectures). It is NOT acceptable to simply state a result.

· Ensure that you analyse data thoroughly and present results carefully. Do not simply submit printouts of results from Excel, or other software packages. Make sure that you interpret results in the context of the initial problem in order to show your understanding. You can also make recommendations about further research that should be conducted in order to provide a better answer (see Question 9).

· Relevant printouts of results from Excel or other software should be contained in appendices to the assignment. This applies most pertinently to regression output.

· Students may use appropriate analytical software other than Excel. However, students wishing to use other software should discuss this with the lecturer prior to starting the assignment. If students use R, S, Matlab or other software for which coding is required, copies of the code used should also be attached to the assignment as appendices.

· Please note that teaching staff are not responsible for resolving personal difficulties you may have when working with peers.

· Wherever possible, students should work in groups of two or three to complete the assignment. You will need to complete the group assignment cover sheet (available on MyLO) and attach it to your assignment. All students in a group must sign this cover sheet. It is NOT acceptable for students to sign for other group members. On-campus students need to submit a hard copy of the assignment to the assignment box labelled BEA674 in TBSE before the due date. Distance students should submit their assignment to the DropBox folder provided on MyLO.

· As far as practical, all students working in groups should be involved in all parts of the assignment.

The Data

The Education.xlsx dataset has been collected from students who completed year 12 in Australia in 2014.

Variables contained in the dataset are as follows:

Age – age in years of student at completion of the 2014 school year

Gender – gender of student

Type school – Three different types: exclusive private (cost > \$10,000/yr), second tier private (\$3,000 – 10,000/yr), public (government run school)

Absence – number of days student absent from school during year 12

State – state of Australia in which student lives

Location – classifications are remote (nearest settlement of >400 people > 200km away), rural (agricultural area, nearest settlement of >400 people 400,000 people)

Nat_status – 3 categories: Aus/NZ, Aus Indigenous and Other

University – Does the student intend to go to university the year following Year 12 completion (i.e. 2015)?

Meduc – number of years of education the student’s mother has

Feduc – number of years of education the student’s father has

Single par – Does the student come from a single-parent family?

Income – Family income

Yr10_score – Final year 10 score (%)

Yr 12_score – Final year 12 score (%)

Assignment Questions

QUESTION 1

a. What is the dependent variable we are interested in?

[2 marks]

b. Describe what kind of variable the variable identified in part a is.

[2 marks]

c. Select the three (3) categorical variables that you think will best help to understand variation in the variable identified in part a. Ensure that at least one of these variables has 3 or more categories. Explain why you have chosen these variables.

[12 marks]

d. Select the three (3) numerical variables that you think will best help to understand variation in the variable identified in part a. Explain why you have chosen these variables.

[12 marks]

e. Explain what issues, if any, exist with the observations in the dataset. If any issues were identified, explain how you dealt with them (e.g. excluded the observation, used an average/median, etc.).

[8 marks]

[Total marks: 36]

QUESTION 2

a. Construct a frequency distribution of the variable identified in Question 1a. Be careful not to use too many or too few classes.

[8 marks]

b. Plot the information in your frequency distribution in a fully-labelled histogram.

[8 marks]

c. Use appropriate numerical descriptive measures to further summarise/describe the variable identified in Question 1a.

[6 marks]

[Total marks: 22]
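
As a hedged illustration of the mechanics behind Question 2, the sketch below builds a frequency distribution with equal-width classes and computes a few numerical summaries. The scores, the class width of 10 and the lower limit of 50 are made-up assumptions for illustration only; they are not taken from Education.xlsx, and the choice of classes for the real data remains a judgement call.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical Year 12 scores (%); NOT taken from Education.xlsx.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 88, 93]

def frequency_distribution(values, lower, width, n_classes):
    """Count values in classes [lower + k*width, lower + (k+1)*width).

    Assumes every value is >= lower; a value on the top edge is
    placed in the last class.
    """
    counts = Counter()
    for v in values:
        k = min(int((v - lower) // width), n_classes - 1)
        counts[k] += 1
    return [(lower + k * width, lower + (k + 1) * width, counts[k])
            for k in range(n_classes)]

table = frequency_distribution(scores, lower=50, width=10, n_classes=5)
for low, high, freq in table:
    # Class limits, frequency and relative frequency
    print(f"{low} to <{high}: {freq} ({freq / len(scores):.2f})")

# Numerical descriptive measures for the same hypothetical sample
print("mean:", round(mean(scores), 2))
print("median:", median(scores))
print("sample standard deviation:", round(stdev(scores), 2))
```

The same class limits can then be reused for the histogram in part b and for the contingency table in Question 4.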

QUESTION 3

a. Use appropriate tables or graphs to help describe the three categorical variables you have chosen in Question 1c. Use only one table or graph for each variable.

[15 marks]

b. Use appropriate tables or graphs to help describe the three (3) numerical variables you have chosen in Question 1d. Use only one table or graph for each variable.

[15 marks]

[Total marks: 30]

QUESTION 4

Use the variable for which you presented a frequency distribution (Question 2a) and use the same classes you used for the frequency distribution. Select one other categorical variable from the three selected in Question 1c that has at least three categories.

a. Construct a contingency table using these two variables.

[8 marks]

b. Identify all joint and marginal probabilities.

[4 marks]

c. Comment on the joint and marginal probabilities shown in your table.

[8 marks]

d. Calculate the conditional probabilities for one column/row (whichever is longer) of your contingency table.

[5 marks]

[Total marks: 25]
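
To make the terms in Question 4 concrete, here is a minimal sketch that turns a contingency table of counts into joint, marginal and conditional probabilities. The counts and the three score classes are hypothetical and chosen only so the arithmetic is easy to follow; they do not come from the assignment dataset.

```python
# Hypothetical counts; rows = Type school, columns = Year 12 score class.
# These numbers are invented for illustration, NOT taken from Education.xlsx.
counts = {
    "public":              {"<60": 20, "60-80": 50, ">=80": 10},
    "second tier private": {"<60": 5,  "60-80": 25, ">=80": 10},
    "exclusive private":   {"<60": 5,  "60-80": 35, ">=80": 40},
}

n = sum(sum(row.values()) for row in counts.values())
columns = list(next(iter(counts.values())))

# Joint probability: P(row AND column) = cell count / grand total
joint = {r: {c: v / n for c, v in row.items()} for r, row in counts.items()}

# Marginal probabilities: row totals and column totals over the grand total
row_marginal = {r: sum(row.values()) / n for r, row in counts.items()}
col_marginal = {c: sum(counts[r][c] for r in counts) / n for c in columns}

# Conditional probability for one row: P(column | public) = cell / row total
public_total = sum(counts["public"].values())
conditional_given_public = {c: counts["public"][c] / public_total for c in columns}

print("P(public and <60) =", joint["public"]["<60"])
print("P(public) =", row_marginal["public"])
print("P(<60 | public) =", conditional_given_public["<60"])
```

Note that each set of conditional probabilities sums to 1 within its row, whereas the joint probabilities sum to 1 over the whole table.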

QUESTION 5

From 1980 – 1990, the average year 12 score for a student intending to go to University was 80.

a. Using the data provided for this assignment, test whether the average score of students planning to go to university in 2015 has:

i. changed OR

ii. increased OR

iii. decreased

(just select ONE of these options to test).

Explain why you have chosen to test i, ii or iii.

When conducting the test, be sure to clearly show your working. Use α = 0.05.

[8 marks]

b. Comment on your result above. Do you think it has any implications for universities? Society? Explain.

[6 marks]

c. Construct a 95% confidence interval for the mean population year 12 score for students intending to go to university in 2015.

[6 marks]

d. Construct a 95% confidence interval for the mean population year 12 score for students not intending to go to university in 2015.

[6 marks]

e. Compare your results for c. and d. What do they suggest about mean scores of students intending to go to university in 2015 and the mean scores of students not intending to go to university in 2015?

[4 marks]

f. If (prior to collecting data) we considered that an acceptable sampling error for each of our 95% confidence intervals constructed in parts c and d was 5, what sample size would we require? (Use s_Not_going_uni2015 = 30 and s_Going_uni2015 = 20.)

[6 marks]

[Total marks: 36]
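
The sample-size calculation in Question 5f is usually based on the formula n = (z × s / e)², rounded up to the next whole number, where e is the acceptable sampling error and z is the standard normal critical value for the chosen confidence level. The sketch below assumes that formula (check it against the version given in lectures); the function name `required_sample_size` is a hypothetical helper, not part of any course material.

```python
import math
from statistics import NormalDist

def required_sample_size(sd, error, confidence=0.95):
    """Smallest n for which z * sd / sqrt(n) <= error.

    Assumes the normal-based formula n = (z * sd / error)^2, with the
    standard deviation treated as known before the data are collected.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / error) ** 2)

# Standard deviations and sampling error as stated in Question 5f
n_not_going = required_sample_size(sd=30, error=5)
n_going = required_sample_size(sd=20, error=5)
print("Not going to university:", n_not_going)
print("Going to university:", n_going)
```

Rounding up rather than to the nearest integer ensures the sampling error does not exceed the stated limit.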

QUESTION 6

a. Use one of the categorical variables selected in Question 1c that has at least 3 categories and conduct a one-way ANOVA on the variable identified in Question 1a.

[15 marks]

[5 marks]

c. Do you think there are any problems with your results? Explain.

[4 marks]

[Total marks: 24]
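
For readers who want to see what a one-way ANOVA computes, the following sketch works out the F statistic from the between-group and within-group sums of squares. The three groups and their scores are invented for illustration and are not drawn from Education.xlsx; in practice Excel or another package would produce the same table directly.

```python
from statistics import mean

# Hypothetical Year 12 scores grouped by Type school; NOT real data.
groups = {
    "public": [60, 65, 70, 75],
    "second tier private": [70, 75, 80, 85],
    "exclusive private": [80, 85, 90, 95],
}

def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a dict of samples."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared deviation
    # of each group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The resulting F statistic is compared with the critical value from an F table for the same degrees of freedom at the chosen significance level.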

QUESTION 7

Using the contingency table you constructed in Question 4a, test the hypothesis that the two variables are independent of one another. Use α = 0.05. Be sure to state your conclusions and discuss what they mean.

[15 marks]

[Total marks: 15]
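
The test of independence in Question 7 rests on comparing observed counts with the counts expected if the two variables were independent. The sketch below shows that calculation on a hypothetical 3 × 3 table of counts (not the assignment data). The critical value 9.488 is the standard chi-square table value for 4 degrees of freedom at α = 0.05 and applies only to a table of this shape.

```python
# Hypothetical observed counts (3 rows x 3 columns); NOT real data.
observed = [
    [20, 50, 10],
    [5, 25, 10],
    [5, 35, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

CRITICAL_DF4_ALPHA_05 = 9.488  # chi-square table value, df = 4, alpha = 0.05
reject_independence = chi2 > CRITICAL_DF4_ALPHA_05

print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_independence}")
```

The usual rule of thumb that every expected count should be at least 5 can be checked directly from the `expected` table before relying on the result.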

QUESTION 8

a. Conduct a regression analysis on the variable identified in Question 1a and at least one of the variables identified in Question 1d.

[10 marks]

b. Plot the data and line of best fit. Explain what the regression line means.

[8 marks]

c. Discuss whether there is a significant relationship between the dependent variable and the independent variable(s) and what this means. Use α = 0.05.

[8 marks]

d. Explain what R² (simple linear regression), or adjusted R² (multiple linear regression), tells you about the relationship between the dependent and independent variable(s).

[4 marks]

e. Explain whether it was appropriate to use linear regression on the variables you have selected. Show evidence to support your case.

[10 marks]

[Total marks: 40]
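
As a rough guide to what the regression output in Question 8 contains, this sketch fits a simple linear regression by least squares and computes R². The five (x, y) pairs are hypothetical values standing in for a numerical predictor and the Year 12 score; they are not taken from Education.xlsx, and the sketch covers only the one-predictor case.

```python
from statistics import mean

# Hypothetical data: x = a numerical predictor, y = Year 12 score (%).
x = [50, 60, 70, 80, 90]
y = [55, 62, 70, 79, 84]

x_bar, y_bar = mean(x), mean(y)

# Least-squares estimates: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# R squared = 1 - SSE / SST: the share of variation in y explained by the line
fitted = [intercept + slope * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

# Residuals are the raw material for checking whether a linear model is appropriate
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(f"y = {intercept:.2f} + {slope:.2f} x, R^2 = {r_squared:.4f}")
```

The slope is read as the estimated change in the dependent variable for a one-unit increase in the predictor. Plotting the residuals against the fitted values is one common way to gather the evidence asked for in part e.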

QUESTION 9

[10 marks]

b. On the basis of your results, explain whether your choice of variables (Questions 1c & 1d) has been useful in explaining variation in the variable identified in Question 1a.

[5 marks]

c. Based on the results you obtained, and your response to Question 9b above, what advice would you give to a policy maker interested in improving Year 12 scores?

[6 marks]

d. Do you think there were additional variables in the dataset (i.e. variables you did not select for analysis) that would have helped you to better understand variability in the variable identified in Question 1a? Explain.

[5 marks]

e. Do you think there are any additional variables for which information should have been collected? Explain your reasoning.

[6 marks]

[Total marks: 32]

