# M3 Group Assignment Instructions

M3

Group Assignment Instructions

Complete

the Assignment, name it as GroupXX _Assign3.xls (where XX is your Group Name),

and upload and submit to the instructor through Dropbox. Do not enter

anything in the spreadsheet cells that are black, labeled “Grader”.

You

must complete this assignment without the assistance of persons other than the

members of your Group. You may use any other resources you deem necessary.

Answer the questions below by placing the appropriate graph and/or answers in

the designated cells of the spreadsheet.

DO NOT CHANGE

THE APPEARANCE OR FUNCTIONALITY OF THE

SPREADSHEET

UNLESS INSTRUCTED TO DO SO.

.jpg”>

Question 1 (20 points)

A mental health agency

measured the self-esteem score for randomly selected individuals with

disabilities who were involved in some work activity within the past year. The

spreadsheet named Self Esteem provides the data including each

individuals self-esteem measure (y), years of education (YrsEdu), age, months

worked in the last 6 years (MonWork), marital status dummy variables (MS2, MS3,

MS4) indicating if the individual is single, married, separated, or divorced,

and a support level (SL) dummy variable indicating if the level of job support

(counseling, etc) was provided directly (1) or indirectly (0). Regarding

marital status, if single all MS indicators are 0, while MS2 = 1 indicates

married, MS3 = 1 indicates separated, and MS4 = 1 indicates divorced.

a.

3

points: In cell N4, use Excel’s “Correlation” Data Analysis tool to construct a

correlation matrix for all the variables. Note that the categories in columns I

and J should not be included since the data are already represented as dummy

variables in columns E through H.

b.

3

points: Considering the correlation between self-esteem and each x variable

identify the three variables that, based on correlation with y alone, should be

considered as best candidates for inclusion in the model. Shade the appropriate

cells containing the correlation values in yellow. Ignore any multicollinearity

concerns for this part.

c.

3

points: With cell N19 as the upper left hand corner of the output, fit the full

regression model. (Do not include a residual plot)

d.

4

points: Considering the regression output from part c, shade (in yellow) the

name of any x variable that appears significant and should remain in the model.

Also shade the t stat and p-value. Consider the p-value small if it is less

than 0.05.

e.

3

points: Partial Regression Model: With cell N51 the upper left hand corner of

the output, fit the model including only the x variable(s) that were found to

be significant in part d. (Do not include a residual plot)

f.

4

points: Comment on the impact of each independent variable in the partial

regression model from part e on Self Esteem. Provide your comments in Cells

N66:T78.

1

Question

2 (15 points)

A bank must prepare

for a discrimination suit filed on behalf of female employees that claim

females are paid less than male employees. The bank manager sampled employee

files to see if he could build a useful model for predicting salary as a

function of gender and other characteristics. For each employee, the data includes

salary (y, in thousands of dollars), years experience (YrsExp), years prior

experience (YrsPrior), and Gender. The data is in the spreadsheet named Bank.

a.

3

points: Since Gender is a categorical variable, construct the appropriate dummy

variable in column E to indicate gender as female = 1 and male = 0. You must

use an “IF” statement in the appropriate cell(s) to indicate the correct dummy

value based on gender.

b.

4

points: With cell H7 the upper left hand corner of the output, fit the full

model. (Do not include a residual plot).

c.

3

points: Based on the regression output from part b, shade (in yellow) the name

of any x variable that appears significant and should remain in the model. Also

shade the t stat and p-value.

d.

5

points: Based on your analysis, what is your assessment of the lawsuit? Does it

look promising for the bank? What additional variables need to be included in

the model to strengthen the validity of the model? Place your comments in Cells

H33:P45.

Question 3 (15 points)

The

trend in home building in recent years has been to emphasize open spaces and

great rooms, rather than smaller living rooms and family rooms. A home builder

has been building such homes, but his homes had been taking many months to sell

and selling for substantially less than the asking price. In order to determine

what types of homes would attract residents; the builder contacted your team.

The spreadsheet named Builder contains the sales price y, square

footage X1, number of rooms X2, number of bedrooms X3, and age X4 for each of

63 single-family residences recently sold. Perform a regression analysis of the

data.

a.

5

points: With cell I5 as the upper left hand corner of the output, fit the full

regression model. Identify the significant coefficients.

b.

10

points: Make recommendations to the builder in Cells I30:Q43 on how to increase

sales price by adjusting house layout. For example, will it be more profitable

to build a house with smaller living room given a fixed square footage? How

about smaller family room or a larger great room? Will it be wise to increase

the number of bedrooms?

Question 4 (20 points)

A company sells

products in several sales territories, each of which is assigned to a single

sales rep. The spreadsheet named Sales contains 25 observations

on eight independent variables. A regression analysis needs to be conducted to

determine whether a variety of predictor variables could explain sales in each

territory. To compute all possible regression models, we could develop 8

one-variable equations, 28 two variable regression equations, and so on. As a

matter of fact, a total of 255 different estimated regression equations can be

modeled. Next try three advanced regression techniques in StatTools. Enter 0.05

in the p-value to Enter or Leave box.

Variable Definition

Sales Total sales credited to the sales

rep

Time Length of time employed in month

Poten Market potential; total industry

sales in units for the sales territory

AdvExp Advertising expenditure in the

sales territory

Share Market share; weighted average

for the past four years

Change Change in the market share over

the previous four years

Accounts Number of accounts assigned to

the sale rep

2

Work Workload; a weighted index based

on annual purchases and concentrations of

accounts

Rating Sales rep overall rating on eight

performance dimension; an aggregate rating on a

1-7 scale

a.

5

points: With Cell K4 as the upper left hand corner of the output, fit the full

regression model (report Regression Table only) using the Stepwise Regression

function in StatTools.

b.

5

points: With Cell K15 as the upper left hand corner of the output, fit the full

regression model (report Regression Table only) using the Forward Selection

function in StatTools.

c.

5

points: With Cell K26 as the upper left hand corner of the output, fit the full

regression model (report Regression Table only) using the Backward Elimination

function in StatTools.

d.

5

points: Did the three different methods reach the same model? If not, apply

your managerial judgment to choose one model that you think is the best. State

your reasons. Place your answers in Cells K37:T48.

Question 5 (30 points)

To measure value, Consumer

Reports developed a statistic referred as a value score. The value score is

based upon five-year owner costs, overall road-test scores, and

predicted-reliability ratings. Five-year owner costs are based upon the

expenses incurred in the first five years of ownership, including depreciation,

fuel, maintenance and repairs, and so on. Using a national average of 12,000

miles per year, an average cost per mile driven is used as the measure of

five-year owner costs. Road-test scores are the results of more than 50 tests

and evaluations and are based on a 100-point scale, with higher score

indicating better performance, comfort, convenience, and fuel economy. The

highest road-test score obtained in the tests conducted by Consumer Reports was

a 99 for a Lexus LS 460L. Predicted-reliability ratings (1=Poor, 2=Fair,

3=Good, 4=Very Good, and 5=Excellent) are based upon data from Consumer

Reports’ Annual Auto Survey. A car with a value score of 1.0 is considered to

be an “average-value” car. A car with a value score of 2.0 is considered to be

twice as good a value as a car with a value score of 1.0; a car with a value

score of 0.5 is considered half as good as average; and so on. The data for

three sizes of cars (13 small sedans, 20 family sedans, and 21 upscale sedans),

including price ($) of each car tested, are provided in spreadsheet Car

(Consumer Reports, April, 2012).

a.

5

points: To incorporate the effect of size of a car, create two dummy variables

in Columns H and I using the following coding.

Family-Sedan:

1 if the car is a family sedan and 0 otherwise. Upscale-Sedan: 1 if the car is

an upscale sedan and 0 otherwise.

b.

7.5

points: First treating Cost/Mile as the dependent variable, develop an

estimated regression with Family-Sedan and Upscale-Sedan as the independent

variables. With Cell K3 as the upper left hand corner of the output, fit the

full regression model (report Regression Table only). Summarize your model in

Cells K12:S20. Are the coefficients significant? Interpret the coefficients.

c.

7.5

points: Next treating Value Score as the dependent variable, develop an

estimated regression equation using all other variables as the independent

variables. Choose Stepwise as the Regression Type. With Cell K26 as the upper

left hand corner of the output, fit the full regression model (report

Regression Table only). Does your model support the claim that “smaller car

provide better values than larger cars?” The Small Sedans represent the

smallest type of car and the Upscale Sedans represent the largest type of car.

State your reasons in Cells K39:S47.

d.

5 points: Using the

model developed in part c, calculated the predicted Value Score in Column J.

Which car has the smallest Value Score residual and which car has the largest

Value Score residual?

e.

5

points: Place the Residual Plot for the model developed in part c in Cells

K62:S78. Are the assumptions being satisfied? Is the variance of the error

terms constant? Are residuals scattered randomly around zero? Place your assessments

in Cells K80:S88.

e.

in

in the model. Also shade the t stat and p-value.

3

**20 %**discount on an order above

**$ 40**

Use the following coupon code:

LOBSTER