# M3 Group Assignment Instructions

| October 3, 2018

M3
Group Assignment Instructions

Complete
the Assignment, name it as GroupXX _Assign3.xls (where XX is your Group Name),
and upload and submit to the instructor through Dropbox. Do not enter

You
must complete this assignment without the assistance of persons other than the
members of your Group. You may use any other resources you deem necessary.
Answer the questions below by placing the appropriate graph and/or answers in
the designated cells of the spreadsheet.
DO NOT CHANGE
THE APPEARANCE OR FUNCTIONALITY OF THE
UNLESS INSTRUCTED TO DO SO.
.jpg”>

Question 1 (20 points)

A mental health agency
measured the self-esteem score for randomly selected individuals with
disabilities who were involved in some work activity within the past year. The
spreadsheet named Self Esteem provides the data including each
individuals self-esteem measure (y), years of education (YrsEdu), age, months
worked in the last 6 years (MonWork), marital status dummy variables (MS2, MS3,
MS4) indicating if the individual is single, married, separated, or divorced,
and a support level (SL) dummy variable indicating if the level of job support
(counseling, etc) was provided directly (1) or indirectly (0). Regarding
marital status, if single all MS indicators are 0, while MS2 = 1 indicates
married, MS3 = 1 indicates separated, and MS4 = 1 indicates divorced.

a.
3
points: In cell N4, use Excel’s “Correlation” Data Analysis tool to construct a
correlation matrix for all the variables. Note that the categories in columns I
and J should not be included since the data are already represented as dummy
variables in columns E through H.

b.
3
points: Considering the correlation between self-esteem and each x variable
identify the three variables that, based on correlation with y alone, should be
considered as best candidates for inclusion in the model. Shade the appropriate
cells containing the correlation values in yellow. Ignore any multicollinearity
concerns for this part.

c.
3
points: With cell N19 as the upper left hand corner of the output, fit the full
regression model. (Do not include a residual plot)
d.
4
points: Considering the regression output from part c, shade (in yellow) the
name of any x variable that appears significant and should remain in the model.
Also shade the t stat and p-value. Consider the p-value small if it is less
than 0.05.

e.
3
points: Partial Regression Model: With cell N51 the upper left hand corner of
the output, fit the model including only the x variable(s) that were found to
be significant in part d. (Do not include a residual plot)
f.
4
points: Comment on the impact of each independent variable in the partial
regression model from part e on Self Esteem. Provide your comments in Cells
N66:T78.

1

Question
2 (15 points)

A bank must prepare
for a discrimination suit filed on behalf of female employees that claim
females are paid less than male employees. The bank manager sampled employee
files to see if he could build a useful model for predicting salary as a
function of gender and other characteristics. For each employee, the data includes
salary (y, in thousands of dollars), years experience (YrsExp), years prior
experience (YrsPrior), and Gender. The data is in the spreadsheet named Bank.
a.
3
points: Since Gender is a categorical variable, construct the appropriate dummy
variable in column E to indicate gender as female = 1 and male = 0. You must
use an “IF” statement in the appropriate cell(s) to indicate the correct dummy
value based on gender.

b.
4
points: With cell H7 the upper left hand corner of the output, fit the full
model. (Do not include a residual plot).
c.
3
points: Based on the regression output from part b, shade (in yellow) the name
of any x variable that appears significant and should remain in the model. Also
shade the t stat and p-value.
d.
5
points: Based on your analysis, what is your assessment of the lawsuit? Does it
look promising for the bank? What additional variables need to be included in
the model to strengthen the validity of the model? Place your comments in Cells
H33:P45.

Question 3 (15 points)

The
trend in home building in recent years has been to emphasize open spaces and
great rooms, rather than smaller living rooms and family rooms. A home builder
has been building such homes, but his homes had been taking many months to sell
and selling for substantially less than the asking price. In order to determine
what types of homes would attract residents; the builder contacted your team.
The spreadsheet named Builder contains the sales price y, square
footage X1, number of rooms X2, number of bedrooms X3, and age X4 for each of
63 single-family residences recently sold. Perform a regression analysis of the
data.

a.
5
points: With cell I5 as the upper left hand corner of the output, fit the full
regression model. Identify the significant coefficients.
b.
10
points: Make recommendations to the builder in Cells I30:Q43 on how to increase
sales price by adjusting house layout. For example, will it be more profitable
to build a house with smaller living room given a fixed square footage? How
about smaller family room or a larger great room? Will it be wise to increase
the number of bedrooms?

Question 4 (20 points)

A company sells
products in several sales territories, each of which is assigned to a single
sales rep. The spreadsheet named Sales contains 25 observations
on eight independent variables. A regression analysis needs to be conducted to
determine whether a variety of predictor variables could explain sales in each
territory. To compute all possible regression models, we could develop 8
one-variable equations, 28 two variable regression equations, and so on. As a
matter of fact, a total of 255 different estimated regression equations can be
modeled. Next try three advanced regression techniques in StatTools. Enter 0.05
in the p-value to Enter or Leave box.

Variable Definition

Sales Total sales credited to the sales
rep
Time Length of time employed in month
Poten Market potential; total industry
sales in units for the sales territory
sales territory
Share Market share; weighted average
for the past four years
Change Change in the market share over
the previous four years
Accounts Number of accounts assigned to
the sale rep

2

Work Workload; a weighted index based
on annual purchases and concentrations of
accounts
Rating Sales rep overall rating on eight
performance dimension; an aggregate rating on a
1-7 scale

a.
5
points: With Cell K4 as the upper left hand corner of the output, fit the full
regression model (report Regression Table only) using the Stepwise Regression
function in StatTools.
b.
5
points: With Cell K15 as the upper left hand corner of the output, fit the full
regression model (report Regression Table only) using the Forward Selection
function in StatTools.

c.
5
points: With Cell K26 as the upper left hand corner of the output, fit the full
regression model (report Regression Table only) using the Backward Elimination
function in StatTools.
d.
5
points: Did the three different methods reach the same model? If not, apply
your managerial judgment to choose one model that you think is the best. State

Question 5 (30 points)
To measure value, Consumer
Reports developed a statistic referred as a value score. The value score is
based upon five-year owner costs, overall road-test scores, and
predicted-reliability ratings. Five-year owner costs are based upon the
expenses incurred in the first five years of ownership, including depreciation,
fuel, maintenance and repairs, and so on. Using a national average of 12,000
miles per year, an average cost per mile driven is used as the measure of
five-year owner costs. Road-test scores are the results of more than 50 tests
and evaluations and are based on a 100-point scale, with higher score
indicating better performance, comfort, convenience, and fuel economy. The
highest road-test score obtained in the tests conducted by Consumer Reports was
a 99 for a Lexus LS 460L. Predicted-reliability ratings (1=Poor, 2=Fair,
3=Good, 4=Very Good, and 5=Excellent) are based upon data from Consumer
Reports’ Annual Auto Survey. A car with a value score of 1.0 is considered to
be an “average-value” car. A car with a value score of 2.0 is considered to be
twice as good a value as a car with a value score of 1.0; a car with a value
score of 0.5 is considered half as good as average; and so on. The data for
three sizes of cars (13 small sedans, 20 family sedans, and 21 upscale sedans),
including price (\$) of each car tested, are provided in spreadsheet Car
(Consumer Reports, April, 2012).

a.
5
points: To incorporate the effect of size of a car, create two dummy variables
in Columns H and I using the following coding.
Family-Sedan:
1 if the car is a family sedan and 0 otherwise. Upscale-Sedan: 1 if the car is
an upscale sedan and 0 otherwise.

b.
7.5
points: First treating Cost/Mile as the dependent variable, develop an
estimated regression with Family-Sedan and Upscale-Sedan as the independent
variables. With Cell K3 as the upper left hand corner of the output, fit the
full regression model (report Regression Table only). Summarize your model in
Cells K12:S20. Are the coefficients significant? Interpret the coefficients.

c.
7.5
points: Next treating Value Score as the dependent variable, develop an
estimated regression equation using all other variables as the independent
variables. Choose Stepwise as the Regression Type. With Cell K26 as the upper
left hand corner of the output, fit the full regression model (report
Regression Table only). Does your model support the claim that “smaller car
provide better values than larger cars?” The Small Sedans represent the
smallest type of car and the Upscale Sedans represent the largest type of car.
State your reasons in Cells K39:S47.
d.
5 points: Using the
model developed in part c, calculated the predicted Value Score in Column J.
Which car has the smallest Value Score residual and which car has the largest
Value Score residual?
e.
5
points: Place the Residual Plot for the model developed in part c in Cells
K62:S78. Are the assumptions being satisfied? Is the variance of the error
terms constant? Are residuals scattered randomly around zero? Place your assessments
in Cells K80:S88.

e.
in
in the model. Also shade the t stat and p-value.

3

Get a 20 % discount on an order above \$ 40
Use the following coupon code:
LOBSTER
Positive SSL