
October 14, 2019


2. Using default settings, fit a decision tree to the training set to predict the credit ratings of
customers using all of the other variables in the dataset.
a. Report the resulting tree.
b. Based on this output, predict the credit rating of a hypothetical "median customer",
i.e., one with the attributes listed in Table 1, showing the steps involved.
c. Produce the confusion matrix for predicting the credit rating from this tree on the
test set, and also report the overall accuracy rate.
d. What is the numerical value of the information gain (the reduction in entropy) produced by the first split at
the top of the tree? (Use logarithms to base 2, and show the details of the calculation
rather than just providing a final answer.)
e. Fit a random forest model to the training set to try to improve prediction. Report
the R output.
f. Produce the confusion matrix for predicting the credit rating from this forest on the
test set, and also report the overall accuracy rate.
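For part (d), the information gain of a split is the entropy of the parent node minus the size-weighted entropy of its child nodes. In symbols, Gain = H(parent) − Σ_k (n_k/n) H(child_k), where H = −Σ_i p_i log2(p_i) is summed over the class proportions p_i in a node. The Python sketch below shows the arithmetic. Python is used only for illustration, since the assignment itself works in R. The sketch assumes you have already read the class counts at the root and at its two children off the printed tree. The counts in the usage line are hypothetical and are not taken from the credit data.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (base 2) of a node, given its class counts."""
    total = sum(counts)
    # Empty classes contribute 0, since p * log2(p) -> 0 as p -> 0.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted entropy of the child nodes.

    Assumes the child counts partition the parent node exactly.
    """
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted


# Hypothetical root with 10 + 10 customers, split into [8, 2] and [2, 8].
print(round(information_gain([10, 10], [[8, 2], [2, 8]]), 4))  # 0.2781
```

The tree may have chosen its split by a different criterion. For example, rpart uses the Gini index by default. In that case this entropy figure describes the split rather than the rule that selected it.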
3. Using the default settings of svm() from the e1071 package, fit a support vector machine to
predict the credit ratings of customers using all of the other variables in the dataset.
a. Predict the credit rating of a hypothetical "median customer", i.e., one with the
attributes listed in Table 1. Report the decision values as well.
b. Produce the confusion matrix for predicting the credit rating from this SVM on the
test set, and also report the overall accuracy rate.
c. Automatically or manually tune the SVM to improve on the prediction found
in 3(b). Report the resulting SVM settings and the resulting confusion matrix for
predicting the test set. (Any amount of improvement is acceptable.)
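For part 3(a), e1071 returns decision values when predict() is called with decision.values = TRUE. They are attached to the prediction as its "decision.values" attribute. With more than two classes, the underlying libsvm library fits one binary classifier per pair of classes and predicts by majority vote. Each decision-value column (labelled like "A/B") is therefore one pairwise contest. The Python sketch below tallies such votes. It assumes that a positive value is a vote for the class before the slash. Check that convention against your own output, because the pair ordering follows libsvm's internal class order. The numbers in the usage line are hypothetical.

```python
from collections import Counter


def one_vs_one_vote(decision_values):
    """Return (winning class, vote counts) from pairwise SVM decision values.

    decision_values maps a pair label such as "A/B" to a real number.
    ASSUMPTION: value > 0 is a vote for the class before the slash,
    otherwise the vote goes to the class after it.
    """
    votes = Counter()
    for pair, value in decision_values.items():
        first, second = pair.split("/")
        votes[first if value > 0 else second] += 1
    # Ties are broken by first-encountered order here, which is a simplification
    # and not necessarily libsvm's own tie-breaking rule.
    winner = votes.most_common(1)[0][0]
    return winner, dict(votes)


# Hypothetical decision values for a three-class problem.
print(one_vs_one_vote({"A/B": 0.8, "A/C": 1.2, "B/C": -0.3}))
# ('A', {'A': 2, 'C': 1})
```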
4. Fit a Naive Bayes model to predict the credit ratings of customers using all of the other
variables in the dataset.
a. Predict the credit rating of a hypothetical "median customer", i.e., one with the
attributes listed in Table 1. Report predicted probabilities as well.
b. Produce the confusion matrix for predicting the credit rating using Naive Bayes on
the test set, and also report the overall accuracy rate.
c. Reproduce the first 20 or so lines of the R output for the Naive Bayes fit, and use
them to explain how you would make the prediction in 4(a).
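For part 4(c), the printed fit from e1071's naiveBayes() lists the a-priori class probabilities. It also lists, for each predictor, the conditional probabilities given each class. For a factor this is a table of level probabilities; for a numeric variable it is a per-class mean and standard deviation, treated as normal. A prediction multiplies the prior by one conditional term per predictor and then normalizes across classes. The Python sketch below shows that calculation. The variable names and probabilities are hypothetical placeholders, not values from the credit data. e1071's predict() also handles near-zero probabilities through its threshold argument, which this sketch omits.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Normal density, used for numeric predictors."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def naive_bayes_posterior(priors, categorical, numeric, customer):
    """Posterior class probabilities assuming predictors are independent given the class.

    priors:      {class: P(class)}
    categorical: {variable: {class: {level: P(level | class)}}}
    numeric:     {variable: {class: (mean, sd)}}
    customer:    {variable: value}
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for var, table in categorical.items():
            score *= table[cls][customer[var]]
        for var, params in numeric.items():
            mean, sd = params[cls]
            score *= normal_pdf(customer[var], mean, sd)
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


# Hypothetical two-class example with one factor and one numeric predictor.
priors = {"A": 0.3, "B": 0.7}
categorical = {"owns_home": {"A": {"yes": 0.8, "no": 0.2},
                             "B": {"yes": 0.4, "no": 0.6}}}
numeric = {"income": {"A": (60.0, 10.0), "B": (60.0, 10.0)}}
print(naive_bayes_posterior(priors, categorical, numeric,
                            {"owns_home": "yes", "income": 60.0}))
# roughly {'A': 0.46, 'B': 0.54}; the equal income parameters cancel out
```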
5. Based on the confusion matrices reported in the preceding parts,
a. Which of the classifiers look to be the best? (Be specific, and specify the figures you
used to answer this question.)
b. Which look to be the worst? (Be specific, and specify the figures you used to answer
this question.)
c. Are there any categories that all classifiers seem to have trouble with?
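For question 5, the overall accuracy rate is the diagonal total of a confusion matrix divided by its grand total. A category that every classifier struggles with shows up as a low per-class rate: correct predictions for that true class divided by the number of test cases in it. The Python sketch below computes both from lists of actual and predicted labels. It puts the actual class in the rows and the predicted class in the columns. R's table() puts its first argument in the rows, so match the orientation to your own output. The labels in the usage lines are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Overall accuracy: diagonal total over grand total."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def per_class_rate(matrix, labels):
    """Share of each actual class that was predicted correctly (recall)."""
    return {label: (row[i] / sum(row) if sum(row) else float("nan"))
            for i, (label, row) in enumerate(zip(labels, matrix))}


# Hypothetical test-set labels.
labels = ["A", "B", "C"]
actual = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "B", "B", "B", "C", "A"]
m = confusion_matrix(actual, predicted, labels)
print(m)                          # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(round(accuracy(m), 3))      # 0.667
print(per_class_rate(m, labels))  # {'A': 0.5, 'B': 1.0, 'C': 0.5}
```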
6. Consider a simpler problem of predicting whether a customer gets a credit rating of A
or not.
a. Fit a logistic regression model to predict the credit ratings of customers using all of
the other variables in the dataset, with no interactions.
b. Report the summary table of the logistic regression model fit.
c. Which predictors of credit rating appear to be significant? Which of them are likely
to be spuriously so?
d. Fit an SVM model of your choice to the training set.
e. Produce an ROC chart comparing the logistic regression and the SVM results of
predicting the test set. Comment on any differences in their performance.
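For question 6(e), an ROC chart plots the true positive rate against the false positive rate as the cut-off on a score is lowered. In R, suitable scores would be predict(fit, newdata, type = "response") for the logistic regression. For the SVM, either its decision values or the class probabilities from a model fitted with probability = TRUE would work. The Python sketch below builds the ROC points and the area under the curve (AUC) from any such scores. It assumes both classes occur in the test set and that a higher score means "more likely A". The scores in the usage line are hypothetical.

```python
def roc_points(scores, is_a):
    """ROC points (FPR, TPR), lowering the threshold one distinct score at a time.

    is_a: 1 if the customer's rating is A, else 0. Higher scores mean "more likely A".
    Tied scores are processed together so they form a single diagonal step.
    """
    pairs = sorted(zip(scores, is_a), key=lambda t: -t[0])
    pos = sum(is_a)
    neg = len(is_a) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for six test customers (1 = rated A).
pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
print(round(auc(pts), 4))  # 0.8889
```

Plotting the points for both models on the same axes gives the comparison chart. The curve that sits higher, with the larger AUC, ranks A customers above the others more reliably on this test set.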
