Homework 10

Due Sat. Apr. 17 at 9:00pm

Homework policies and submission instructions


  1. (15 points) Textbook problem 11.7. The book mentions R, but you may use a Python package for this problem, since Python is the language used in this course this time.
  2. (20 points) Build a decision tree with a depth of 50 for the EEG Eye State dataset. In addition, build a random forest classifier with 100 estimators and a depth of 50 for the same dataset:
    1. (10 points) Why does the random forest perform better than the decision tree?
    2. (10 points) Can we change the depth so that the decision tree performs much better than the random forest? Support your reasoning with solid evidence.
  3. (15 points) Build a decision tree with a depth of 50 for the Haberman's Survival dataset. In addition, build a random forest classifier with 100 estimators and a depth of 50 for the same dataset:
    1. (7 points) Why does the dataset work poorly with decision trees and random forests?
    2. (8 points) Can we clearly say if one of these is better than the other? Why or why not?
  4. (Extra 5 points) Build two other classifiers for the same dataset as in Problem 2, using the same training/evaluation set split, then run 5-fold cross-validation, and finally report the best model's confusion matrix and accuracy.
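
As a rough illustration of the workflow in Problem 4, the sketch below compares two classifiers with 5-fold cross-validation and then reports the confusion matrix and accuracy of the better one. It assumes scikit-learn is the Python package in use. The synthetic data from `make_classification`, the choice of logistic regression and k-nearest neighbors, and all variable names are placeholders chosen for illustration, not part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic stand-in data; replace with the dataset from Problem 2.
X, y = make_classification(n_samples=1000, n_features=14, random_state=0)

# 80-20 training/evaluation split (the fixed seed is an illustrative choice).
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumption: these two classifiers are examples only; any two others would do.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validation accuracy, computed on the training portion only.
cv_means = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# Refit the model with the best mean score and evaluate it on the held-out 20%.
best_name = max(cv_means, key=cv_means.get)
best_model = candidates[best_name].fit(X_train, y_train)
y_pred = best_model.predict(X_eval)
best_accuracy = accuracy_score(y_eval, y_pred)
best_cm = confusion_matrix(y_eval, y_pred)

print(cv_means)
print(best_name, best_accuracy)
print(best_cm)
```

Running cross-validation on the training portion alone keeps the evaluation set untouched until the final report, which is one reasonable reading of the problem statement.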


In each problem, for each classifier, please show the confusion matrix of your best model. Please use an 80-20 training/evaluation set split of your data in all questions. For example, this means that 80% of your data will be used for fitting/training your random forest classifier, and the remaining 20% will be used to determine the accuracy and the class confusion matrix of your classifier. The ordering of this split is left up to you.
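
The split and reporting described above can be sketched as follows. This is a minimal example assuming scikit-learn; the synthetic data from `make_classification`, the fixed `random_state`, and the variable names are illustrative assumptions, and the real datasets would be loaded in place of `X` and `y`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data; replace with the assigned dataset.
X, y = make_classification(n_samples=1000, n_features=14, random_state=0)

# test_size=0.2 gives the 80-20 training/evaluation split.
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Settings taken from the problem statements: depth 50, 100 estimators.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=50, random_state=0),
    "random forest": RandomForestClassifier(
        n_estimators=100, max_depth=50, random_state=0
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)      # fit on the 80% training portion
    y_pred = model.predict(X_eval)   # predict on the held-out 20%
    results[name] = {
        "accuracy": accuracy_score(y_eval, y_pred),
        "confusion_matrix": confusion_matrix(y_eval, y_pred),
    }
    print(name, results[name]["accuracy"])
    print(results[name]["confusion_matrix"])
```

Note that `max_depth=50` is an upper bound in scikit-learn: a tree stops growing earlier if its leaves become pure, so the fitted depth may be smaller than 50.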