- (15 points) Textbook problem 11.7. The book mentions R, but you may use a Python package for this problem, since Python is the language for this course.
- (20 points) Build a decision tree with a maximum depth of 50 for the EEG Eye State dataset. In addition, build a random forest classifier with 100 estimators and a maximum depth of 50 for the same dataset:
- (10 points) Why does the random forest perform better than the decision tree?
- (10 points) Can we change the depth so that the decision tree performs much better than the random forest? Explain your reasoning with solid evidence.
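As a starting point, the comparison above can be sketched with scikit-learn. This is a minimal sketch only: it uses synthetic stand-in data from `make_classification` (14 features, matching the EEG Eye State feature count) because the actual dataset download is left to you, and the `random_state` values are arbitrary choices for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the EEG Eye State data (14 features, binary target);
# for the assignment, load the real dataset here instead.
X, y = make_classification(n_samples=2000, n_features=14, random_state=0)

# 80-20 training/evaluation split, as required by the assignment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Decision tree with a maximum depth of 50.
tree = DecisionTreeClassifier(max_depth=50, random_state=0)
tree.fit(X_train, y_train)

# Random forest with 100 estimators, each limited to depth 50.
forest = RandomForestClassifier(
    n_estimators=100, max_depth=50, random_state=0)
forest.fit(X_train, y_train)

# Report accuracy and the class confusion matrix for both models.
for name, model in [("decision tree", tree), ("random forest", forest)]:
    pred = model.predict(X_test)
    print(name, accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
```

The same pattern applies to Problem 3 by swapping in the Haberman's Survival data.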
- (15 points) Build a decision tree with a maximum depth of 50 for the Haberman's Survival dataset. In addition, build a random forest classifier with 100 estimators and a maximum depth of 50 for the same dataset:
- (7 points) Why do decision trees and random forests perform poorly on this dataset?
- (8 points) Can we clearly say whether one of these is better than the other? Why or why not?
- (Extra 5 points) Build two other classifiers with the same training/evaluation set split as in Problem 2 for that same dataset, then perform cross-validation with fold=5, and finally show the best model's confusion matrix and accuracy.
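The extra-credit workflow can be sketched as follows. This is a hedged sketch under stated assumptions: the two extra classifiers shown (logistic regression and k-nearest neighbors) are example choices, not required ones, and synthetic data stands in for the Problem 2 dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; for the assignment, reuse the Problem 2 dataset
# and the same 80-20 split.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Two example extra classifiers (the specific choice is up to you).
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

# 5-fold cross-validation on the training portion to select the best model.
scores = {name: cross_val_score(clf, X_train, y_train, cv=5).mean()
          for name, clf in candidates.items()}
best_name = max(scores, key=scores.get)

# Refit the winner on the full training set, then evaluate on the held-out 20%.
best = candidates[best_name].fit(X_train, y_train)
pred = best.predict(X_test)
print("best model:", best_name)
print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
```

Cross-validating on the training portion only keeps the 20% evaluation set untouched until the final accuracy and confusion-matrix report.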
In each problem, for each classifier, show the confusion matrix of your best model. Use an 80-20 training/evaluation set split of your data in all questions: 80% of your data is used for fitting/training the classifier, and 20% is used to determine its accuracy and class confusion matrix. The ordering of this split is left up to you.