(15 points) We need to perform statistical tests to compare the performance of two learning algorithms on a given learning task. Please read the following paper and briefly summarize the key ideas as you understand them:
Thomas G. Dietterich: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation 10(7): 1895-1923 (1998) keel/pdf/algorithm/articulo/dietterich1998.pdf
(5 points) Please read the following paper and briefly summarize the key ideas as you understand them:
Thomas G. Dietterich (1995). Overfitting and under-computing in machine learning. Computing Surveys, 27(3), 326-327.
(10 points) Please read the following paper and briefly summarize the key ideas as you understand them:
Thomas G. Dietterich (2000). Ensemble Methods in Machine Learning. J. Kittler and F. Roli (Ed.) First International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science (pp. 1-15). New York: Springer Verlag.
(10 points) Please read the first five sections of the following paper and briefly summarize the key ideas as you understand them:
Jerome Friedman (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), pp. 1189-1232.
(10 points) Please read the following paper and briefly summarize the key ideas as you understand them:
Tianqi Chen, Carlos Guestrin: XGBoost: A Scalable Tree Boosting System. KDD 2016.
(25 points) Empirical analysis question. Income Classifier using Bagging and Boosting. You will use the Adult Income dataset from HW1 for this question. You can use either Weka or scikit-learn.
a. Bagging (weka.classifiers.meta.Bagging). You will use a decision tree as the base supervised learner. Try trees of different depths (1, 2, 3, 5, 10) and different ensemble (bag) sizes, i.e., number of trees (10, 20, 40, 60, 80, 100). Compute the training accuracy, validation accuracy, and testing accuracy for each combination of tree depth and number of trees, and plot them. List your observations.
b. Boosting (weka.classifiers.meta.AdaBoostM1). You will use a decision tree as the base supervised learner. Try trees of different depths (1, 2, 3) and different numbers of boosting iterations (10, 20, 40, 60, 80, 100). Compute the training accuracy, validation accuracy, and testing accuracy for each combination of tree depth and number of boosting iterations, and plot them. List your observations.
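For those taking the scikit-learn route, the sweep in parts a and b could be organized roughly as follows. This is only a sketch under several assumptions: the helper name `sweep` is hypothetical, the data is a small synthetic stand-in generated with `make_classification` (not the Adult Income dataset), the grids are shortened to keep the run quick, and scikit-learn's `AdaBoostClassifier` is used in place of Weka's AdaBoostM1, which is a related but not identical implementation. Plotting is left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def sweep(make_ensemble, depths, sizes, splits):
    """Hypothetical helper: fit one ensemble per (depth, size) pair.

    Returns {(depth, size): (train_acc, val_acc, test_acc)}.
    """
    (X_tr, y_tr), (X_va, y_va), (X_te, y_te) = splits
    results = {}
    for depth in depths:
        for size in sizes:
            base = DecisionTreeClassifier(max_depth=depth, random_state=0)
            # The base learner is passed positionally so the call does not
            # depend on the keyword name used by a given scikit-learn version.
            model = make_ensemble(base, n_estimators=size, random_state=0)
            model.fit(X_tr, y_tr)
            results[(depth, size)] = (
                model.score(X_tr, y_tr),
                model.score(X_va, y_va),
                model.score(X_te, y_te),
            )
    return results


# Synthetic stand-in data (assumption): replace with the Adult Income splits.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_va, X_te, y_va, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
splits = ((X_tr, y_tr), (X_va, y_va), (X_te, y_te))

# Shortened grids for illustration; the assignment lists the full ones.
bagging = sweep(BaggingClassifier, depths=(1, 3), sizes=(10, 20), splits=splits)
boosting = sweep(AdaBoostClassifier, depths=(1, 3), sizes=(10, 20), splits=splits)

for (depth, size), (tr, va, te) in sorted(bagging.items()):
    print(f"bagging depth={depth} trees={size} train={tr:.3f} val={va:.3f} test={te:.3f}")
```

Keeping the three accuracies together per (depth, size) pair makes it straightforward to draw one curve per depth against the number of trees or boosting iterations afterwards.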
(25 points) Automatic hyper-parameter tuning via Bayesian Optimization. For this homework, you need to use BO software to perform hyper-parameter search for the Bagging and Boosting classifiers over two hyper-parameters: size of ensemble and depth of decision tree.
You will employ Bayesian Optimization (BO) software to automate the search for the best hyper-parameters by running it for 50 iterations. Plot the number of BO iterations on the x-axis and the performance of the best hyper-parameters found so far (performance of the corresponding trained classifier on the validation data) on the y-axis.
Additionally, list the sequence of candidate hyper-parameters that were selected along the BO iterations.
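Whatever BO package is used, the quantity to plot is a running maximum over the evaluations made so far. A minimal sketch of that bookkeeping is given below; the function name `best_so_far`, the dictionary keys, and the four logged accuracies are made-up placeholders for illustration, not results from any experiment or the output format of any particular BO tool.

```python
def best_so_far(history):
    """Turn an evaluation log into a best-so-far curve.

    history: list of (hyperparams, validation_accuracy) in evaluation order.
    Returns a list of (iteration, best_hyperparams, best_accuracy).
    """
    curve = []
    best_params, best_acc = None, float("-inf")
    for iteration, (params, acc) in enumerate(history, start=1):
        if acc > best_acc:
            best_params, best_acc = params, acc
        curve.append((iteration, best_params, best_acc))
    return curve


# Placeholder log (assumption): in practice this comes from the BO run.
history = [
    ({"n_estimators": 10, "max_depth": 1}, 0.80),
    ({"n_estimators": 60, "max_depth": 5}, 0.85),
    ({"n_estimators": 40, "max_depth": 3}, 0.83),
    ({"n_estimators": 100, "max_depth": 10}, 0.86),
]

curve = best_so_far(history)
for iteration, params, acc in curve:
    print(iteration, params, acc)
```

The `history` list doubles as the sequence of candidate hyper-parameters selected along the iterations, and the third element of each `curve` entry is the y-value for the plot.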
You can use one of the following BO software packages, or others as needed. Spearmint:
