A Further Comparison of Simplification Methods for Decision-Tree Induction

Donato Malerba, Floriana Esposito, and Giovanni Semeraro
Dipartimento di Informatica - Università degli Studi di Bari
via Orabona, 4 - 70126 Bari - Italy
malerbad | esposito | semeraro @ vm.csata.it

Abstract: This paper presents an empirical investigation of eight well-known simplification methods for decision trees induced from training data. Twelve data sets are used to compare both the accuracy and the complexity of the simplified trees. Optimally pruned trees are computed to give a clear definition of the bias of the methods towards overpruning and underpruning. The results indicate that the simplification strategies that exploit an independent pruning set do not perform better than the others. Furthermore, some methods show an evident bias towards either underpruning or overpruning.