Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?

From: Pablo Duboue
Date: 2014-12-15 @ 22:18

Very interesting:

http://jmlr.csail.mit.edu/papers/v15/delgado14a.html

Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?

Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim;
15(Oct):3133−3181, 2014.

Abstract

We evaluate 179 classifiers arising from 17 families (discriminant
analysis, Bayesian, neural networks, support vector machines, decision
trees, rule-based classifiers, boosting, bagging, stacking, random
forests and other ensembles, generalized linear models,
nearest-neighbors, partial least squares and principal component
regression, logistic and multinomial regression, multiple adaptive
regression splines and other methods), implemented in Weka, R (with
and without the caret package), C and Matlab, including all the
relevant classifiers available today. We use 121 data sets, which
represent the whole UCI data base (excluding the large-scale
problems) and other own real problems, in order to achieve significant
conclusions about the classifier behavior, not dependent on the data
set collection. The classifiers most likely to be the bests are the
random forest (RF) versions, the best of which (implemented in R and
accessed via caret) achieves 94.1% of the maximum accuracy overcoming
90% in the 84.3% of the data sets. However, the difference is not
statistically significant with the second best, the SVM with Gaussian
kernel implemented in C using LibSVM, which achieves 92.3% of the
maximum accuracy. A few models are clearly better than the remaining
ones: random forest, SVM with Gaussian and polynomial kernels, extreme
learning machine with Gaussian kernel, C5.0 and avNNet (a committee of
multi-layer perceptrons implemented in R with the caret package). The
random forest is clearly the best family of classifiers (3 out of 5
bests classifiers are RF), followed by SVM (4 classifiers in the
top-10), neural networks and boosting ensembles (5 and 3 members in
the top-20, respectively).
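
The headline figures in the abstract ("94.1% of the maximum accuracy", "overcoming 90% in the 84.3% of the data sets") can be read as follows: for each data set, divide a classifier's accuracy by the best accuracy any classifier reached on that data set, then average that ratio over all data sets and count how often it exceeds 90%. Below is a minimal sketch of that reading in Python. The function name, the dictionary layout and the toy numbers are all made up for illustration; they are not the paper's code or data, and the paper's exact procedure may differ in details.

```python
def percent_of_max_accuracy(acc):
    """Summarise classifiers by their percentage of the maximum accuracy.

    acc maps a classifier name to a list of accuracies (in %), one per
    data set, with every list in the same data set order.
    (Hypothetical layout, chosen only for this sketch.)
    """
    names = list(acc)
    n_sets = len(acc[names[0]])
    # Best accuracy reached by any classifier on each data set.
    best = [max(acc[c][d] for c in names) for d in range(n_sets)]
    summary = {}
    for c in names:
        # Accuracy relative to the best one; 0.0 if nobody scored above 0.
        pma = [100.0 * acc[c][d] / best[d] if best[d] > 0 else 0.0
               for d in range(n_sets)]
        summary[c] = {
            # Mean percentage of the maximum accuracy over all data sets.
            "mean_pma": sum(pma) / n_sets,
            # Share of data sets (in %) where the classifier exceeds 90%
            # of the maximum accuracy.
            "share_above_90": 100.0 * sum(p > 90.0 for p in pma) / n_sets,
        }
    return summary


# Toy accuracies on four imaginary data sets (invented numbers).
toy = {
    "rf": [90.0, 80.0, 70.0, 60.0],
    "svm": [85.0, 80.0, 75.0, 45.0],
}
result = percent_of_max_accuracy(toy)
for name, stats in result.items():
    print(name, round(stats["mean_pma"], 1), round(stats["share_above_90"], 1))
```

With the real accuracy table from the paper in place of the toy dictionary, the same two numbers per classifier would correspond to the kind of figures quoted in the abstract, assuming this reading of the metric is right.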