Boosting:
In bagging, generating complementary base-learners is left to chance and to the instability of the learning method. In boosting, we actively try to generate complementary base-learners by training the next learner on the mistakes of the previous learners. The original boosting algorithm combines three weak learners to generate a strong learner. A weak learner has error probability less than 1/2, which makes it better than random guessing on a two-class problem, and a strong learner has arbitrarily small error probability.
The boosting method:
Let $d_1, d_2, d_3$ be three learning algorithms for a particular task. Let a large training set $X$ be given.
We randomly divide X into three sets, say $X_1, X_2, X_3$.
We use $X_1$ to train $d_1$.
We then take $X_2$ and feed it to $d_1$.
We take all instances of $X_2$ misclassified by $d_1$, together with as many instances on which $d_1$ is correct, and these form the training set of $d_2$.
We then take $X_3$ and feed it to $d_1$ and $d_2$.
The instances on which $d_1$ and $d_2$ disagree form the training set of $d_3$.
During testing, given an instance, we give it to $d_1$ and $d_2$; if they agree, that is the response; otherwise the response of $d_3$ is taken as the output. A code sketch of this procedure is given below.
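A minimal sketch of this three-learner procedure, assuming the data are NumPy arrays and using scikit-learn decision stumps as the weak learners; the function names `train_boosted_trio` and `predict_boosted_trio` are illustrative, not standard.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_boosted_trio(X, y, rng=np.random.default_rng(0)):
    X, y = np.asarray(X), np.asarray(y)
    idx = rng.permutation(len(X))
    i1, i2, i3 = np.array_split(idx, 3)      # randomly divide X into X1, X2, X3

    # Train d1 on X1.
    d1 = DecisionTreeClassifier(max_depth=1).fit(X[i1], y[i1])

    # Feed X2 to d1; the misclassified instances plus an equal number of
    # correctly classified ones form the training set of d2.
    p = d1.predict(X[i2])
    wrong, right = i2[p != y[i2]], i2[p == y[i2]]
    d2_idx = np.concatenate([wrong, right[:len(wrong)]]) if len(wrong) else i2
    d2 = DecisionTreeClassifier(max_depth=1).fit(X[d2_idx], y[d2_idx])

    # Feed X3 to d1 and d2; the instances on which they disagree train d3.
    disagree = i3[d1.predict(X[i3]) != d2.predict(X[i3])]
    if len(disagree) == 0:                    # edge case: they never disagree
        disagree = i3
    d3 = DecisionTreeClassifier(max_depth=1).fit(X[disagree], y[disagree])
    return d1, d2, d3

def predict_boosted_trio(d1, d2, d3, X):
    p1, p2, p3 = d1.predict(X), d2.predict(X), d3.predict(X)
    # Where d1 and d2 agree, take their answer; otherwise defer to d3.
    return np.where(p1 == p2, p1, p3)
```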
It has been shown that this overall system has reduced error rate, and the error rate can be made arbitrarily small by applying such systems recursively. One disadvantage of the system is that it requires a very large training sample. An improved algorithm known as AdaBoost (short for "adaptive boosting") uses the same training set over and over, so the training set need not be large. AdaBoost can also combine an arbitrary number of base-learners, not just three.
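A minimal sketch of AdaBoost using scikit-learn's `AdaBoostClassifier`; the synthetic dataset here is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the training set X.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost reuses the same training set in every round, reweighting it after
# each base-learner, and n_estimators can be any number, not just three.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```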