written 6.9 years ago by | modified 2.8 years ago by |
Subject: Datawarehouse and Mining
Topic: Classification
Difficulty: Medium
Bagging :-
Bagging is a voting method whereby base-learners are made different by training them over slightly different training sets.
Generating L slightly different samples from a given sample is done by the bootstrap: given a training set X of size N, we draw N instances at random from X with replacement. Because sampling is done with replacement, some instances may be drawn more than once while others are not drawn at all. When this is done to generate L samples X_j, j = 1, ..., L, the samples are similar because they are all drawn from the same original sample, but they also differ slightly due to chance.
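The bootstrap step can be sketched in a few lines of Python (the function name and toy data are illustrative, not from the text):

```python
import random

def bootstrap_samples(X, L):
    """Draw L bootstrap samples; each is built by drawing
    N = len(X) instances from X with replacement."""
    N = len(X)
    return [[random.choice(X) for _ in range(N)] for _ in range(L)]

random.seed(0)
X = [1, 2, 3, 4, 5]
samples = bootstrap_samples(X, L=3)
print(samples)  # duplicates and missing instances are expected
```

Each sample has the same size N as the original set, but because of replacement, some instances repeat and others are absent, which is exactly what makes the L training sets slightly different.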
The base-learners are trained on these L samples X_j. A learning algorithm is unstable if small changes in the training set cause large differences in the generated learner. Bagging, short for bootstrap aggregating, uses the bootstrap to generate L training sets, trains L base-learners with an unstable learning procedure, and then, during testing, takes the average of their predictions. Bagging can be used for both classification and regression. In the case of regression, one can take the median instead of the average when combining predictions, which is more robust to outlying base-learner outputs.
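A tiny numerical illustration of why the median is the more robust combiner for regression (the prediction values below are made up for illustration):

```python
import statistics

# Hypothetical outputs of L = 5 base-learners for one test point;
# one unstable learner produced an outlying prediction.
preds = [2.9, 3.1, 3.0, 2.8, 9.5]

mean_pred = sum(preds) / len(preds)      # pulled toward the outlier
median_pred = statistics.median(preds)   # ignores the outlier

print(mean_pred)    # 4.26
print(median_pred)  # 3.0
```

The single outlier shifts the average well away from the other four predictions, while the median stays at the consensus value.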
Algorithms such as decision trees and multilayer perceptrons are unstable.
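Putting the pieces together, here is a minimal, stdlib-only sketch of bagging for classification. The decision-stump base learner, the toy 1-D data set, and all function names are assumptions made for illustration; real implementations typically bag full decision trees:

```python
import random
import statistics

def train_stump(sample):
    """Fit a 1-D decision stump: choose the threshold (and
    orientation) that classifies this bootstrap sample best."""
    best = None  # (n_correct, threshold, flip)
    for t in sorted(x for x, _ in sample):
        for flip in (False, True):
            n_correct = sum(1 for x, y in sample
                            if int((x >= t) != flip) == y)
            if best is None or n_correct > best[0]:
                best = (n_correct, t, flip)
    _, t, flip = best
    return lambda x: int((x >= t) != flip)

def bagging_fit(data, L):
    """Draw L bootstrap samples of size N (with replacement)
    and train one base-learner on each."""
    N = len(data)
    return [train_stump([random.choice(data) for _ in range(N)])
            for _ in range(L)]

def bagging_predict(learners, x):
    """Combine the base-learners' outputs by majority vote."""
    return statistics.mode(h(x) for h in learners)

random.seed(1)
# Toy 1-D data: class 0 for x <= 5, class 1 for x > 5.
data = [(x, int(x > 5)) for x in range(11)]
ensemble = bagging_fit(data, L=25)
print(bagging_predict(ensemble, 2), bagging_predict(ensemble, 9))
```

Each stump sees a slightly different bootstrap sample, so the stumps' thresholds differ; the majority vote over 25 such learners is more stable than any single stump.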