Write a short note on Random Forests


Random Forests:

A random forest is an ensemble learning method in which multiple decision trees are constructed and their predictions are combined, by majority vote for classification or averaging for regression, to obtain a more accurate prediction than any single tree.



Algorithm:

Here is an outline of the random forest algorithm.

  1. The random forests algorithm generates many classification trees. Each tree is generated as follows:

    (a) If the number of examples in the training set is N, take a sample of N examples at random, but with replacement, from the original data. This sample will be the training set for generating the tree.

    (b) If there are M input variables, a number m is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant during the generation of the various trees in the forest.

    (c) Each tree is grown to the largest extent possible.

  2. To classify a new object from an input vector, put the input vector down each of the trees in the forest. Each tree gives a classification, and we say the tree “votes” for that class. The forest chooses the classification having the most votes over all the trees in the forest.
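The steps above can be sketched in pure Python. This is a minimal illustration, not a production implementation: the "trees" are depth-1 stumps rather than fully grown trees, and the two-feature toy dataset, the m = 1 feature subset, and the forest size of 25 are hypothetical choices made for the example.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy data: two features in [0, 1); class 1 when x0 + x1 > 1.
points = [(random.random(), random.random()) for _ in range(200)]
data = [(x, int(x[0] + x[1] > 1.0)) for x in points]

def best_stump(sample, feature_indices):
    """Pick the (feature, threshold, orientation) stump with the fewest
    errors on the sample, searching only the random feature subset (step 1b)."""
    best = None
    for f in feature_indices:
        for x, _ in sample:
            t = x[f]
            for left_label in (0, 1):
                errs = sum(y != (left_label if z[f] <= t else 1 - left_label)
                           for z, y in sample)
                if best is None or errs < best[0]:
                    best = (errs, f, t, left_label)
    return best[1], best[2], best[3]

def grow_forest(data, n_trees=25, m=1):
    n, n_features = len(data), len(data[0][0])
    forest = []
    for _ in range(n_trees):
        # Step 1a: bootstrap sample of size N, drawn with replacement.
        sample = [random.choice(data) for _ in range(n)]
        # Step 1b: choose m of the M features at random for this split.
        feats = random.sample(range(n_features), m)
        forest.append(best_stump(sample, feats))
    return forest

def predict(forest, x):
    # Step 2: each tree votes; the forest returns the majority class.
    votes = Counter(left if x[f] <= t else 1 - left for f, t, left in forest)
    return votes.most_common(1)[0][0]

forest = grow_forest(data)
print(predict(forest, (0.95, 0.95)), predict(forest, (0.05, 0.05)))
```

A real implementation grows each tree to full depth, re-drawing the m-feature subset at every node; only the single-node case collapses to one draw per tree as above.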



Strengths and weaknesses:


Strengths:

The following are some of the important strengths of random forests.

  • It runs efficiently on large databases.

  • It can handle thousands of input variables without variable deletion.

  • It gives estimates of what variables are important in the classification.

  • It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.

  • Generated forests can be saved for future use on other data.

  • Prototypes are computed that give information about the relation between the variables and the classification.

  • The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier detection.

  • It offers an experimental method for detecting variable interactions.

  • Random forest run times are quite fast, and they are able to deal with unbalanced and missing data.

  • They can handle binary, categorical, and numerical features without any need for scaling.

  • There are lots of excellent, free, and open-source implementations of the random forest algorithm. We can find a good implementation in almost all major ML libraries and toolkits.
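One way to see how the variable-importance estimate works is a permutation test: shuffle one feature's values across the examples and measure how much the forest's accuracy drops. The sketch below applies this idea to a stump forest on hypothetical data where feature 0 determines the class and feature 1 is pure noise; the dataset, forest size, and helper names are inventions for this example.

```python
import random

random.seed(1)

# Hypothetical data: feature 0 determines the class, feature 1 is pure noise.
X = [(random.random(), random.random()) for _ in range(300)]
y = [int(x0 > 0.5) for x0, _ in X]

def fit_stump(sample, f):
    """Best threshold on feature f (orientation fixed: predict 1 above it)."""
    best = None
    for xs, _ in sample:
        t = xs[f]
        errs = sum(lab != int(z[f] > t) for z, lab in sample)
        if best is None or errs < best[1]:
            best = (t, errs)
    return best[0]

def grow_forest(X, y, n_trees=30):
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]   # bootstrap sample
        sample = [(X[i], y[i]) for i in idx]
        f = random.randrange(len(X[0]))                 # random feature choice
        forest.append((f, fit_stump(sample, f)))
    return forest

def predict(forest, x):
    votes = sum(int(x[f] > t) for f, t in forest)
    return int(votes * 2 > len(forest))                 # majority vote

def accuracy(forest, X, y):
    return sum(predict(forest, x) == lab for x, lab in zip(X, y)) / len(X)

def permutation_importance(forest, X, y, f):
    """Accuracy drop when feature f's values are shuffled across examples."""
    col = [x[f] for x in X]
    random.shuffle(col)
    X_perm = [tuple(col[i] if j == f else v for j, v in enumerate(x))
              for i, x in enumerate(X)]
    return accuracy(forest, X, y) - accuracy(forest, X_perm, y)

forest = grow_forest(X, y)
```

Shuffling the informative feature 0 destroys most of the forest's accuracy, while shuffling the noise feature 1 changes almost nothing, so the importance score correctly ranks feature 0 first. (The importance measures built into random forest implementations, such as mean decrease in accuracy on out-of-bag data, are refinements of this idea.)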



Weaknesses:

  • When used for regression, random forests cannot predict beyond the range of the target values in the training data, and they may over-fit data sets that are particularly noisy.

  • The models created by random forests may be very large: a forest may take hundreds of megabytes of memory and may be slow to evaluate.

  • Random forest models are black boxes that are very hard to interpret.
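The inability to extrapolate in regression follows from how trees predict: each leaf outputs the mean of the training targets that fell into it, so no tree, and hence no average of trees, can produce a value outside the range of the training targets. A small sketch with hypothetical data y = 2x on [0, 1), using regression stumps with a randomly chosen split point for brevity:

```python
import random

random.seed(2)

# Hypothetical 1-D regression data: y = 2x for x in [0, 1), so max(y) < 2.
X = [random.random() for _ in range(200)]
y = [2 * x for x in X]

def fit_stump(xs, ys):
    """Regression stump: split at a random training x, predict side means."""
    t = random.choice(xs)
    left = [v for x, v in zip(xs, ys) if x <= t]
    right = [v for x, v in zip(xs, ys) if x > t]
    overall = sum(ys) / len(ys)
    mean = lambda vs: sum(vs) / len(vs) if vs else overall
    return t, mean(left), mean(right)

def grow_forest(xs, ys, n_trees=50):
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]   # bootstrap sample
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict(forest, x):
    # Average the per-tree leaf means -- never outside [min(y), max(y)].
    return sum(lm if x <= t else rm for t, lm, rm in forest) / len(forest)

forest = grow_forest(X, y)
# In-range predictions track the trend, but the prediction at x = 5 is
# capped by the training targets' range, far below the true value 10.
print(predict(forest, 0.1), predict(forest, 0.9), predict(forest, 5.0))
```

A linear model, by contrast, would happily extrapolate y(5) = 10 here; this capping is specific to tree-based averaging.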
