Naive Bayes Algorithm -
Assumptions -
The naive Bayes algorithm is based on the following assumptions:
All the features are independent and unrelated to each other. The presence or absence of a feature does not influence the presence or absence of any other feature.
The data has class-conditional independence, which means that events are independent so long as they are conditioned on the same class value.
These assumptions are, in general, not true in real-world problems. It is because of these simplifying assumptions that the algorithm is called a naive algorithm.
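Stated symbolically (a restatement of the second assumption, writing $F_i$ for the $i$-th feature and $c$ for a class label, notation that is formalized in the next subsection): for any two distinct features,
$$P(F_i, F_j \mid c) = P(F_i \mid c)\, P(F_j \mid c),$$
and, more generally, the joint probability of all the features given the class factorizes into the product of the individual feature probabilities.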
Basic Idea -
Suppose we have a training data set consisting of $N$ examples having $n$ features. Let the features be named $F_1, \ldots, F_n$. A feature vector is of the form $(f_1, f_2, \ldots, f_n)$. Associated with each example there is a certain class label. Let the set of class labels be $\{c_1, c_2, \ldots, c_p\}$.
Suppose we are given a test instance having the feature vector
$$X = (x_1, x_2, \ldots, x_n).$$
We are required to determine the most appropriate class label that should be assigned to the test instance. For this purpose we compute the following conditional probabilities:
$$P(c_1 \mid X),\ P(c_2 \mid X),\ \ldots,\ P(c_p \mid X)$$
and choose the maximum among them. Let the maximum probability be $P(c_i \mid X)$. Then we choose $c_i$ as the most appropriate class label for the test instance having $X$ as the feature vector.
The direct computation of these probabilities is difficult for a number of reasons. Bayes' theorem can be applied to obtain a simpler method. This is explained below.
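One way to see the simplification (a standard application of Bayes' theorem, using the notation above): since the denominator $P(X)$ is the same for every class, comparing the posteriors reduces to comparing the products
$$P(c_k \mid X) = \frac{P(X \mid c_k)\,P(c_k)}{P(X)} \propto P(x_1 \mid c_k)\,P(x_2 \mid c_k)\cdots P(x_n \mid c_k)\,P(c_k),$$
where the last step uses the class-conditional independence assumption $P(X \mid c_k) = P(x_1 \mid c_k)\cdots P(x_n \mid c_k)$. These products are exactly the quantities $q_k$ computed in the algorithm below.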
Algorithm: Naive Bayes -
Let there be a training data set having $n$ features $F_1, \ldots, F_n$. Let $f_1$ denote an arbitrary value of $F_1$, $f_2$ of $F_2$, and so on. Let the set of class labels be $\{c_1, c_2, \ldots, c_p\}$. Let there be given a test instance having the feature vector
$$X = (x_1, x_2, \ldots, x_n).$$
We are required to determine the most appropriate class label that should be assigned to the test instance.
1. Compute the probabilities $P(c_k)$ for $k = 1, \ldots, p$.
2. Form a table showing the conditional probabilities $P(f_1 \mid c_k), P(f_2 \mid c_k), \ldots, P(f_n \mid c_k)$ for all values of $f_1, f_2, \ldots, f_n$ and for $k = 1, \ldots, p$.
3. Compute the products $q_k = P(x_1 \mid c_k)\,P(x_2 \mid c_k)\cdots P(x_n \mid c_k)\,P(c_k)$ for $k = 1, \ldots, p$.
4. Find $j$ such that $q_j = \max\{q_1, q_2, \ldots, q_p\}$.
5. Assign the class label $c_j$ to the test instance $X$.
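A minimal sketch of these five steps in Python, assuming categorical features; the function and variable names (`train_naive_bayes`, `predict_naive_bayes`) are illustrative only, and unseen feature values simply get probability 0 (no smoothing):

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Learning phase (Steps 1 and 2): estimate the priors P(c_k) and the
    table of conditional probabilities P(f | c_k) by counting over the
    training set of categorical feature vectors."""
    n_examples = len(examples)
    class_counts = Counter(labels)
    # Step 1: P(c_k) = (no. of examples with class label c_k) / (total no. of examples)
    priors = {c: count / n_examples for c, count in class_counts.items()}

    # Step 2: conditional probability table, keyed by (feature index, value, class)
    value_counts = defaultdict(int)
    for x, c in zip(examples, labels):
        for j, value in enumerate(x):
            value_counts[(j, value, c)] += 1
    cond = {key: count / class_counts[key[2]] for key, count in value_counts.items()}
    return priors, cond

def predict_naive_bayes(priors, cond, x):
    """Testing phase (Steps 3-5): compute q_k = P(x_1|c_k)...P(x_n|c_k)P(c_k)
    for every class and return the class with the largest product."""
    q = {}
    for c, prior in priors.items():
        product = prior
        for j, value in enumerate(x):
            # Feature values never seen with class c default to probability 0
            product *= cond.get((j, value, c), 0.0)
        q[c] = product
    # Steps 4 and 5: pick the class label c_j with the maximum q_j
    return max(q, key=q.get)
```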
Remarks -
In the above algorithm, Steps 1 and 2 constitute the learning phase and the remaining steps constitute the testing phase. For testing, only the table of probabilities is required; the original data set is not needed.
The various probabilities in the above expression are computed as follows:
$$P(c_k) = \frac{\text{No. of examples with class label } c_k}{\text{Total number of examples}}$$
$$P(x_j \mid c_k) = \frac{\text{No. of examples with } j\text{th feature equal to } x_j \text{ and class label } c_k}{\text{No. of examples with class label } c_k}$$
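For instance, applying the sketch given after the algorithm to a tiny hypothetical data set (two categorical features and four examples, purely for illustration), the training function computes exactly these counting estimates:

```python
# Hypothetical toy data: features (Outlook, Windy) and class label Play.
X_train = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
y_train = ["yes", "no", "yes", "no"]

priors, cond = train_naive_bayes(X_train, y_train)
print(priors)   # {'yes': 0.5, 'no': 0.5} -- each class appears in 2 of the 4 examples
print(predict_naive_bayes(priors, cond, ("sunny", "no")))   # 'yes'
```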