Classification :
Classification predicts categorical class labels. It constructs a model from a training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data. It is a form of supervised learning, i.e., the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.
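As a rough illustration of learning from labelled training data and then classifying new data, here is a minimal sketch. It uses a 1-nearest-neighbour rule, chosen here only for brevity and not named in the text above; the "model" is simply the stored training set. The data, the labels "low" and "high", and the function name `classify` are all hypothetical.

```python
import math

# Hypothetical training set: each observation is paired with its class label.
training_set = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

def classify(features, training_set):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training_set, key=lambda example: math.dist(example[0], features))
    return nearest[1]

# New, unlabelled observations to be classified.
new_points = [(1.2, 1.4), (8.5, 9.2)]
predictions = [classify(point, training_set) for point in new_points]
print(predictions)
```

The first new point lies closest to a "low" example and the second to a "high" example, so they receive those labels.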
Issues regarding classification :
Issues regarding preprocessing the data for classification are as follows :
1. Preparing the data for Classification :
The following preprocessing steps may be applied to the data to improve the accuracy, efficiency, and scalability of the classification process.
- Data Cleaning : This refers to preprocessing the data in order to
remove or reduce noise and to handle missing values.
- Relevance Analysis (feature selection) : Remove the irrelevant or
redundant attributes.
- Data transformation and reduction : The data may be transformed by
normalization, particularly when neural networks or methods involving
distance measurements are used in the learning step. Normalization
involves scaling all values for a given attribute so that they fall
within a small specified range, such as -1.0 to 1.0, or 0.0 to 1.0.
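The cleaning and normalization steps above can be sketched as follows. This is a minimal illustration under stated assumptions: missing values are marked with `None` and filled with the attribute mean (one of several possible treatments), and normalization uses linear min-max scaling. The function names and the sample `incomes` values are hypothetical.

```python
def fill_missing(values):
    """Replace None with the mean of the observed values (one simple cleaning choice)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map every value to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

# Hypothetical attribute with one missing value.
incomes = [20000.0, None, 60000.0, 100000.0]
cleaned = fill_missing(incomes)
unit_range = min_max_normalize(cleaned)                # scaled to 0.0 .. 1.0
signed_range = min_max_normalize(cleaned, -1.0, 1.0)   # scaled to -1.0 .. 1.0
print(unit_range, signed_range)
```

Scaling matters for distance-based methods because an attribute with a large numeric range, such as income, would otherwise dominate attributes with small ranges.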
2. Comparing Classification and Prediction Methods :
Classification and prediction methods can be compared and evaluated according to the following criteria :
- Accuracy : The accuracy of a classifier refers to the ability of a
given classifier to correctly predict the class label of new or
previously unseen data (i.e., tuples without class label
information). Similarly, the accuracy of a predictor refers to how
well a given predictor can guess the value of the predicted
attribute for new or previously unseen data.
- Speed : This refers to the computational costs involved in generating
and using the given classifier or predictor.
- Robustness : This is the ability of the classifier or predictor to
make correct predictions given noisy data or data with missing
values.
- Scalability : This refers to the ability to construct the classifier
or predictor efficiently given large amounts of data.
- Interpretability : This refers to the level of understanding and
insight that is provided by the classifier or predictor.
Interpretability is subjective and therefore more difficult to
assess.
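Two of the criteria above, accuracy and speed, are straightforward to measure. The sketch below computes accuracy as the fraction of held-out examples that are labelled correctly, and times the classification step only (not the cost of building the classifier). The threshold rule, the test data, and all names are hypothetical and serve only as an example.

```python
import time

def accuracy(classifier, test_set):
    """Fraction of held-out examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_set if classifier(features) == label)
    return correct / len(test_set)

# Hypothetical classifier: a single threshold rule on one attribute.
def threshold_classifier(features):
    return "high" if features[0] >= 5.0 else "low"

# Hypothetical held-out examples that were not used to build the classifier.
test_set = [((1.0,), "low"), ((4.0,), "low"), ((6.0,), "high"), ((4.5,), "high")]

start = time.perf_counter()
acc = accuracy(threshold_classifier, test_set)
elapsed_seconds = time.perf_counter() - start

print(f"accuracy = {acc:.2f}, time to classify = {elapsed_seconds:.6f} s")
```

Here three of the four test examples are classified correctly, giving an accuracy of 0.75. Robustness could be probed in the same way by re-running the measurement on a copy of the test set with noise or missing values added.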