Define Classification. Discuss the issues in Classification.

Age	Competition	Type	Profit
Old	Yes	Software	Down
Old	No	Software	Down
Old	No	Hardware	Down
Mid	Yes	Software	Down
Mid	Yes	Hardware	Down
Mid	No	Hardware	Up
Mid	No	Software	Up
New	Yes	Software	Up
New	No	Hardware	Up
New	No	Software	Up

written 3.0 years ago by teamques10

Classification :

Classification predicts categorical class labels. It constructs a model from a training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data. It is supervised learning, i.e., the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.
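The two phases described above (build a model from labelled training data, then use it on new data) can be sketched in Python with the table from the question. This is only an illustrative sketch: the single-attribute majority rule used here is a deliberately simple stand-in, not a method prescribed by this answer, and the names `train_single_attribute` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict

# Training set copied from the question's table: (Age, Competition, Type, Profit).
# Profit is the classifying attribute (the class label).
TRAINING = [
    ("Old", "Yes", "Software", "Down"),
    ("Old", "No", "Software", "Down"),
    ("Old", "No", "Hardware", "Down"),
    ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"),
    ("Mid", "No", "Hardware", "Up"),
    ("Mid", "No", "Software", "Up"),
    ("New", "Yes", "Software", "Up"),
    ("New", "No", "Hardware", "Up"),
    ("New", "No", "Software", "Up"),
]


def train_single_attribute(rows, attr_index):
    """Learning step: record the majority class label for each value of one attribute."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attr_index]][row[-1]] += 1
    return {value: labels.most_common(1)[0][0] for value, labels in counts.items()}


def classify(model, value, default=None):
    """Classification step: look up the label learned for this attribute value."""
    return model.get(value, default)


model = train_single_attribute(TRAINING, attr_index=0)  # assumption: use Age only
print(classify(model, "Old"))
print(classify(model, "New"))
```

A real classifier would normally use several attributes at once; the point here is only the split between training on labelled tuples and labelling unseen ones.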

Issues regarding classification :

The main issues concern preparing the data for classification and comparing the available methods :

1. Preparing the data for Classification :

The following preprocessing steps may be applied to the data to improve the accuracy, efficiency, and scalability of the classification process.

Data Cleaning : This refers to preprocessing the data in order to remove or reduce noise and to treat missing values.
Relevance Analysis (feature selection) : Remove the irrelevant or redundant attributes.
Data transformation and reduction : The data may be transformed by normalization, particularly when neural networks or methods involving distance measurements are used in the learning step. Normalization involves scaling all values for a given attribute so that they fall within a small specified range, such as -1.0 to 1.0, or 0.0 to 1.0.
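The normalization step above can be sketched as a min-max rescaling. This is a minimal sketch under stated assumptions: min-max scaling is one common way to map an attribute into a fixed range, the function name `min_max_normalize` is hypothetical, and the income values are made-up sample data.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale the values of one attribute into [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # A constant attribute has no spread; map every value to the lower bound.
        return [new_min for _ in values]
    span = old_max - old_min
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]


incomes = [20000, 35000, 50000, 80000]  # hypothetical attribute values
print(min_max_normalize(incomes))             # scaled into 0.0 to 1.0
print(min_max_normalize(incomes, -1.0, 1.0))  # scaled into -1.0 to 1.0
```

Rescaling matters for distance-based methods because an attribute with a large numeric range would otherwise dominate the distance computation.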

2. Comparing Classification and Prediction Methods :

Classification and prediction methods can be compared and evaluated according to the following criteria :

Accuracy : The accuracy of a classifier refers to the ability of a given classifier to correctly predict the class label of new or previously unseen data (i.e., tuples without class label information). Similarly, the accuracy of a predictor refers to how well a given predictor can guess the value of the predicted attribute for new or previously unseen data.
Speed : This refers to the computational costs involved in generating and using the given classifier or predictor.
Robustness : This is the ability of the classifier or predictor to make correct predictions given noisy data or data with missing values.
Scalability : This refers to the ability to construct the classifier or predictor efficiently given large amounts of data.
Interpretability : This refers to the level of understanding and insight that is provided by the classifier or predictor. Interpretability is subjective and therefore more difficult to assess.
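Of the criteria above, accuracy is the easiest to make concrete: it is usually measured as the fraction of held-out tuples whose predicted label matches the true label. The sketch below assumes that definition; the function name `accuracy` and the sample labels are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of tuples whose predicted class label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical held-out test set: true labels vs. a classifier's predictions.
held_out_truth = ["Down", "Up", "Up", "Down"]
predictions = ["Down", "Up", "Down", "Down"]
print(accuracy(held_out_truth, predictions))  # 3 of 4 correct
```

The test tuples should not have been used during training; otherwise the estimate says little about performance on unseen data.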
