Define Classification. Discuss the issues in Classification.

A simple example from the stock market, involving only discrete ranges, has Profit as the categorical (class) attribute, with values {Up, Down}. The training data is:

Age   Competition   Type       Profit
Old   Yes           Software   Down
Old   No            Software   Down
Old   No            Hardware   Down
Mid   Yes           Software   Down
Mid   Yes           Hardware   Down
Mid   No            Hardware   Up
Mid   No            Software   Up
New   Yes           Software   Up
New   No            Hardware   Up
New   No            Software   Up

Apply decision tree algorithm and show the generated rules.

Classification :

Classification predicts categorical class labels. It constructs a model from the training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data. It is supervised learning, i.e., the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.
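To make the "construct a model, then classify new data" idea concrete, here is a minimal sketch that applies a decision tree to the stock-market training data from the question. The question says only "decision tree algorithm", so this assumes ID3 with information gain as the splitting criterion; the helper names (`entropy`, `info_gain`, `id3`, `rules`, `classify`) are hypothetical and not taken from any library.

```python
import math
from collections import Counter

# Training data from the question: (Age, Competition, Type, Profit)
ATTRIBUTES = ["Age", "Competition", "Type"]
ROWS = [
    ("Old", "Yes", "Software", "Down"),
    ("Old", "No", "Software", "Down"),
    ("Old", "No", "Hardware", "Down"),
    ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"),
    ("Mid", "No", "Hardware", "Up"),
    ("Mid", "No", "Software", "Up"),
    ("New", "Yes", "Software", "Up"),
    ("New", "No", "Hardware", "Up"),
    ("New", "No", "Software", "Up"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, attr):
    """Entropy reduction obtained by splitting rows on attribute `attr`."""
    idx = ATTRIBUTES.index(attr)
    remainder = 0.0
    for value in set(r[idx] for r in rows):
        subset = [r[-1] for r in rows if r[idx] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def id3(rows, attrs):
    """Return a leaf label (str) or an (attribute, {value: subtree}) node."""
    labels = [r[-1] for r in rows]
    # Stop when the node is pure or no attributes remain (majority vote)
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a))
    idx = ATTRIBUTES.index(best)
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in dict.fromkeys(r[idx] for r in rows):  # keep first-seen order
        branches[value] = id3([r for r in rows if r[idx] == value], rest)
    return (best, branches)


def rules(tree, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path."""
    if isinstance(tree, str):
        lhs = " AND ".join(f"{a} = {v}" for a, v in conditions)
        yield f"IF {lhs} THEN Profit = {tree}"
        return
    attr, branches = tree
    for value, subtree in branches.items():
        yield from rules(subtree, conditions + ((attr, value),))


def classify(tree, features):
    """Follow branches for a new (Age, Competition, Type) tuple down to a leaf."""
    while not isinstance(tree, str):
        attr, branches = tree
        tree = branches[features[ATTRIBUTES.index(attr)]]
    return tree


TREE = id3(ROWS, ATTRIBUTES)

if __name__ == "__main__":
    for attr in ATTRIBUTES:
        print(f"Gain({attr}) = {info_gain(ROWS, attr):.3f}")
    for rule in rules(TREE):
        print(rule)
```

With 5 Up and 5 Down tuples the starting entropy is 1 bit. Age gives a gain of 0.6, Competition about 0.125 and Type 0, so Age becomes the root. The Old and New branches are already pure, and the Mid branch splits perfectly on Competition, giving the rules:

IF Age = Old THEN Profit = Down
IF Age = Mid AND Competition = Yes THEN Profit = Down
IF Age = Mid AND Competition = No THEN Profit = Up
IF Age = New THEN Profit = Up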

Issues regarding classification :

The issues fall into two groups: preparing the data for classification, and comparing and evaluating the resulting classification and prediction methods.

1. Preparing the Data for Classification :

The following preprocessing steps may be applied to the data to improve the accuracy, efficiency, and scalability of the classification process.

  • Data Cleaning : This refers to preprocessing the data to remove or reduce noise and to treat missing values.
  • Relevance Analysis (feature selection) : Remove attributes that are irrelevant or redundant to the classification task.
  • Data transformation and reduction : The data may be transformed by normalization, particularly when neural networks or methods involving distance measurements are used in the learning step. Normalization involves scaling all values of a given attribute so that they fall within a small specified range, such as -1.0 to 1.0, or 0.0 to 1.0.
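The cleaning and normalization steps above can be sketched for a single numeric attribute as follows. This is a minimal illustration under two assumptions: missing values are filled with the mean of the known values (one common choice; the text does not prescribe a method), and normalization is min-max scaling, v' = (v - min) / (max - min) * (new_max - new_min) + new_min. The function names and the sample values are hypothetical.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Data cleaning sketch: replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    m = mean(known)
    return [m if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: no spread to scale, so map everything to new_min
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


if __name__ == "__main__":
    ages = [20, None, 50, 80]  # hypothetical numeric attribute with one missing value
    cleaned = fill_missing_with_mean(ages)
    print(cleaned)
    print(min_max_normalize(cleaned))             # range 0.0 to 1.0
    print(min_max_normalize(cleaned, -1.0, 1.0))  # range -1.0 to 1.0
```

Scaling every attribute into the same small range keeps an attribute with large raw values from dominating distance computations.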

2. Comparing Classification and Prediction Methods :

Classification and prediction methods can be compared and evaluated according to the following criteria :

  • Accuracy : The accuracy of a classifier refers to its ability to correctly predict the class label of new or previously unseen data (i.e., tuples without class label information). Similarly, the accuracy of a predictor refers to how well it can guess the value of the predicted attribute for new or previously unseen data.
  • Speed : This refers to the computational costs involved in generating and using the given classifier or predictor.
  • Robustness : This is the ability of the classifier or predictor to make correct predictions given noisy data or data with missing values.
  • Scalability : This refers to the ability to construct the classifier or predictor efficiently given large amounts of data.
  • Interpretability : This refers to the level of understanding and insight that is provided by the classifier or predictor. Interpretability is subjective and therefore more difficult to assess.
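Accuracy, the first criterion above, is usually estimated on a held-out test set of labeled tuples that were not used for training. The sketch below computes it as the fraction of test tuples whose predicted label matches the true one. The classifier `age_rule` and the `TEST_SET` tuples are hypothetical, invented only to exercise the function; they are not results from the question's data.

```python
from typing import Callable, Sequence, Tuple

Features = Tuple[str, ...]


def accuracy(classify: Callable[[Features], str],
             test_set: Sequence[Tuple[Features, str]]) -> float:
    """Fraction of held-out tuples whose predicted class label matches the true label."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for features, label in test_set if classify(features) == label)
    return correct / len(test_set)


def age_rule(features: Features) -> str:
    """Hypothetical classifier that predicts Profit from Age alone."""
    return {"Old": "Down", "New": "Up"}.get(features[0], "Down")


# Hypothetical previously unseen tuples: ((Age, Competition, Type), true Profit)
TEST_SET = [
    (("Old", "Yes", "Hardware"), "Down"),
    (("New", "Yes", "Hardware"), "Up"),
    (("Mid", "No", "Software"), "Up"),
    (("Mid", "Yes", "Software"), "Down"),
]

if __name__ == "__main__":
    acc = accuracy(age_rule, TEST_SET)
    print(f"accuracy = {acc:.2f}, error rate = {1 - acc:.2f}")
```

Because `accuracy` accepts any callable, the same function can compare different classifiers on the same test set.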
