Discuss different steps involved in Data Preprocessing.


written 7.9 years ago by

aartisahitya • 170

Steps of data preprocessing:

1. Data cleaning: filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies.

2. Data integration: combining data from multiple databases, data cubes, or files.

3. Data transformation: normalization and aggregation.

4. Data reduction: reducing the data volume while producing the same or similar analytical results.

5. Data discretization: part of data reduction; replacing numerical attributes with nominal ones.
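
As a rough illustration of the data cleaning step, the sketch below fills missing values and flags outliers. The choice of the median as the fill value and of the 1.5 × IQR rule for outliers are assumptions made for this example, not the only options; the function names are hypothetical.

```python
from statistics import median, quantiles

def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

raw = [10, 12, None, 11, 13, 95]   # made-up readings with one gap
filled = fill_missing_with_median(raw)
suspects = iqr_outliers(filled)
```

Whether a flagged value is removed, capped, or kept depends on the data; the sketch only identifies candidates.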

Normalization:

⦁ Scaling attribute values to fall within a specified range. Example: to transform V in [Min, Max] to V' in [0, 1], apply V' = (V - Min) / (Max - Min).

⦁ Scaling by using mean and standard deviation (useful when min and max are unknown or when there are outliers): V'=(V-Mean)/StDev
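
The two normalization formulas above can be sketched in a few lines of Python. The function names are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the formula does not say which is meant.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Apply V' = (V - Min) / (Max - Min), mapping values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; Max - Min is zero")
    return [(v - lo) / (hi - lo) for v in values]

def z_score_scale(values):
    """Apply V' = (V - Mean) / StDev using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is zero")
    return [(v - mu) / sigma for v in values]

scaled = min_max_scale([10, 20, 30])
standardized = z_score_scale([2, 4, 4, 4, 5, 5, 7, 9])
```

Min-max scaling needs the true minimum and maximum, so a single extreme value compresses everything else; the z-score version only needs the mean and standard deviation.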

Aggregation: moving up in the concept hierarchy on numeric attributes.

Generalization: moving up in the concept hierarchy on nominal attributes.

Attribute construction: replacing or adding new attributes inferred from existing attributes.
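
A small sketch of these three transformations follows. The city-to-country hierarchy, the day-to-month roll-up, and the BMI attribute are all made-up examples chosen for illustration; none of them comes from a particular dataset.

```python
# Hypothetical concept hierarchy for a nominal attribute: city -> country.
CITY_TO_COUNTRY = {"Mumbai": "India", "Pune": "India", "Lyon": "France"}

def generalize_city(cities):
    """Generalization: replace each city by its parent concept (country)."""
    return [CITY_TO_COUNTRY[c] for c in cities]

def aggregate_daily_to_monthly(daily_sales):
    """Aggregation: sum (YYYY-MM-DD, amount) pairs up to the month level."""
    totals = {}
    for date, amount in daily_sales:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[month] = totals.get(month, 0) + amount
    return totals

def add_bmi(record):
    """Attribute construction: derive a new attribute from two existing ones."""
    bmi = record["weight_kg"] / record["height_m"] ** 2
    return {**record, "bmi": bmi}

countries = generalize_city(["Pune", "Lyon", "Mumbai"])
monthly = aggregate_daily_to_monthly(
    [("2024-01-05", 10), ("2024-01-20", 5), ("2024-02-01", 7)]
)
person = add_bmi({"weight_kg": 80, "height_m": 2.0})
```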

Reducing the number of attributes

⦁ Data cube aggregation: applying roll-up, slice, or dice operations.

⦁ Removing irrelevant attributes: attribute selection (filter and wrapper methods), searching the attribute space.

⦁ Principal component analysis (numeric attributes only): searching for a lower-dimensional space that can best represent the data.
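
To make the idea behind principal component analysis concrete, here is a minimal pure-Python sketch that finds only the first principal component by power iteration on the covariance matrix. Power iteration is a technique swapped in for brevity; real libraries typically use an eigen or singular value decomposition instead. The function names and the start vector are assumptions.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (means, direction) for the top principal component of numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d); requires at least two rows.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by cov pulls the vector toward
    # the eigenvector with the largest eigenvalue. The start vector is an
    # arbitrary choice; a start exactly orthogonal to that eigenvector
    # would fail to converge to it.
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:  # the data has no variance, so there is no direction to find
            break
        vec = [x / norm for x in nxt]
    return means, vec

def project(rows, means, direction):
    """Reduce each row to a single score along the given direction."""
    return [sum((r[j] - means[j]) * direction[j] for j in range(len(direction)))
            for r in rows]

rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # made-up 2-D data
means, direction = first_principal_component(rows)
scores = project(rows, means, direction)  # one number per row instead of two
```

Here the second attribute is exactly twice the first, so a single score per row represents the data without loss; on real data some variance is discarded.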

Reducing the number of attribute values

⦁ Binning (histograms): reducing the number of attribute values by grouping them into intervals (bins).

⦁ Clustering: grouping values in clusters.

⦁ Aggregation or generalization
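
As one possible illustration of binning, the sketch below uses equal-width bins and then replaces each value by the mean of its bin. Equal-width partitioning is an assumption made for this example (equal-frequency bins are another common choice), and the function names are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # all values identical: everything falls in one bin
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def smooth_by_bin_means(values, labels):
    """Replace every value by the mean of the bin it was assigned to."""
    members = {}
    for v, b in zip(values, labels):
        members.setdefault(b, []).append(v)
    bin_mean = {b: sum(vs) / len(vs) for b, vs in members.items()}
    return [bin_mean[b] for b in labels]

data = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # made-up attribute values
labels = equal_width_bins(data, 3)
smoothed = smooth_by_bin_means(data, labels)
```

After smoothing, the nine original values collapse to three distinct ones, one per bin.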

Reducing the number of tuples

⦁ Sampling
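
Sampling can take several forms; the sketch below shows only simple random sampling without replacement, using the standard library. The function name and the `seed` parameter are assumptions added so the result can be reproduced.

```python
import random

def simple_random_sample(tuples, k, seed=None):
    """Pick k distinct tuples uniformly at random, without replacement."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return rng.sample(tuples, k)

# Made-up table of (id, value) tuples.
table = [(i, i * i) for i in range(100)]
subset = simple_random_sample(table, 10, seed=42)
```

Passing the same seed returns the same subset, which helps when an analysis needs to be repeated on the reduced data.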
