written 8.7 years ago by | modified 2.4 years ago by |
What do you mean by pre-processing? Why is it required?
written 8.7 years ago by |
• Data preprocessing is a data mining technique that transforms raw data into a clean, consistent format that can be understood and analyzed.
• Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors.
• Data preprocessing is a proven method of resolving such issues.
Data in the real world is:
• Incomplete:
o Data is incomplete when it lacks attribute values, lacks certain attributes of interest, or contains only aggregate data.
o Eg: occupation="".
• Noisy/Dirty:
o Noisy data contains errors or outliers.
o Eg: salary="-10".
• Inconsistent:
o Inconsistent data contains discrepancies in codes or names, that is, two or more facts that do not agree with each other.
o Eg: Age="43" Birthday="07/04/1996" (the age does not match the date of birth).
o Eg: Rating was previously "1, 2, 3" and is now "A, B, C".
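The three kinds of problem above can be checked for mechanically. The following is only a minimal sketch using the example values from the list. The function name `find_issues`, the field names, the DD/MM/YYYY reading of the birthday, and the reference date passed as `today` are all assumptions made for illustration.

```python
from datetime import date, datetime


def find_issues(record, today):
    """Return a list of data-quality problems found in one record (a dict of strings)."""
    issues = []

    # Incomplete: a required attribute is missing or empty.
    if not record.get("occupation", "").strip():
        issues.append("incomplete: occupation is missing")

    # Noisy: a value that cannot be right, such as a negative salary.
    try:
        if float(record.get("salary", "")) < 0:
            issues.append("noisy: salary is negative")
    except ValueError:
        issues.append("noisy: salary is not a number")

    # Inconsistent: the stated age disagrees with the age implied by the birthday.
    # Assumption: birthdays are written as DD/MM/YYYY.
    born = datetime.strptime(record["birthday"], "%d/%m/%Y").date()
    had_birthday_this_year = (today.month, today.day) >= (born.month, born.day)
    implied_age = today.year - born.year - (0 if had_birthday_this_year else 1)
    if int(record["age"]) != implied_age:
        issues.append(
            f"inconsistent: age {record['age']} but birthday implies {implied_age}"
        )

    return issues


# The example values from the list above, combined into one hypothetical record.
record = {"occupation": "", "salary": "-10", "age": "43", "birthday": "07/04/1996"}
issues = find_issues(record, today=date(2016, 1, 1))  # reference date is an assumption
for issue in issues:
    print(issue)
```

Passing `today` explicitly keeps the age check repeatable; a real pipeline would more likely use the date on which the record was collected.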
Reasons preprocessing is required:
• Real-world data tend to be dirty, incomplete, and inconsistent.
• Data preprocessing techniques can improve the quality of the data, thereby helping to improve the accuracy and efficiency of the mining process.
• Quality decisions must be based on quality data, and data preprocessing is what provides it.
• Data preprocessing detects data anomalies, rectifies them early, and reduces the amount of data to be analyzed, which leads to large payoffs for decision making.
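Rectifying anomalies early might look like the sketch below. It is one possible approach, not a prescribed method: replacing invalid salaries with the median of the valid ones is just one common strategy, and the `clean` function, the field names, and the mapping from the old codes 1, 2, 3 to A, B, C are assumptions made for illustration.

```python
from statistics import median

# Assumed mapping between old numeric rating codes and the current letter codes.
RATING_MAP = {"1": "A", "2": "B", "3": "C"}


def clean(records):
    """Return cleaned copies of the records; the input list is left untouched."""
    valid_salaries = [
        r["salary"] for r in records
        if r["salary"] is not None and r["salary"] >= 0
    ]
    # Raises statistics.StatisticsError if no record has a valid salary.
    fallback = median(valid_salaries)

    cleaned = []
    for r in records:
        r = dict(r)  # copy so the caller's data is not modified
        # Rectify missing or negative salaries with the median of the valid ones.
        if r["salary"] is None or r["salary"] < 0:
            r["salary"] = fallback
        # Bring old numeric rating codes onto the current letter scale.
        r["rating"] = RATING_MAP.get(r["rating"], r["rating"])
        cleaned.append(r)
    return cleaned


# Hypothetical records with one noisy salary, one missing salary, and mixed rating codes.
records = [
    {"salary": 40000, "rating": "1"},
    {"salary": -10, "rating": "B"},
    {"salary": None, "rating": "3"},
    {"salary": 60000, "rating": "C"},
]
cleaned = clean(records)
for row in cleaned:
    print(row)
```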
Data preprocessing is important because:
• A data warehouse needs consistent integration of quality data.
• Without quality data, there can be no quality mining results.
Eg: duplicate or inconsistent data may cause incorrect or even misleading statistics.
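A small illustration of how a duplicate can distort a statistic, using made-up salary rows (the names and amounts are hypothetical). The row for "bob" is assumed to have been loaded twice, which pulls the average upward until the duplicate is removed.

```python
from statistics import mean

# Hypothetical salary records; the row for "bob" was loaded twice.
rows = [
    ("alice", 30000),
    ("bob", 90000),
    ("bob", 90000),
    ("carol", 30000),
]

# Mean computed on the raw data, duplicate included.
raw_mean = mean(salary for _, salary in rows)

# dict.fromkeys keeps the first occurrence of each row and preserves order.
unique_rows = list(dict.fromkeys(rows))
clean_mean = mean(salary for _, salary in unique_rows)

print("mean with duplicate:   ", raw_mean)
print("mean without duplicate:", clean_mean)
```

Exact-match deduplication like this only catches identical rows; near-duplicates (for example, the same person spelled two ways) need a matching rule of their own.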