KDD Process (Knowledge Discovery in Databases):
- The term KDD refers to the broad process of finding knowledge in data and emphasizes the high-level application of particular data mining methods.
- The goal of the KDD process is to extract knowledge from data in the context of large databases.
- The overall process of finding and interpreting patterns from data involves the repeated application of the following steps:
Developing an understanding of:
- The application domain
- The relevant prior knowledge
- The goals of the end user
Creating a target data set:
- Selecting a data set or focusing on a subset of variables or data samples on which discovery is to be performed.
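As a rough illustration of creating a target data set, the sketch below keeps a subset of variables and a subset of samples. The records, the field names, and the helper create_target_dataset are hypothetical and chosen only for the example.

```python
# Hypothetical records; the field names are illustrative only.
records = [
    {"id": 1, "age": 34, "income": 52000, "region": "north"},
    {"id": 2, "age": 51, "income": 87000, "region": "south"},
    {"id": 3, "age": 29, "income": 41000, "region": "north"},
    {"id": 4, "age": 45, "income": 63000, "region": "east"},
]


def create_target_dataset(rows, variables, keep_row=lambda row: True):
    """Keep only the chosen variables, and only the rows that pass keep_row."""
    return [
        {name: row[name] for name in variables}
        for row in rows
        if keep_row(row)
    ]


# Focus on two variables and on the samples from one region.
target = create_target_dataset(
    records,
    variables=["age", "income"],
    keep_row=lambda row: row["region"] == "north",
)
print(target)
```

Which variables and samples to keep depends on the goals identified in the first step; the filter here is only a placeholder for that decision.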
Data cleaning and preprocessing:
- Removal of noise or outliers.
- Strategies for handling missing data fields.
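One possible cleaning pass, sketched under simple assumptions: missing fields are marked with None and filled with the median, and outliers are dropped with the common 1.5 x IQR rule. Both choices are illustrative; other strategies (dropping incomplete records, model-based imputation, domain-specific limits) may suit a given data set better. The sample values are made up.

```python
import statistics


def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical measurements with two missing fields and one extreme value.
raw = [12, 14, None, 13, 15, 200, 14, None, 13]
filled = fill_missing(raw)
cleaned = remove_outliers(filled)
print(cleaned)
```

The median is used for filling because, unlike the mean, it is not pulled toward the extreme value that the second step later removes.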
Data reduction and projection:
- Finding useful features to represent the data depending on the goal of the task.
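A minimal sketch of one reduction technique, assuming numeric features: drop any feature whose variance is below a threshold, since a near-constant feature carries little information. This is a simple, goal-agnostic stand-in; a goal-aware method would instead score each feature against the target of the task. The helper select_features, the threshold, and the sample rows are hypothetical.

```python
import statistics


def select_features(rows, min_variance=0.001):
    """Keep features whose population variance across rows reaches min_variance."""
    names = list(rows[0].keys())
    kept = [
        name
        for name in names
        if statistics.pvariance([row[name] for row in rows]) >= min_variance
    ]
    return [{name: row[name] for name in kept} for row in rows]


# Hypothetical rows; sensor_flag is constant and so is uninformative.
rows = [
    {"height": 1.70, "weight": 65.0, "sensor_flag": 1.0},
    {"height": 1.82, "weight": 80.0, "sensor_flag": 1.0},
    {"height": 1.65, "weight": 58.0, "sensor_flag": 1.0},
]
reduced = select_features(rows)
print(list(reduced[0].keys()))
```

Note that a variance threshold depends on each feature's scale, so in practice features are often rescaled before such a filter is applied.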
Choosing the data mining task:
- Deciding whether the goal of the KDD process is classification, regression, clustering, etc.
Choosing the data mining algorithm:
- Selecting methods to be used for searching for patterns in the data.
- Deciding which models and parameters may be appropriate.
- Matching a particular data mining method with the overall criteria of the KDD process.
Data mining:
- Searching for patterns of interest in a particular representational form or a set of such representations, such as classification rules or trees, regression, clustering, and so forth.
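To make "searching for patterns in a representational form" concrete, the sketch below searches for the simplest kind of classification rule: a single threshold on one numeric attribute (a one-level decision tree, often called a decision stump). The function fit_stump and the study-hours data are hypothetical, and real mining algorithms search far richer pattern spaces.

```python
def fit_stump(xs, labels):
    """Find the threshold t for the rule 'x > t -> positive' with the fewest training errors."""
    points = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best_t, best_errors = None, len(xs) + 1
    for t in candidates:
        # An error is any sample where the rule's prediction differs from the label.
        errors = sum((x > t) != y for x, y in zip(xs, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


# Hypothetical data: hours studied and whether the student passed.
hours = [1, 2, 3, 4, 6, 7, 8, 9]
passed = [False, False, False, False, True, True, True, True]
threshold, errors = fit_stump(hours, passed)
print(f"rule: hours > {threshold} -> pass ({errors} training errors)")
```

Here the representational form is fixed in advance (one threshold rule), and the mining step is the search over candidate thresholds for the one that best fits the data.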
Interpreting mined patterns
Consolidating discovered knowledge
Architecture of a Typical Data Mining System:
- The architecture of a typical data mining system may have the following major components, as shown in the figure:
Database, data warehouse, or other information repository:
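- This is the information repository that holds the data to be mined; it may be one or more databases, data warehouses, or other data stores.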
- Data cleaning and data integration techniques may be performed on the data.
Database or data warehouse server:
- It fetches the relevant data, based on the user's data mining request.
Knowledge base:
- This is the domain knowledge used to guide the search and to evaluate how interesting the resulting patterns are, so that interesting and hidden patterns can be found in the data.
Data mining engine:
- It performs data mining tasks such as characterization, association, classification, cluster analysis, etc.
Pattern evaluation module:
- It is integrated with the mining module and focuses the search on only the interesting patterns.
Graphical user interface:
- This module handles communication between the user and the data mining system and allows users to browse database or data warehouse schemas.
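The way these components fit together can be sketched as a toy pipeline. Everything below is an illustrative assumption rather than a real system: the market-basket data, the function names, the use of item-pair support as the mining task, and the 0.6 support threshold standing in for the knowledge base.

```python
from collections import Counter
from itertools import combinations

# Information repository: hypothetical market-basket transactions.
REPOSITORY = [
    {"store": "A", "items": {"bread", "milk"}},
    {"store": "A", "items": {"bread", "milk", "eggs"}},
    {"store": "A", "items": {"milk", "eggs"}},
    {"store": "B", "items": {"bread", "jam"}},
]

# Knowledge base: domain knowledge used to judge interestingness.
KNOWLEDGE_BASE = {"min_support": 0.6}


def server_fetch(repository, store):
    """Database or data warehouse server: fetch only the data relevant to the request."""
    return [row["items"] for row in repository if row["store"] == store]


def mining_engine(transactions):
    """Data mining engine: compute the support of each item pair (an association task)."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(items), 2))
    return {pair: n / len(transactions) for pair, n in counts.items()}


def pattern_evaluation(patterns, knowledge_base):
    """Pattern evaluation module: keep only patterns that meet the support threshold."""
    threshold = knowledge_base["min_support"]
    return {pair: s for pair, s in patterns.items() if s >= threshold}


def user_interface(store):
    """User interface: accept a request and return the interesting patterns."""
    transactions = server_fetch(REPOSITORY, store)
    return pattern_evaluation(mining_engine(transactions), KNOWLEDGE_BASE)


result = user_interface("A")
for pair, support in sorted(result.items()):
    print(pair, round(support, 2))
```

In this sketch the evaluation step runs after mining; in a tightly integrated system the same threshold would be pushed into the mining engine so that uninteresting patterns are never generated in the first place.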