CONSTRAINT-BASED ASSOCIATION RULES:
- A data mining process may uncover thousands of rules from a given set of data, most of which end up being unrelated or uninteresting to the users.
- Often, users have a good sense of which “direction” of mining may lead to interesting patterns and the “form” of the patterns or rules they would like to find.
- Thus, a good heuristic is to have the users specify such intuition or expectations as constraints to confine the search space.
- This strategy is known as constraint-based mining.
- Constraint-based mining provides:
- User flexibility: users supply constraints on what is to be mined.
- System optimization: the system exploits those constraints to mine more efficiently.
- The constraints can include the following:
- Knowledge type constraints: These specify the type of knowledge to be mined, such as association or correlation.
- Data constraints: These specify the set of task-relevant data.
- Dimension/level constraints: These specify the desired dimensions (or attributes) of the data, or levels of the concept hierarchies, to be used in mining.
- Interestingness constraints: These specify thresholds on statistical measures of rule interestingness, such as support, confidence, and correlation.
- Rule constraints: These specify the form of rules to be mined. Such constraints may be expressed as rule templates, as the maximum or minimum number of predicates that can occur in the rule antecedent or consequent, or as relationships among attributes, attribute values, and/or aggregates.
The above constraints can be specified using a high-level declarative data mining query language and user interface.
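As a rough illustration of how interestingness and rule constraints narrow down a set of candidate rules, here is a minimal Python sketch. The transactions, item names, thresholds, and the helper names (support, confidence, satisfies) are all hypothetical and chosen only for this example; it does not represent any particular data mining query language.

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def satisfies(rule, transactions, min_sup, min_conf, max_antecedent, allowed_consequent):
    """Check one rule against rule constraints and interestingness constraints."""
    antecedent, consequent = rule
    both = set(antecedent) | set(consequent)
    return (
        len(antecedent) <= max_antecedent               # rule constraint: antecedent size
        and set(consequent) <= allowed_consequent       # rule constraint: allowed consequent items
        and support(both, transactions) >= min_sup      # interestingness: minimum support
        and confidence(antecedent, consequent, transactions) >= min_conf  # minimum confidence
    )

# Candidate rules written as (antecedent, consequent) pairs.
candidate_rules = [
    (("bread",), ("milk",)),
    (("bread", "butter"), ("milk",)),
    (("milk",), ("jam",)),
    (("bread",), ("butter",)),
]

kept = [
    r for r in candidate_rules
    if satisfies(r, transactions, min_sup=0.4, min_conf=0.6,
                 max_antecedent=1, allowed_consequent={"milk", "butter"})
]
print(kept)
```

With these made-up settings, the rule with a two-item antecedent and the rule whose consequent is "jam" are rejected by the rule constraints, while the two remaining rules also pass the support and confidence thresholds.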
Constraint-based association rules:
- To make the mining process more efficient, rule-constraint-based mining:
- allows users to describe the rules that they would like to uncover.
- provides a sophisticated mining query optimizer that can be used to exploit the constraints specified by the user.
- encourages interactive exploratory mining and analysis.
Constrained frequent pattern mining: Query optimization approach
- Given a frequent pattern mining query with a set of constraints C, the algorithm should be:
- Sound: it finds only frequent sets that satisfy the given constraints C.
- Complete: it finds all frequent sets that satisfy the given constraints C.
- A naïve solution:
- Find all frequent sets and then test them for constraint satisfaction.
- More efficient approaches:
- Analyze the properties of constraints comprehensively.
- Push them as deeply as possible inside the frequent pattern computation.
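The difference between the naive solution and pushing a constraint into the computation can be sketched as follows. This is a minimal Python sketch under stated assumptions, not a definitive implementation: the transactions, the price table, and the thresholds are invented, and the constraint used (total price of an itemset at most MAX_TOTAL) is picked because it is anti-monotone for non-negative prices, meaning that once an itemset violates it, every superset does too. That property is what makes it safe to prune candidates early in a level-wise (Apriori-style) search.

```python
from itertools import combinations

# Hypothetical data: transactions and item prices are made up for illustration.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]
price = {"a": 10, "b": 20, "c": 30, "d": 50}
MIN_COUNT = 2   # minimum support count
MAX_TOTAL = 40  # constraint C: sum of prices in the itemset <= MAX_TOTAL

def count(itemset):
    """Number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

def constraint(itemset):
    """Anti-monotone for non-negative prices: a violation carries over to all supersets."""
    return sum(price[i] for i in itemset) <= MAX_TOTAL

def mine(push_constraint):
    """Level-wise mining. Returns (itemsets satisfying C, number of support counts done)."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    result, counted = [], 0
    while level:
        if push_constraint:
            # Pushed version: discard violating candidates before counting support.
            level = [c for c in level if constraint(c)]
        survivors = []
        for c in level:
            counted += 1
            if count(c) >= MIN_COUNT:
                survivors.append(c)
        result.extend(survivors)
        # Join step: combine survivors into candidates one item larger.
        k = len(survivors[0]) + 1 if survivors else 0
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == k}
        # Prune step: keep candidates whose every (k-1)-subset survived.
        surv = set(survivors)
        level = sorted((c for c in joined if all(c - {i} in surv for i in c)), key=sorted)
    if not push_constraint:
        # Naive version: test the constraint only after all frequent sets are found.
        result = [s for s in result if constraint(s)]
    return set(result), counted

naive, naive_counted = mine(push_constraint=False)
pushed, pushed_counted = mine(push_constraint=True)
print(sorted(map(sorted, pushed)), naive_counted, pushed_counted)
```

On this toy data both variants return the same five itemsets, so the pushed version stays sound and complete, but it performs 5 support counts instead of 8 because candidates such as {d} and {b, c} are discarded before the data is scanned for them.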