Explain sequence mining in Transactional databases?

1.1kviews

written 9.4 years ago by

radhakrishnanraveen • 0

modified 9.4 years ago by

juilee • 100

Sequential pattern mining is trying to find relationships between occurrences of sequential events, to find if there exists any specific order of the occurrences.
We can find the sequential patterns of specific individual items also we can find the sequential patterns across different items.

Application:

Sequential pattern mining is widely used in analyzing of DNA sequence.
Sequential pattern can be widely used in different areas, such as mining user access patterns for the web sites, using the history of symptoms to predict certain kind of disease, also by using sequential pattern mining, the retailers can make the inventory control more efficient.

Challenges on Sequential pattern mining:

A huge number of possible sequential patterns are hidden in databases.
A mining algorithm should find the complete set of patterns, be highly efficient, scalable.

Example:

Consider the database shown in fig 1 (This database has been sorted on customer-id and transaction-time) fig 2 shows this database expressed as a set of customer sequences.
With minimum support set to 25%, i.e. a minimum support of 2 customers, two sequences :< ( 30) (90)> and < (30) (40 70)> are maximal among those satisfying the support constraint, and are desired sequential patterns.
The sequential pattern < (30) (90)> is supported by customers 1 and 4.Customer 4 buys items (40 70) in between items 30 and 90, but supports the pattern < (30) (90)> since we are looking for patterns that are not necessarily contiguous. The sequential pattern <30((40 70)> is supported by customers 2 and 4. Customer 2 buys 60 along with 40 and 70, but supports this pattern since (40 70) is a subset of (40 60 70).
An example of a sequence that does not have minimum support is the sequence < (10 20)(30)>, which is only supported by customers 2. The sequences <(30)>,<(40)>.<(70)>,<(90)>,<(30)(40)>,<(30)(70)>and <(40)(70)>, though having minimum support, are not in the answer because they are not maximal.

Fig1: Database Sorted by Customer Id and Transaction Time

Customer Id	Customer Sequence
1 2 3 4 5	< (30) (90)> < (10 20)(30) (40 60 70)> <(30,50,70)> <(30)(40,70)(90)> <(90)>

Fig2: Customer Sequence Version of the Database

Sequential Patterns with support >25%
< (30) (90)> <(30)(40,70)>

Fig3: The answer set

Scalable Methods for Mining Sequential Patterns:

GSP: (A sequential Pattern Mining Algorithm Based on Candidate Generate and Test) It integrates with time constraints and relaxes the definition of transaction; also consider the knowledge of taxonomies.
SPADE: (Sequential Pattern Discovery using Equivalent Class) SPADE is an algorithm proposed to find frequent sequences using efficient lattice search techniques and simple joins.

PrefixSpan :( Mining Sequential Patterns by Prefix Projections) It mainly employs the method of database projection to make the database projection to make the database for next pass much smaller and consequently make the algorithm more speedy.

ADD COMMENT EDIT