Applications of frequent itemset analysis :-
Related concepts :-
- Let items be words, and let baskets be documents (e.g., Web pages,
blogs, tweets).
- A basket/document contains those items/words that are present
in the document.
- If we look for sets of words that appear together in many
documents, the sets will be dominated by the most common words
(stop words).
- Stop words such as “and” and “a” occur in almost every document,
so itemsets containing them will rank among the most frequent.
- However, if we ignore all the most common words, then we would
hope to find among the frequent pairs some pairs of words that
represent a joint concept.
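
The word-pair idea above can be sketched in Python as follows. This is
only a minimal illustration under stated assumptions: the function name
frequent_word_pairs, the tiny stop-word list, the sample documents, and
the support threshold are all made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "and", "the", "of", "in", "is"}

def frequent_word_pairs(documents, min_support):
    """Return word pairs that co-occur in at least min_support documents."""
    pair_counts = Counter()
    for text in documents:
        # A basket is the set of distinct non-stop words in one document.
        basket = {w for w in text.lower().split() if w not in STOP_WORDS}
        # Sorting gives each pair one canonical order before counting.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}

# Made-up sample documents (baskets).
docs = [
    "a cat and a dog",
    "the dog and the cat",
    "a cat in the hat",
]
pairs = frequent_word_pairs(docs, min_support=2)
print(pairs)
```

With the stop words removed, the only pair that reaches the threshold in
this sample is (cat, dog), which is the kind of joint concept the text
has in mind.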
Plagiarism :-
- Let the items be documents and the baskets be sentences.
- An item is in a basket if the sentence is in the document.
- This arrangement appears backwards, but the relationship between
items and baskets is an arbitrary many-many relationship, so “in”
need not mean “part of.”
- In this application, we look for pairs of items that appear together
in several baskets.
- If we find such a pair, then we have two documents that share
several sentences.
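
The reversed roles can be made concrete with a short Python sketch. It
assumes that sentences can be separated by splitting on periods and
compared after lowercasing, which is a simplification; the function
name documents_sharing_sentences, the sample corpus, and the threshold
are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def documents_sharing_sentences(documents, min_shared):
    """Return pairs of document names sharing at least min_shared sentences."""
    # Basket = one sentence; its items are the documents that contain it.
    baskets = defaultdict(set)
    for name, text in documents.items():
        # Naive sentence splitting on periods (an assumption of this sketch).
        for sentence in text.split("."):
            sentence = sentence.strip().lower()
            if sentence:
                baskets[sentence].add(name)
    # Count, for every pair of documents, the baskets containing both.
    pair_counts = Counter()
    for docs_with_sentence in baskets.values():
        for pair in combinations(sorted(docs_with_sentence), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_shared}

# Made-up corpus: document name -> text.
corpus = {
    "essay_a": "Data is big. Mining finds patterns. Cats sleep a lot.",
    "essay_b": "Mining finds patterns. Data is big. Dogs bark.",
    "essay_c": "Dogs bark. Birds sing.",
}
suspects = documents_sharing_sentences(corpus, min_shared=2)
print(suspects)
```

Here essay_a and essay_b appear together in two sentence-baskets, so
they are the only pair reported; essay_b and essay_c share just one
sentence and fall below the threshold.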
Biomarkers :-
- Let the items be of two types: biomarkers, such as genes or blood
proteins, and diseases.
- Each basket is the set of data about a patient: their genome and
blood-chemistry analysis, as well as their medical history of disease.
- A frequent itemset that consists of one disease and one or more
biomarkers suggests a test for the disease.
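
A small Python sketch of this setup follows. It covers only the simplest
case, one disease paired with one biomarker, and every name in it
(disease_biomarker_pairs, the disease and biomarker labels, the patient
records, the threshold) is invented for illustration.

```python
from collections import Counter

def disease_biomarker_pairs(patients, diseases, min_support):
    """Return (disease, biomarker) pairs found in at least min_support baskets."""
    counts = Counter()
    for basket in patients:
        # Split each patient's basket into the two item types.
        found_diseases = basket & diseases
        found_markers = basket - diseases
        for disease in found_diseases:
            for marker in found_markers:
                counts[(disease, marker)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical labels and patient baskets.
DISEASES = {"disease_x", "disease_y"}
patients = [
    {"gene_1", "protein_a", "disease_x"},
    {"gene_1", "disease_x"},
    {"gene_2", "protein_a", "disease_y"},
    {"gene_1", "gene_2"},
]
candidates = disease_biomarker_pairs(patients, DISEASES, min_support=2)
print(candidates)
```

In this sample only (disease_x, gene_1) occurs in two baskets, so gene_1
would be the candidate marker to examine for disease_x. Itemsets with
several biomarkers would need a general frequent-itemset algorithm
rather than this pair count.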