written 5.7 years ago by |
Rabin and Karp proposed a string matching algorithm that performs well in practice and that also generalizes to other algorithms for related problems, such as two-dimensional pattern matching.
The Rabin-Karp string matching algorithm calculates a hash value for the pattern, as well as for each M-character subsequence of given text to be compared.
If the hash values for a particular subsequence are unequal, the algorithm will calculate the hash value for next M-character sequence
If the hash values are equal, the algorithm will do a Brute Force comparison between the pattern and the M-character sequence.
Therefore there is only one comparison per text subsequence, and Brute Force is only needed when hash values match
- Let a text string T of length n and a pattern string P of length m are given such that m<=n, then the string matching problem is to find all occurrences of P in T
Example- T = “KARNATAKA” and P=“NAT”
Applications:
• Searching keywords in a file
• Searching engines
• Database searching
Notations
T : Given text or String e.g. – “JAIPURISCALLEDPARISOFINDIA”
|T| : 26, length of T
Substring: Ti,j=TiT i+1…Tj e.g. – “PARIS”
Subsequence of T: deleting zero or more characters from T
e.g. –“RISALL” and “JISINA”
Prefix of T: T1,k e.g. –“JAIPUR” is a prefix of T. Suffix of T: Th,|T| e.g. –“INDIA” is a suffix of T.
Complexity
If a large prime number is used to calculate hash function then there a very low possibility of being hashed values equal for two different patterns
In this case searching takes O(N) time, where N is number of characters in the text body.
In worst case the time complexity may be O(MN), where M is no. of characters in the pattern. This case may occur when the prime no. chosen is very small.