C++ Program to Implement Rabin-Karp Algorithm for Pattern Matching

Pattern matching is a fundamental task in computer science, with applications ranging from text processing to bioinformatics. One well-known approach is the Rabin-Karp algorithm, named after its inventors, Michael Rabin and Richard Karp. It uses hashing to find occurrences of a pattern string in a text, and it extends naturally to searching for several patterns at once. The pattern and each pattern-length window of the text are converted to numeric hash values, and characters are compared directly only when the hash values agree. This article walks through an implementation of the Rabin-Karp algorithm in C++, with examples that show its usage and output.

Prerequisites

Before diving into the implementation, it’s beneficial to have:

  • Basic understanding of algorithms: Knowledge of string matching and basic algorithm principles.
  • Familiarity with C++ programming: Proficiency in C++ syntax, arrays, and functions.
  • Understanding of data structures: Specifically, arrays and their manipulation.

Rabin-Karp Algorithm Structure

Definition and Formula

The Rabin-Karp algorithm uses a hashing technique to find a pattern within a text. The key steps involve:

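  1. Hashing the pattern: Calculate a hash value for the pattern.
  2. Hashing text substrings: Calculate a hash value for each substring of the text that has the same length as the pattern.
  3. Comparing hash values: Compare the hash value of the pattern with the hash value of each substring. When they are equal, compare the characters directly to confirm the match, because different strings can share a hash value.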

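The hash function used is typically a rolling hash, which lets the hash of the next substring be derived from the hash of the current one in constant time instead of rehashing every character.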

Rolling Hash Formula

    \[ H = (d^{m-1} \cdot \text{str}[0] + d^{m-2} \cdot \text{str}[1] + \ldots + d^0 \cdot \text{str}[m-1]) \mod q \]

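Here str is the string being hashed, m is its length (the pattern length), d is the number of characters in the input alphabet (typically 256), and q is a prime modulus that keeps hash values small while spreading them out to reduce collisions. q must also be small enough that the intermediate products in the hash arithmetic do not overflow the integer type in use.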

1. Implementing the Rabin-Karp Algorithm

1.1 Rabin-Karp Class Definition

Below is the class definition for the Rabin-Karp algorithm. It exposes a single search method that hashes the pattern, slides a window over the text, and prints the index of every match.

C++
#include <iostream>
#include <string>
#include <vector>

class RabinKarp {
public:
    void search(const std::string& pattern, const std::string& text, int q);
private:
    int d = 256; // Number of characters in the input alphabet
};

void RabinKarp::search(const std::string& pattern, const std::string& text, int q) {
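    int m = static_cast<int>(pattern.size());
    int n = static_cast<int>(text.size());

    // An empty pattern, or one longer than the text, cannot be matched;
    // returning early also avoids reading past the end of text below.
    if (m == 0 || m > n)
        return;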
    int p = 0; // hash value for pattern
    int t = 0; // hash value for text
    int h = 1;

    // The value of h would be "pow(d, m-1) % q"
    for (int i = 0; i < m - 1; i++)
        h = (h * d) % q;

    // Calculate the hash value of pattern and first window of text
    for (int i = 0; i < m; i++) {
        p = (d * p + pattern[i]) % q;
        t = (d * t + text[i]) % q;
    }

    // Slide the pattern over text one by one
    for (int i = 0; i <= n - m; i++) {
        // Check the hash values of current window of text and pattern
        if (p == t) {
            // Check for characters one by one
            bool match = true;
            for (int j = 0; j < m; j++) {
                if (text[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match)
                std::cout << "Pattern found at index " << i << std::endl;
        }

        // Calculate hash value for next window of text
        if (i < n - m) {
            t = (d * (t - text[i] * h) + text[i + m]) % q;

            // We might get negative value of t, converting it to positive
            if (t < 0)
                t = (t + q);
        }
    }
}

1.2 Example Usage

Let’s demonstrate the usage of the Rabin-Karp algorithm with a simple example where we search for a pattern in a text.

C++
int main() {
    RabinKarp rk;
    std::string text = "EARN FOR IMPROVEMENT";
    std::string pattern = "EARN";
    int q = 101; // A prime number
    rk.search(pattern, text, q);
    return 0;
}

1.3 Output for Example 1

C++
Pattern found at index 0

2. Additional Examples

2.1 Example 2: Multiple Occurrences

In this example, we will search for a pattern that appears at the beginning, in the middle, and at the end of the text.

C++
int main() {
    RabinKarp rk;
    std::string text = "AABAACAADAABAABA";
    std::string pattern = "AABA";
    int q = 101; // A prime number
    rk.search(pattern, text, q);
    return 0;
}

Output for Example 2

C++
Pattern found at index 0
Pattern found at index 9
Pattern found at index 12

2.2 Example 3: Pattern with Overlapping Occurrences

Here, we search for a pattern that has overlapping occurrences in the text.

C++
int main() {
    RabinKarp rk;
    std::string text = "AAAAA";
    std::string pattern = "AAA";
    int q = 101; // A prime number
    rk.search(pattern, text, q);
    return 0;
}

Output for Example 3

C++
Pattern found at index 0
Pattern found at index 1
Pattern found at index 2

Conclusion

The Rabin-Karp algorithm uses hashing to make pattern matching efficient. By converting the pattern and each text window to hash values, it compares characters directly only when the hash values agree. For a text of length n and a pattern of length m, this gives an average running time of O(n + m), with a worst case of O(nm) when many hash collisions occur.