Introduction
In the realm of computer science, pattern matching is a crucial task with applications in text processing, DNA sequencing, and search engines. One efficient algorithm for pattern matching is the Knuth-Morris-Pratt (KMP) algorithm. Named after its inventors, the KMP algorithm optimizes the search process by preprocessing the pattern to determine the longest prefix that is also a suffix (LPS). This reduces the number of comparisons during the actual search phase, making it significantly faster than naive methods. This article will guide you through the implementation of the KMP algorithm in C++ with real examples to illustrate its practical application and output.
Prerequisites
Before diving into the implementation, it’s beneficial to have:
- Basic understanding of algorithms: Knowledge of string matching and basic algorithm principles.
- Familiarity with C++ programming: Proficiency in C++ syntax, arrays, and functions.
- Understanding of data structures: Specifically, arrays and their manipulation.
KMP Algorithm Structure
Definition and Formula
The KMP algorithm preprocesses the pattern to create an array called the Longest Prefix Suffix (LPS) array. This array is used to skip characters during the matching phase, thus avoiding unnecessary comparisons.
LPS Array Calculation
For a pattern of length m
, the LPS array is calculated as follows:
LPS[i]
stores the length of the longest proper prefix which is also a suffix for the substring pattern[0…i].
1. Implementing the KMP Algorithm
1.1 KMP Class Definition
Below is the class definition for the KMP algorithm, including the methods for preprocessing the pattern and searching within the text.
#include <iostream>
#include <vector>
#include <string>
class KMP {
public:
std::vector<int> computeLPSArray(const std::string& pattern);
void KMPSearch(const std::string& pattern, const std::string& text);
};
std::vector<int> KMP::computeLPSArray(const std::string& pattern) {
int m = pattern.length();
std::vector<int> lps(m, 0);
int len = 0;
int i = 1;
while (i < m) {
if (pattern[i] == pattern[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
void KMP::KMPSearch(const std::string& pattern, const std::string& text) {
int m = pattern.length();
int n = text.length();
std::vector<int> lps = computeLPSArray(pattern);
int i = 0; // index for text
int j = 0; // index for pattern
while (i < n) {
if (pattern[j] == text[i]) {
i++;
j++;
}
if (j == m) {
std::cout << "Found pattern at index " << i - j << std::endl;
j = lps[j - 1];
} else if (i < n && pattern[j] != text[i]) {
if (j != 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
}
1.2 Example Usage
Let’s demonstrate the usage of the KMP algorithm with a simple example where we search for a pattern in a text.
int main() {
KMP kmp;
std::string text = "ABABDABACDABABCABAB";
std::string pattern = "ABABCABAB";
kmp.KMPSearch(pattern, text);
return 0;
}
1.3 Output for Example 1
Found pattern at index 10
2. Additional Examples
2.1 Example 2: Pattern at the Beginning
In this example, we will search for a pattern that appears at the beginning of the text.
int main() {
KMP kmp;
std::string text = "HELLOHELLOHELLO";
std::string pattern = "HELLO";
kmp.KMPSearch(pattern, text);
return 0;
}
Output for Example 2
Found pattern at index 0
Found pattern at index 5
Found pattern at index 10
2.2 Example 3: Pattern with Overlapping Occurrences
Here, we search for a pattern that has overlapping occurrences in the text.
int main() {
KMP kmp;
std::string text = "AAAAA";
std::string pattern = "AAA";
kmp.KMPSearch(pattern, text);
return 0;
}
Output for Example 3
Found pattern at index 0
Found pattern at index 1
Found pattern at index 2
Conclusion
The KMP algorithm is a powerful tool for pattern matching, offering efficient and optimized search capabilities. By preprocessing the pattern to create the LPS array, it minimizes unnecessary comparisons, making it faster than naive search methods. Through the provided examples, we’ve seen how the KMP algorithm can be implemented in C++ and applied to various text searching scenarios