Understanding and Implementing the Knuth-Morris-Pratt (KMP) Algorithm
The Knuth-Morris-Pratt (KMP) algorithm is an efficient string-matching algorithm that improves upon the naive approach of substring searching. It's particularly useful when you need to find occurrences of a pattern within a larger text, especially when the pattern might appear multiple times or when the text is very long.
How KMP Works
The key idea behind KMP is to utilize the information gained from previous match attempts to skip unnecessary comparisons. It does this by preprocessing the pattern to create a "failure function" or "partial match table". This table helps the algorithm to decide how many characters to skip when a mismatch occurs.
Example
Let's consider a simple example to understand how KMP works:
Text: "ABABDABACDABABCABAB" Pattern: "ABABCABAB"
The KMP algorithm would preprocess the pattern "ABABCABAB" to create a partial match table. Then, it would use this table to efficiently find the pattern in the text, avoiding unnecessary comparisons.
Step 1:
ABABDABACDABABCABAB
ABABCABAB
^
Step 2:
ABABDABACDABABCABAB
ABABCABAB
^
Step 3:
ABABDABACDABABCABAB
ABABCABAB
^
Step 4:
ABABDABACDABABCABAB
ABABCABAB
^
Step 5:
ABABDABACDABABCABAB
ABABCABAB
^
Step 6: (Mismatch, shift pattern)
ABABDABACDABABCABAB
ABABCABAB
^
Step 7:
ABABDABACDABABCABAB
ABABCABAB
^
Step 8:
ABABDABACDABABCABAB
ABABCABAB
^
Step 9: (Mismatch, shift pattern)
ABABDABACDABABCABAB
ABABCABAB
^
Step 10:
ABABDABACDABABCABAB
ABABCABAB
^
Step 11: (Mismatch, shift pattern)
ABABDABACDABABCABAB
ABABCABAB
^
Step 12:
ABABDABACDABABCABAB
ABABCABAB
^
Step 13: (Mismatch, shift pattern)
ABABDABACDABABCABAB
ABABCABAB
^
Step 14:
ABABDABACDABABCABAB
ABABCABAB
^
Step 15:
ABABDABACDABABCABAB
ABABCABAB
^
Step 16:
ABABDABACDABABCABAB
ABABCABAB
^
Step 17:
ABABDABACDABABCABAB
ABABCABAB
^
Step 18:
ABABDABACDABABCABAB
ABABCABAB
^
Step 19: (Match found!)
ABABDABACDABABCABAB
ABABCABAB
^
Match found at index 10Explanation:
The algorithm starts by comparing characters from left to right.
When a mismatch occurs (Step 5), the pattern shifts based on the pre-computed LPS (Longest Proper Prefix which is also Suffix) array.
This process continues, shifting the pattern efficiently to avoid unnecessary comparisons.
Finally, a complete match is found at index 10 (Step 19).
Time and Space Complexity
Time Complexity
The KMP algorithm has two main phases:
Preprocessing phase: O(m)
Where m is the length of the pattern
This phase computes the partial match table (LPS array)
Searching phase: O(n)
Where n is the length of the text
This phase performs the actual search
Therefore, the overall time complexity of KMP is O(m + n).
Compared to the naive string matching algorithm, which has a worst-case time complexity of O(mn), KMP is significantly more efficient, especially for large texts or when the pattern appears multiple times.
Space Complexity
The space complexity of KMP is O(m), where m is the length of the pattern. This space is used to store the partial match table (LPS array).
The space complexity is independent of the length of the text being searched, which makes KMP memory-efficient even for very large texts.
C++ Implementation
Here's a C++ template class that implements the KMP algorithm:
#include <vector>
#include <string>
template<typename T>
class KMPMatcher {
private:
std::vector<T> pattern;
std::vector<int> lps;
void computeLPSArray() {
int len = 0;
int i = 1;
lps[0] = 0;
while (i < pattern.size()) {
if (pattern[i] == pattern[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
}
public:
KMPMatcher(const std::vector<T>& pat) : pattern(pat), lps(pat.size()) {
computeLPSArray();
}
std::vector<int> search(const std::vector<T>& text) {
std::vector<int> matches;
int i = 0; // index for text[]
int j = 0; // index for pattern[]
while (i < text.size()) {
if (pattern[j] == text[i]) {
j++;
i++;
}
if (j == pattern.size()) {
matches.push_back(i - j);
j = lps[j - 1];
} else if (i < text.size() && pattern[j] != text[i]) {
if (j != 0)
j = lps[j - 1];
else
i = i + 1;
}
}
return matches;
}
};Using the KMPMatcher Class
Here's how you can use the KMPMatcher class:
#include <iostream>
#include <vector>
#include <string>
int main() {
std::string text = "ABABDABACDABABCABAB";
std::string pattern = "ABABCABAB";
std::vector<char> text_vec(text.begin(), text.end());
std::vector<char> pattern_vec(pattern.begin(), pattern.end());
KMPMatcher<char> kmp(pattern_vec);
std::vector<int> matches = kmp.search(text_vec);
std::cout << "Pattern found at indices: ";
for (int index : matches) {
std::cout << index << " ";
}
std::cout << std::endl;
return 0;
}This code will output:
Pattern found at indices: 10Conclusion
The KMP algorithm is a powerful tool for efficient string matching. By preprocessing the pattern and using the information gained from previous comparisons, it significantly reduces the number of comparisons needed in many cases, especially when dealing with repetitive patterns or long texts.
The time complexity of O(m + n) makes it much more efficient than naive string matching algorithms, particularly for large inputs. Its space complexity of O(m) ensures that it remains memory-efficient even when searching through very large texts.
The template class provided here allows you to use KMP with any type that supports equality comparison, not just characters. This makes it versatile for various applications beyond just string matching.

