I am trying to solve the problem Shortest Matching Substring using the KMP algorithm, but my code is not returning the correct results.
Approach:
- I split the pattern
p
into three parts based on the two'*'
characters. - I use KMP (Knuth-Morris-Pratt algorithm) to find occurrences of these three parts in
s
. - Then, I attempt to find the shortest substring in
s
that contains all three parts in the correct order.
Issue:
- The function is not always returning the correct answer.
- Sometimes, it returns
-1
when a valid substring exists. - I suspect there might be an issue with how I process the indices or combine results from KMP.
Can someone help me identify what's wrong with my approach?
Code:
class Solution {
public:
void calculateLPS(string& needle, vector<int>& lps) {
int m = needle.size();
lps[0] = 0;
int len = 0;
int i = 1;
while (i < m) {
if (needle[i] == needle[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
}
vector<int> kmp(string& haystack, string& needle) {
int m = needle.size();
int n = haystack.size();
if (m > n) return {};
vector<int> lps(m, 0);
calculateLPS(needle, lps);
int i = 0, j = 0;
vector<int> ans;
while (i < n) {
if (haystack[i] == needle[j]) {
i++;
j++;
}
if (j == m) {
ans.push_back(i - j);
j = lps[j - 1];
} else if (i < n && haystack[i] != needle[j]) {
if (j != 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
return ans;
}
int shortestMatchingSubstring(string s, string p) {
int j = 0;
vector<string> strs(3, "");
for (int i = 0; i < p.size(); i++) {
if (p[i] != '*') {
strs[j].push_back(p[i]);
} else if (p[i] == '*') {
j++;
}
}
vector<int> startIdx = kmp(s, strs[0]);
vector<int> midIdx = kmp(s, strs[1]);
vector<int> endIdx = kmp(s, strs[2]);
if (startIdx.empty() || endIdx.empty()) return -1;
int minLength = INT_MAX;
for (int start : startIdx) {
for (int mid : midIdx) {
if (mid > start) {
for (int end : endIdx) {
if (end > mid) {
minLength = min(minLength, end + (int)strs[2].size() - start);
break;
}
}
break;
}
}
}
return (minLength == INT_MAX) ? -1 : minLength;
}
};
Expected Behavior:
- The function should return the length of the shortest substring in
s
that matchesp
considering'*'
as a wildcard. - If no such substring exists, it should return
-1
.
Example Test Case:
string s = "abaacbaecebce";
string p = "ba*c*ce";
Solution sol;
cout << sol.shortestMatchingSubstring(s, p); // Expected Output: 8
Where Could the Issue Be?
- Is my KMP implementation correct?
- Am I handling the indices properly when checking for the shortest substring?
- Are there any edge cases I am missing?