Pattern Matching: the Sunday Algorithm (1)

Introduction

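The Sunday algorithm is a refinement of the Boyer-Moore algorithm and is usually slightly faster:
when a match attempt fails, it looks at the text character immediately after the last text character covered by the current alignment;
if that character does not occur in the pattern, the pattern jumps past it, i.e. shift = pattern length + 1;
if that character does occur in the pattern, shift = distance from its rightmost occurrence in the pattern to the end of the pattern + 1;
the average-case time complexity is O(n) and the worst case is O(n*m), where n is the text length and m is the pattern length.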
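
The shift rule above can be sketched in a few lines. This is a minimal illustration, not the article's own code: the names `buildShiftTable` and `sundayFind` are hypothetical, and the sketch assumes single-byte characters held in `std::string`.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper: builds the Sunday shift table for a single-byte pattern.
std::array<int, 256> buildShiftTable(const std::string& pat) {
    assert(!pat.empty());
    const int pn = static_cast<int>(pat.size());
    std::array<int, 256> shift;
    shift.fill(pn + 1);  // character absent from the pattern: jump past it
    for (int i = 0; i < pn; ++i) {
        // Later (more rightward) occurrences overwrite earlier ones,
        // so the rightmost occurrence decides the shift.
        shift[static_cast<unsigned char>(pat[i])] = pn - i;
    }
    return shift;
}

// Hypothetical helper: returns the index of the first match, or -1 if none.
int sundayFind(const std::string& txt, const std::string& pat) {
    const int tn = static_cast<int>(txt.size());
    const int pn = static_cast<int>(pat.size());
    if (pn == 0 || pn > tn) return -1;
    const std::array<int, 256> shift = buildShiftTable(pat);
    int i = 0;
    while (i <= tn - pn) {  // the last valid alignment is i == tn - pn
        if (txt.compare(static_cast<std::size_t>(i),
                        static_cast<std::size_t>(pn), pat) == 0) {
            return i;
        }
        if (i + pn >= tn) break;  // no character follows the window
        i += shift[static_cast<unsigned char>(txt[i + pn])];
    }
    return -1;
}
```

For the pattern "abcab" (length 5) the table gives 'a' a shift of 2, 'b' a shift of 1, 'c' a shift of 3, and every other byte a shift of 6. Searching "hello world" for "world" fails at offset 0, reads the space at index 5 (not in the pattern, shift 6), and matches at offset 6.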

Example code

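The example below matches single-byte characters. For wide characters, the character set size changes in addition to the function declaration (see the commented-out lines).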

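char* SundaySearchA(char* txt, int tn, char* pat, short pn)
//wchar_t* SundaySearchW(wchar_t* txt, int tn, wchar_t* pat, short pn)
{
    if (!txt || !pat || (pn <= 0) || (pn > tn)) return 0;
    const int shift_size = 0x100; // size of the single-byte character set
    //const int shift_size = 0x10000; // size of the wide character set (16-bit wchar_t)
    int* shift = new int[shift_size];
    // Default: the character is not in the pattern, so jump past it.
    for (int i = 0; i < shift_size; i++)
    {
        shift[i] = pn + 1;
    }
    // Later indices overwrite earlier ones, so the rightmost occurrence wins.
    for (short i = 0; i < pn; i++)
    {
        // The cast avoids a negative index when char is signed.
        shift[(unsigned char)pat[i]] = pn - i;
    }
    for (int i = 0; i <= (tn - pn); ) // <= so the last alignment is tested too
    {
        short j;
        for (j = 0; j < pn; j++)
        {
            if (pat[j] != txt[i + j]) break;
        }
        if (j == pn)
        {
            delete[] shift;
            return (txt + i);
        }
        if (i + pn >= tn) break; // no character follows the window
        i += shift[(unsigned char)txt[i + pn]];
    }
    delete[] shift;
    return 0;
}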
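
As a rough sketch of the wide-character variant mentioned above, the same loop can run over 16-bit code units with a 0x10000-entry table. The function name `sundayFind16` is hypothetical, and the sketch assumes `char16_t` code units (rather than `wchar_t`, whose width differs between platforms) and returns an index instead of a pointer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wide-character sketch: returns the index of the first match
// of pat in txt, or -1 if there is none. Assumes 16-bit code units, so a
// character outside the BMP is treated as two independent units.
std::ptrdiff_t sundayFind16(const std::u16string& txt, const std::u16string& pat) {
    const std::ptrdiff_t tn = static_cast<std::ptrdiff_t>(txt.size());
    const std::ptrdiff_t pn = static_cast<std::ptrdiff_t>(pat.size());
    if (pn == 0 || pn > tn) return -1;

    // One entry per possible 16-bit code unit; default shift is pn + 1.
    std::vector<std::ptrdiff_t> shift(0x10000, pn + 1);
    for (std::ptrdiff_t i = 0; i < pn; ++i) {
        shift[pat[static_cast<std::size_t>(i)]] = pn - i;  // rightmost occurrence wins
    }

    std::ptrdiff_t i = 0;
    while (i <= tn - pn) {
        if (txt.compare(static_cast<std::size_t>(i),
                        static_cast<std::size_t>(pn), pat) == 0) {
            return i;
        }
        if (i + pn >= tn) break;  // no code unit follows the window
        i += shift[txt[static_cast<std::size_t>(i + pn)]];
    }
    return -1;
}
```

The table now costs 65,536 entries per call, so for short texts a hash map keyed by code unit may be the better trade-off; that choice is left open here.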

Byte-sequence search example (wildcards not yet supported)

unsigned char* SundaySearchByte(unsigned char* dat, int dn, unsigned char* pat, short pn)
{
    if (!dat || !pat || (pn <= 0) || (pn > dn)) return 0;
    const int shift_size = 0x100; // number of distinct byte values
    int* shift = new int[shift_size];
    // __try/__except is MSVC structured exception handling; it guards
    // against access violations while reading dat.
    __try
    {
        for (int i = 0; i < shift_size; i++)
        {
            shift[i] = pn + 1;
        }
        for (short i = 0; i < pn; i++)
        {
            shift[pat[i]] = pn - i;
        }
        for (int i = 0; i <= (dn - pn); ) // <= so the last alignment is tested too
        {
            short j;
            for (j = 0; j < pn; j++)
            {
                if (pat[j] != dat[i + j]) break;
            }
            if (j == pn)
            {
                delete[] shift;
                return (dat + i);
            }
            if (i + pn >= dn) break; // no byte follows the window
            i += shift[dat[i + pn]];
        }
    }
    __except (1) // EXCEPTION_EXECUTE_HANDLER
    {
    }
    delete[] shift;
    return 0;
}