⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 readme.txt

📁 BM匹配算法汇编实现
💻 TXT
字号:
BM.LIB

The BM library contains 3 Boyer Moore exact pattern matching algorithms
written in 32 bit Microsoft assembler (MASM).

BM.ASM is reasonably close to the original algorithm design from Robert
Boyer and L.Moore published in 1977. It uses the three heuristics of a
BAD CHARACTER shift, a GOOD SUFFIX shift and the additional heuristic to
calculate the reverse offset of the mismatching character.

There are two variants included which utilise parts of the original design,

SBM.ASM is a simplified version that uses the GOOD SUFFIX shift.

BMH.ASM is similar to a Horsepool variation and uses the BAD CHARACTER
shift.

The three variations have been extensively tested across both Intel and AMD
processors and the results show some hardware differences between the two
processors. The BMH.ASM version is the fastest on an Intel PIII, the SBM
is the fastest on the AMD and the original BM.ASM averages better across
the two hardware platforms.

The general characteristic of all three algorithms apart from their speed
is fast recovery from partial matches and a threshold in efficent pattern
size of under 5 characters in length. This has been acheived by coding
the algorithms with very short instruction paths in the comparisons,
retrieval of the character from the table and the shift calculations. 

The maximum variation between the three algorithms is about 10% and it
reflects the general differences between the two processors. The AMD has
a shorter pipeline and a lower penalty for register stalls where the Intel
has better branch prediction and a lower penalty for mispredicted jumps.

It is reasonable to expect that on early processors or on later processors
that are not available as of August 2001 that the performance range may
vary between these algorithms. While they are designed for later processors,
the three algorithms will run on a 486.

USAGE

The three algorithms are well suited for pattern matching tasks as diverse
as recursively searching files for virus pattern, keys in very large
databases, text and word processing find and replace operations and any
other requirement where large bodies of data are being searched for exact
patterns of characters of BYTE data.

LIBRARY

The library is in standard Microsoft format and can be used by both MASM
and Visual C++. The include file is in MASM format and to use the 3
library modules in C/C++, the correct prototypes must be written. The
parameters are 32 bit unsigned long integers.

The parameters are,

    startpos    zero based offset from the start of the source
    lpSource    address of the source to be searched
    srcLngth    length of the source to be searched
    lpSubStr    address of the pattern to search for
    subLngth    length of the pattern being searched for

    The return value in EAX is either the zero based offset of the match
    or -1 for no match. -2 is returned if the pattern is under 2 characters
    in length.

SOURCE

The source is included for both research and practical applications that
require custom libraries.

COPYRIGHT

The three modules and the library are copyright freeware, they may be used
by and person free of any charge or royalty whatsoever in any program they
wish to write. They can be freely used in commercial programs without any
form of restriction.

The only restriction is that the library and the assembler modules included
to build it cannot be sold or included in any commercial package.

Steve Hutchesson
Sydney
Australia
August 2001

hutch@pbq.com.au

POSTSCRIPT

If you find a way of coding a Boyer Moore exact pattern algorithm that
benchmarks faster than the ones included here, please feel free too send
it to me.

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -