Dissertation > Industrial Technology > Automation technology,computer technology > Computing technology,computer technology > Computer applications > Information processing (information processing) > Pattern Recognition and devices > Image recognition device

Research and Implementation of Exact String Matchiing Algorithms

Author ZhouJiangTao
Tutor FangBinXing
School Harbin Institute of Technology
Course Computer Science and Technology
Keywords string matching textsearch Bit-Parallelism automaton
CLC TP391.41
Type Master's thesis
Year 2008
Downloads 129
Quotes 0
Download Dissertation

String matching is an important basic research field in computer science and it occurs naturally as part of text processing, data compression, search engine, computational biology and network security. The type of pattern matching discussed in this paper is exact string matching. Topics involved:This paper first analyses the TextSearch module in Linux kernel, ameliorates its deficiencies, and extracts TextSearch from Linux kernel, and released as LibTextSearch, an independent third-party open source library. LibTextSearch runs in user space, and has more algorithms than TextSearch, provides users wider selection.We presented LNDM algorithm, which, based on BNDM, uses an overlap wide window to scan text. LNDM first scans backward for pattern prefix from middle window, then forward for suffix. Excavating maximum useful information of current window, LNDM loses less information when shifting window, and thus more efficient. Theoretical analysis shows LNDM has optimal time complexities in the worst (O(n)), best (O(n/m)) and average cases (O(n logσ(m)/m)). Experimental comparison of LNDM with existing algorithms validates our theoretical claims for searching short patterns on big alphabet.We also proposed a NBMA(Nondeterministic Boyer-Moore Automaton) algorithm, which combines BMA and Bit-Parallelism technique. BMA matches pattern with a BM automaton, whose number of states has an exponential proportion with the pattern size, and therefore seldom used. While using Bit-Parallelism to simulate a nondeterministic automaton, NBMA avoids the issue above. Also we experimented on NBMA to verify its time optimal.Finally, we presented an automated experiment platform for string matching, which, with a series of script toolkit to generate random text and patterns, provides highly flexible and configurable random string matching experiments. The Platform uses gporf to evaluate exact running time of individual algorithm, enables user to experiment on random pattern matching convieniently.

Related Dissertations
More Dissertations