
Research on Chinese Word Segmentation and Page Ranking for Search Engines

Author WangZheng
Tutor WangXiGang
School Liaoning University of Science and Technology
Course Computer network
Keywords Forward maximum matching, Reverse maximum matching, PageRank, VSM
CLC TP391.3
Type Master's thesis
Year 2012

With the growth of the Internet, the demand for information keeps increasing, yet finding the information one needs quickly and accurately in the vast amount of content online is not easy, so users have to rely on search engines. A search engine lets users locate information in this disorder, but only a good one returns what the user wants both quickly and accurately.

The efficiency of a search engine depends mainly on two things: the efficiency of word segmentation and the ordering of the returned pages. This thesis studies both. Chinese word segmentation is relatively complex, so after weighing the advantages and disadvantages of the forward maximum matching and reverse maximum matching algorithms, the thesis first proposes a two-way matching algorithm that combines the two. Two-way matching improves segmentation accuracy to some extent. Page ranking is an important factor in the user's search efficiency. Since the relevance of a web page and its links are both important factors that directly affect the page's weight, the thesis proposes a web-based PageRank algorithm. The new ranking algorithm both counters possible page drift and prevents the ranking from relying entirely on the web page itself.

The thesis also designs a message board with a search function, on which registered users can post and reply to messages. Its most important feature is searching for messages of interest: the user enters a keyword and the matching message records are returned. The whole system is built on the SSH architecture and Lucene, and it sorts the search results with the page ranking algorithm proposed in this thesis. Experimental results verify the superiority of the proposed page ranking algorithm.
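The abstract does not give the ranking formula, so the following Python sketch is only one plausible reading of a score that mixes page relevance with link structure. It assumes VSM relevance is a cosine similarity over raw term counts, that link importance is standard PageRank with a damping factor of 0.85, and that the two are mixed linearly by a hypothetical weight `alpha`. The names `cosine_similarity`, `pagerank`, `rank_pages`, and `alpha` are illustrative and do not come from the thesis.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, doc_terms):
    """VSM relevance: cosine between raw term-count vectors (assumed weighting)."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(count * d[term] for term, count in q.items())
    norm = (math.sqrt(sum(c * c for c in q.values()))
            * math.sqrt(sum(c * c for c in d.values())))
    return dot / norm if norm else 0.0


def pagerank(links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration; links maps page -> list of targets."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
        rank = new
    return rank


def rank_pages(query_terms, docs, links, alpha=0.7):
    """Order pages by alpha * relevance + (1 - alpha) * normalized PageRank.

    The linear mix and alpha are assumptions made for illustration only.
    """
    pr = pagerank(links)
    top = max(pr.values(), default=1.0)
    scores = {
        page: alpha * cosine_similarity(query_terms, terms)
        + (1 - alpha) * pr.get(page, 0.0) / top
        for page, terms in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: page "a" is well linked but off-topic; "b" and "c" match the query.
links = {"a": ["b"], "b": ["a"], "c": ["a"]}
docs = {
    "a": ["weather"],
    "b": ["search", "engine"],
    "c": ["search", "engine", "ranking"],
}
print(rank_pages(["search", "engine"], docs, links, alpha=0.7))
print(rank_pages(["search", "engine"], docs, links, alpha=0.0))
```

With `alpha=0.0` the order follows links alone and the off-topic but well-linked page comes first. A larger `alpha` lets relevance pull the on-topic pages ahead, which is the kind of balance between content and links that the abstract describes.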
The thesis also finds that two-way matching, which combines forward maximum and reverse maximum matching, strengthens segmentation accuracy to some extent. Chinese is more complex than this, however, and two-way matching cannot correctly segment some ambiguous expressions. Some sentences can only be segmented accurately with the help of semantics, but semantic segmentation is complex and difficult.
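A minimal Python sketch of two-way maximum matching may make the idea concrete. The abstract does not say how the thesis resolves a disagreement between the two directions, so the tie-break used here (prefer fewer tokens, then fewer single-character tokens, then the reverse result) is a common heuristic adopted as an assumption. The dictionary `vocab`, the `max_len` limit, and all function names are hypothetical.

```python
def forward_max_match(text, vocab, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, vocab, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_match(text, vocab, max_len=4):
    """Run both scans; if they disagree, pick one by a heuristic (an assumption)."""
    fwd = forward_max_match(text, vocab, max_len)
    rev = reverse_max_match(text, vocab, max_len)
    if fwd == rev:
        return fwd

    def cost(tokens):
        # Fewer tokens first, then fewer single-character tokens.
        return (len(tokens), sum(1 for t in tokens if len(t) == 1))

    return fwd if cost(fwd) < cost(rev) else rev


vocab = {"研究", "研究生", "生命", "的", "起源"}
sentence = "研究生命的起源"
print(forward_max_match(sentence, vocab))    # ['研究生', '命', '的', '起源']
print(reverse_max_match(sentence, vocab))    # ['研究', '生命', '的', '起源']
print(bidirectional_match(sentence, vocab))  # ['研究', '生命', '的', '起源']
```

Here the forward scan greedily takes 研究生 and strands 命, while the reverse scan finds the intended reading. When both directions make the same mistake, or the correct reading depends on meaning rather than the dictionary, a heuristic of this kind cannot help, which is the limitation noted above.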
