Research on Web Mining-Based Bilingual Pair Acquisition

Author ZhouHui
Advisor HeZuoLian
School Tianjin University
Major Computer Application Technology
Keywords Bilingual pair acquisition; Phrase alignment; Web mining; Automatic domain dictionary expansion
CLC TP391.1
Type Master's thesis
Year 2009

The volume of text on the Internet in many languages is growing geometrically, and this text has naturally become a valuable resource for natural language processing research. This thesis starts from acquiring language resources of interest from the Internet and then goes on to acquire bilingual pairs from it. Bilingual phrase alignment is an important problem in machine translation. For part-of-speech tagging, when the rules for a multi-category word's preceding word and following word match in different ways, the thesis combines statistical methods with rules and assigns priorities to the rules to determine the part of speech of the current word, which improves tagging accuracy. For phrase segmentation, it builds on probabilistic Chinese phrase segmentation, adds a number of rules with a high phrase-formation rate, and segments with an N-shortest-path method. Given a sentence that has already been segmented into words, this method looks up all possible phrases in a phrase library, constructs a directed acyclic graph from them, and takes the optimal path, which improves segmentation accuracy. The thesis also proposes a phrase alignment method that uses phrase co-occurrence counts obtained automatically from a search engine. Using the tagging and phrase segmentation results, it judges from Web co-occurrence counts whether an English phrase and a Chinese phrase are translations of each other, then selects the best candidates with a greedy rule. The method can obtain some new phrases that bilingual corpora do not cover adequately and so serves as a supplement to them; experiments show that it effectively improves the precision and recall of phrase alignment. In addition, based on Web mining technology, the thesis uses an iterative strategy to acquire bilingual pairs and thereby expand a domain dictionary automatically.
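
The greedy alignment step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the Dice-style score, the function names, and the two count tables (which stand in for search-engine hit counts) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def dice_score(pair_hits: int, en_hits: int, zh_hits: int) -> float:
    """Dice-style association score from hit counts (an assumed scoring choice)."""
    total = en_hits + zh_hits
    return 2.0 * pair_hits / total if total else 0.0


def greedy_align(
    en_phrases: List[str],
    zh_phrases: List[str],
    pair_hits: Dict[Tuple[str, str], int],
    single_hits: Dict[str, int],
    threshold: float = 0.0,
) -> List[Tuple[str, str, float]]:
    """Pair English and Chinese phrases greedily by co-occurrence score.

    pair_hits and single_hits are hypothetical stand-ins for counts that
    would be fetched from a search engine.
    """
    # Score every candidate (English, Chinese) pair above the threshold.
    scored = []
    for en in en_phrases:
        for zh in zh_phrases:
            score = dice_score(
                pair_hits.get((en, zh), 0),
                single_hits.get(en, 0),
                single_hits.get(zh, 0),
            )
            if score > threshold:
                scored.append((score, en, zh))

    # Greedy rule: take the best-scoring pair first and never reuse a phrase.
    scored.sort(key=lambda item: -item[0])
    used_en, used_zh = set(), set()
    aligned = []
    for score, en, zh in scored:
        if en in used_en or zh in used_zh:
            continue
        used_en.add(en)
        used_zh.add(zh)
        aligned.append((en, zh, score))
    return aligned
```

Calling `greedy_align` with the phrases from one sentence pair and the corresponding count tables returns the selected pairs in descending score order. A greedy pass is cheap but does not guarantee a globally optimal assignment.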
The expansion method starts from a limited specialized computer dictionary: each English term and its Chinese explanation are combined into an English-Chinese tuple and submitted to a search engine, similar English-Chinese tuples are extracted from the returned snippets, and high-confidence tuples are selected and submitted to the search engine in the same way. This is repeated over multiple iterations until every tuple derived from the specialized computer dictionary has been processed. Experiments show that, when used to expand the corpus, the method effectively improves the accuracy of dictionary acquisition and also makes bilingual dictionary compilation more efficient.
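
The iterative expansion loop can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the thesis's code: `search` is a hypothetical callable that returns snippets for a query, the snippet pattern "Chinese term(English term)" is an assumed extraction rule, and counting supporting snippets is an assumed proxy for confidence.

```python
import re
from collections import Counter, deque
from typing import Callable, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (English term, Chinese term)

# Assumed snippet pattern: a Chinese term followed by its English form in
# parentheses, e.g. "操作系统(operating system)". Real snippets would need
# proper term-boundary detection; this regex takes any run of CJK characters.
PAIR_RE = re.compile(
    r"([\u4e00-\u9fff]{2,})\s*[（(]\s*([A-Za-z][A-Za-z \-]*[A-Za-z])\s*[)）]"
)


def extract_pairs(snippets: Iterable[str]) -> Counter:
    """Count, for each candidate pair, how many snippets contain it."""
    counts: Counter = Counter()
    for text in snippets:
        # set() so a pair repeated within one snippet is counted once.
        for zh, en in set(PAIR_RE.findall(text)):
            counts[(en.lower(), zh)] += 1
    return counts


def expand_dictionary(
    seeds: List[Pair],
    search: Callable[[str], List[str]],
    min_support: int = 2,
    max_queries: int = 100,
) -> Set[Pair]:
    """Grow a seed dictionary by querying with each pair and harvesting new ones."""
    known: Set[Pair] = set(seeds)
    queue = deque(seeds)
    queries = 0
    while queue and queries < max_queries:
        en, zh = queue.popleft()
        queries += 1
        snippets = search(f"{en} {zh}")  # hypothetical search-engine call
        for pair, support in extract_pairs(snippets).items():
            # Keep only pairs seen in enough snippets (assumed confidence test),
            # and feed each accepted pair back in as a new query.
            if support >= min_support and pair not in known:
                known.add(pair)
                queue.append(pair)
    return known
```

The `max_queries` cap is an added safeguard so the loop terminates even if new pairs keep appearing; the abstract itself only states that iteration continues until every tuple has been processed.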
