Research and Application of Domain-Specific Statistical Machine Translation
|School||Kunming University of Science and Technology|
|Course||Computer Application Technology|
|Keywords||Statistical machine translation; medical domain; domain rule templates; dependency language model|
Machine translation is both a difficult and an active research topic in natural language understanding. As international exchange grows ever more frequent, machine translation has become important for multilingual communication, yet its translation accuracy remains unsatisfactory. In specific domains, however, and especially in technical documents rich in professional terminology, the vocabulary is usually relatively fixed and the syntax simple, so good results are easier to obtain; weather forecasting and knowledge bases are examples of such domains. This thesis carries out a series of studies on domain-specific statistical machine translation, taking the medical domain as its object of study. The main results are as follows.

(1) A statistical machine translation method that integrates domain rule templates. Domain rule templates and related domain resources such as parallel corpora are an important foundation and tool for improving the overall performance of a domain-oriented machine translation system. Taking the medical domain as the research object, this thesis constructs the domain rule base and domain resources required by a medical statistical machine translation system, including a domain parallel corpus and domain rule templates. It proposes a method for extending domain rule templates and a template matching algorithm, and integrates the matching algorithm and the domain resources into an open-domain statistical machine translation system to obtain a domain-oriented one. Experiments show that, with the support of a domain parallel corpus and rule templates of a certain size, the translation quality of the domain-oriented statistical machine translation system improves substantially.

(2) A domain-oriented dependency language model used to constrain decoding results.
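The abstract does not specify the template format or the matching algorithm, so the following is only a minimal sketch of slot-based template matching under stated assumptions. The `RuleTemplate` class, the `$`-prefixed slot convention, the greedy left-to-right strategy, and the toy lexicon are all hypothetical illustrations, not the thesis's method. Each template pairs a source token pattern with a target pattern; slots bind a source token, which is translated through a domain dictionary and copied into the target side.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class RuleTemplate:
    """Hypothetical bilingual rule template.

    Tokens beginning with '$' are slots: each matches one source token,
    which is looked up in a domain lexicon when the target side is built.
    """
    source: List[str]
    target: List[str]

def match(template: RuleTemplate, tokens: List[str], start: int) -> Optional[Dict[str, str]]:
    """Try to match the template's source pattern at tokens[start:]; return slot bindings or None."""
    if start + len(template.source) > len(tokens):
        return None
    bindings: Dict[str, str] = {}
    for pat, tok in zip(template.source, tokens[start:]):
        if pat.startswith("$"):
            # A slot used twice must bind the same token both times.
            if pat in bindings and bindings[pat] != tok:
                return None
            bindings[pat] = tok
        elif pat != tok:
            return None
    return bindings

def apply_templates(tokens: List[str], templates: List[RuleTemplate],
                    lexicon: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Greedy left-to-right matching; returns (start, end, translation) for each matched span."""
    results: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        for t in templates:
            b = match(t, tokens, i)
            if b is not None:
                # Fill slots from the lexicon; unknown slot fillers are copied unchanged.
                out = [lexicon.get(b[p], b[p]) if p.startswith("$") else p for p in t.target]
                results.append((i, i + len(t.source), " ".join(out)))
                i += len(t.source)
                break
        else:
            i += 1  # no template starts here; move on
    return results

templates = [RuleTemplate(source=["患有", "$X"], target=["suffers", "from", "$X"])]
lexicon = {"糖尿病": "diabetes"}
spans = apply_templates(["患者", "患有", "糖尿病"], templates, lexicon)
```

Here `spans` is `[(1, 3, "suffers from diabetes")]`. In a full system, matched spans like this could be passed to the statistical decoder as preferred or fixed translations for those source positions, while unmatched text is translated by the ordinary phrase tables; how the thesis actually combines the two is not stated in the abstract.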
This thesis builds a dependency language model for the medical domain, proposes a training method for the model's parameters, and incorporates the model into the decoding stage of statistical machine translation. The model further constrains the N-best candidate translations produced by the decoder: each candidate's score is recalculated and the N-best list is reordered, so that a better translation is ranked first and translation accuracy improves. Experimental results show that the proposed language model based on dependency syntactic relations improves, to some extent, the accuracy of the top-ranked output of Chinese-English statistical machine translation.

Building on these results, and using open-source software for lexical analysis, syntactic analysis, and word alignment together with domain resources such as domain dictionaries and domain templates, this thesis builds a prototype statistical machine translation system for the medical domain.
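The N-best rescoring step can be illustrated with a minimal sketch. It assumes a common formulation that the abstract does not confirm: the dependency language model estimates P(dependent | head) by relative frequency with add-one smoothing over head-dependent arcs from a parsed target-side corpus, each candidate already carries its dependency arcs (a real system would obtain them from a parser), and the new score is the decoder score plus a weighted dependency-LM log probability. The class name `DependencyLM`, the `<ROOT>` marker, the `weight` parameter, and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Arc = Tuple[str, str]  # (head word, dependent word); "<ROOT>" marks the sentence root

class DependencyLM:
    """Assumed dependency LM: P(dependent | head), add-one smoothed."""

    def __init__(self, treebank: List[List[Arc]]):
        self.pair_counts: Counter = Counter()
        self.head_counts: Counter = Counter()
        vocab = set()
        for arcs in treebank:
            for head, dep in arcs:
                self.pair_counts[(head, dep)] += 1
                self.head_counts[head] += 1
                vocab.add(dep)
        # +1 reserves probability mass for unseen dependents.
        self.vocab_size = len(vocab) + 1

    def logprob(self, arcs: List[Arc]) -> float:
        """Sum of log P(dep | head) over all arcs of one candidate's parse."""
        total = 0.0
        for head, dep in arcs:
            num = self.pair_counts[(head, dep)] + 1
            den = self.head_counts[head] + self.vocab_size
            total += math.log(num / den)
        return total

def rerank(nbest: List[Tuple[str, float, List[Arc]]], lm: DependencyLM,
           weight: float = 1.0) -> List[Tuple[str, float]]:
    """Recompute each candidate's score and return candidates sorted best-first."""
    scored = [(text, dec + weight * lm.logprob(arcs)) for text, dec, arcs in nbest]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy training data: three copies of one parsed sentence.
good_arcs = [("<ROOT>", "suffers"), ("suffers", "patient"),
             ("suffers", "from"), ("from", "diabetes")]
lm = DependencyLM([good_arcs] * 3)

nbest = [
    ("patient diabetes from suffers", -3.5,
     [("<ROOT>", "diabetes"), ("diabetes", "patient"),
      ("diabetes", "from"), ("from", "suffers")]),
    ("patient suffers from diabetes", -4.0, good_arcs),
]
ranked = rerank(nbest, lm)
```

Although the ill-formed candidate has the higher decoder score, its unseen head-dependent arcs receive low probability, so after rescoring "patient suffers from diabetes" moves to the top. In practice `weight` would be tuned on a development set together with the other feature weights.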