The Research of Decoding Algorithm for Statistical Machine Translation
|School||Harbin Institute of Technology|
|Course||Computer Science and Technology|
|Keywords||statistical machine translation, k-best parsing, decoding algorithm, synchronous context-free grammar|
As statistical machine translation (SMT) has developed, its models have progressed from word-based models to phrase-based models, formal syntax models, tree-to-string models, and string-to-tree models. Some researchers are now even trying to build tree-to-tree models. With so many complex models comes an equally wide variety of decoders.

This paper mainly introduces a generalized decoding algorithm based on k-best parsing. We make small changes to the different models so that each can be expressed as a synchronous context-free grammar (SCFG). We then parse the source sentence with a monolingual k-best parsing algorithm. Because every SCFG rule has a source side and a target side, the target-language parse tree is generated synchronously as the source side is parsed. Our decoder combines a variety of features in a log-linear model: the score of an SCFG rule is the weighted sum of the natural logarithms of its feature values. These scores accumulate as the parse tree is built over the source sentence, so the k-best derivations at the root vertex of the parse tree give the k-best translations of the source sentence.

We also introduce a popular decoding algorithm for the phrase-based model, which is based on a finite automaton. Experiments with this decoder and our generalized decoder show that the two have equivalent translation ability on the same phrase-based model. Further experiments with our generalized decoder on several models show that the more prior knowledge we add to the model, the better the translations become.
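
The log-linear rule score and the accumulation of k-best derivations described in this abstract can be illustrated with a short Python sketch. It is not the decoder used in this work. The function names `rule_score` and `k_best_combine`, the feature names, and the restriction to a single binary rule with already-sorted child lists are all assumptions made for illustration. The combination step uses a common lazy frontier search over the two child lists, which may differ from the k-best parsing algorithm actually used.

```python
import heapq
import math


def rule_score(features, weights):
    """Log-linear score of one rule: the weighted sum of the natural
    logarithms of its feature values (feature names are hypothetical)."""
    return sum(weights[name] * math.log(value)
               for name, value in features.items())


def k_best_combine(left, right, rule_score_value, combine, k):
    """Return the k best (score, translation) pairs for a vertex built
    from two child vertices by one binary rule.

    left, right: child k-best lists of (score, translation), sorted by
    descending score. Scores are log-domain, so they simply add up.
    combine: builds the target string from the two child translations,
    playing the role of the target side of the SCFG rule.
    """
    if not left or not right:
        return []

    def total(i, j):
        return left[i][0] + right[j][0] + rule_score_value

    # heapq is a min-heap, so scores are negated to pop the best first.
    heap = [(-total(0, 0), 0, 0)]
    seen = {(0, 0)}
    best = []
    while heap and len(best) < k:
        neg_score, i, j = heapq.heappop(heap)
        best.append((-neg_score, combine(left[i][1], right[j][1])))
        # Only the two neighbours of a popped cell can be the next best.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-total(ni, nj), ni, nj))
    return best
```

Applying `k_best_combine` bottom-up at every vertex of the source-side parse would leave the k-best list at the root vertex as the k-best translations, under the assumptions stated above.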