A Study of Constructing Rules of Phrases in Contemporary Chinese for Chinese Information Processing
Keywords: phrase structure grammar, syntactic category, semantic category, rule, unification, generalized valence mode, Chinese Information Processing
This thesis, which is oriented towards Chinese Information Processing (CIP) by computer, proposes a set of formalized rules for Chinese phrase structures and discusses the treatment of phrase structure disambiguation. The full text consists of 7 chapters.

Chapter one: The state of development of CIP and the current state of research on Modern Chinese grammar are outlined. On this basis, the system of Chinese phrase structures is chosen as the subject of this research, and the goal of the thesis is set: to create a RuleBASE containing a set of Chinese phrase structure rules with rich constraints. Such a RuleBASE must be supported by a lexicon that records a vast amount of syntactic and semantic features for every lexical entry. Such an electronic lexicon has already been developed by the Institute of Computational Linguistics of Peking University. To some extent, the main research presented in this thesis can be regarded as a natural extension of the research on that lexicon.

Chapter two: A classification system for Chinese phrases is put forward first, and the other syntactic categories needed to describe phrase structures are defined. All of these syntactic attributes, which are based on the Phrase-standard Grammar system proposed by Prof. Dexi Zhu, are used to describe the functions of phrases. In addition, a semantic expression framework, named Generalized Valence Mode, is designed to describe the semantic features of a word or a phrase, and a simple semantic taxonomy of Chinese content words is built. The syntactic and semantic categories set up in this chapter are the basis for constructing the phrase structure rules of Modern Chinese that are discussed in detail in chapter three.

Chapter three: Using the existing syntactic and semantic categories, a constraint-based Chinese phrase structure rule system is constructed. Each rule has two parts: a context-free rewrite rule and a series of unification equations. The former describes how a compound phrase is constructed; the latter describes the functions of the compound phrase and the constraints on its constituents, which generally consist of syntactic constraints and semantic constraints. In total, 89 rules are derived in this chapter, covering most types of phrases in Modern Chinese, including np, ap, vp, and dj.

Chapter four: This chapter analyzes the ambiguity that arises in automatic parsing when determining the boundaries and structural relations of Chinese phrases. Ambiguous phrases can be classified in different ways depending on the perspective taken. In terms of the components of the ambiguous structure, they fall into two categories: those that include terminal symbols, and those that include only non-terminal symbols. In terms of the influence of the ambiguity, they also fall into two categories: self-confined ambiguous phrases, whose ambiguity mainly affects the inside of the phrase, and non-self-confined ambiguous phrases, whose ambiguity affects the context outside the phrase. In terms of the relation between type and token, they fall into three categories: true-ambiguous phrases, quasi-ambiguous phrases, and pseudo-ambiguous phrases. Based on the above analysis and the set of rules proposed in chapter three, I also survey the ambiguous phrases in Modern Chinese and their various types of ambiguity.
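
The two-part rule format summarized for chapter three (a context-free rewrite rule plus unification equations) can be illustrated with a minimal Python sketch. The rule shown, the feature names (transitive, sem) and their values are hypothetical and are not taken from the thesis's RuleBASE; unification is also simplified here to flat attribute-value pairs.

```python
from dataclasses import dataclass, field


def unify(a, b):
    """Unify two flat feature dicts; return the merged dict, or None on a value clash."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None  # conflicting values: unification fails
        result[key] = value
    return result


@dataclass
class Rule:
    lhs: str            # category of the resulting phrase, e.g. "vp"
    rhs: tuple          # categories of the constituents, e.g. ("vp", "np")
    constraints: tuple  # one required-feature dict per constituent
    result_features: dict = field(default_factory=dict)

    def apply(self, constituents):
        """constituents: list of (category, features). Return (lhs, features) or None."""
        # Part 1: the context-free rewrite rule must match the category sequence.
        if tuple(cat for cat, _ in constituents) != self.rhs:
            return None
        # Part 2: every constituent must unify with its constraint.
        for (_, feats), required in zip(constituents, self.constraints):
            if unify(feats, required) is None:
                return None
        return (self.lhs, dict(self.result_features))


# Hypothetical rule: vp -> vp np, where the verb phrase must be transitive.
verb_object = Rule(
    lhs="vp",
    rhs=("vp", "np"),
    constraints=({"transitive": True}, {}),
    result_features={"transitive": False},
)

accepted = verb_object.apply([("vp", {"transitive": True}), ("np", {"sem": "food"})])
rejected = verb_object.apply([("vp", {"transitive": False}), ("np", {"sem": "food"})])
```

In this sketch the category sequence plays the role of the rewrite rule, while the per-constituent feature dicts stand in for the unification equations. A constituent that leaves a feature unspecified still unifies, which is the usual behaviour of unification; a syntactic or semantic clash blocks the rule.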