
A Study of Text Classification and Retrieval for Chinese Patents

Author DanShiLei
Tutor FengLin
School Dalian University of Technology
Course Applied Computer Technology
Keywords Patent classification; Patent search; Semantic disambiguation; Manifold dimensionality reduction; Index pool
CLC TP391.1
Type Master's thesis
Year 2011

In recent years, patent knowledge has attracted wide attention, and patent analysis and mining have become an active research topic. Advances in machine learning provide strong technical support for patent mining. Patent classification and retrieval are core parts of patent knowledge mining and also serve as tools for innovative product design. Patent data has a distinctive structure and is highly specialized; patent classification is highly mechanical, the data dimension is high, and classification accuracy is low. Patent search must handle large volumes of data, so retrieval efficiency is low and the level of expertise required of users is high. To address these problems, this thesis studies classification and retrieval methods for Chinese patent text, with the aim of improving the efficiency of patent classification and retrieval and of mining patent knowledge more deeply. It proposes a patent classification method based on semantic disambiguation and manifold dimensionality reduction, and a patent retrieval model based on a dynamic index pool. Building on engineering semantic network theory, it also addresses multi-conflict problems in innovative design, and applies high-dimensional time series mining to patent data to support innovative design and the transfer of innovation knowledge.
The classification method based on semantic disambiguation and manifold dimensionality reduction targets two weaknesses: mechanical word segmentation, and mechanically extracted feature terms that fail to reflect the deeper semantic knowledge in patent data. Introducing a semantic dictionary to disambiguate feature words reduces noise in the feature terms and, as a side effect, also reduces the dimension of the text vector.
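
The effect of a semantic dictionary on the dimension of the text vector can be illustrated with a minimal sketch. It is not the method of the thesis: it only merges synonymous feature terms into one canonical concept, which is a simplification of full word sense disambiguation. The dictionary entries, the function names `normalize_terms` and `term_vector`, and the sample tokens are all hypothetical.

```python
from collections import Counter

# Hypothetical semantic dictionary: maps surface terms to a canonical concept.
# The entries are illustrative only and do not come from the thesis.
SEMANTIC_DICT = {
    "automobile": "vehicle",
    "car": "vehicle",
    "motorcar": "vehicle",
    "engine": "motor",
    "motor": "motor",
}


def normalize_terms(tokens, semantic_dict):
    """Replace each token by its canonical concept when the dictionary knows one."""
    return [semantic_dict.get(tok, tok) for tok in tokens]


def term_vector(tokens):
    """Bag-of-words counts; the number of distinct keys is the vector dimension."""
    return Counter(tokens)


# Assumed output of a word segmenter for one patent text.
tokens = ["car", "engine", "automobile", "motor", "brake", "motorcar"]

raw_vector = term_vector(tokens)
merged_vector = term_vector(normalize_terms(tokens, SEMANTIC_DICT))

print(len(raw_vector), "dimensions before,", len(merged_vector), "after")
```

In this toy input, six distinct feature terms collapse into three concepts, so the vector becomes both shorter and less noisy. A real system would also need to choose among several candidate senses of an ambiguous word, which this sketch does not attempt.
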
Excessively high dimensionality is another problem facing text categorization. Manifold learning algorithms are introduced, on the one hand to find the intrinsic dimension of patent data, and on the other to improve classification efficiency through dimensionality reduction. Experiments verify that the two strategies effectively improve retrieval efficiency.
Numerous studies show that multi-index techniques can effectively improve retrieval efficiency. However, each study designs its multi-index strategy for one specific retrieval application (multilingual text, images, video, data, and so on). These strategies are therefore limited: they do not transfer to other application areas, and no study addresses how the indexes should be maintained and managed. This thesis presents an application-oriented dynamic index pool model that addresses these problems and provides a theoretical basis for index construction and optimization. The model applies pooling techniques to multi-index management and continually optimizes the index structure according to feedback from user queries, providing users with more efficient retrieval while also reducing system load. Patent search experiments verify the effectiveness of the index pool retrieval model.
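
One way to picture an index pool driven by query feedback is the sketch below. It is an assumption-laden illustration, not the model from the thesis: the class name `IndexPool`, the one-inverted-index-per-field layout, the fixed `capacity`, and the least-used eviction rule are all hypothetical choices made to keep the example small.

```python
from collections import defaultdict


class IndexPool:
    """Bounded pool of inverted indexes, one per document field (hypothetical design)."""

    def __init__(self, documents, capacity=2):
        self.documents = documents      # doc_id -> {field: text}
        self.capacity = capacity        # maximum number of indexes kept in the pool
        self.indexes = {}               # field -> {term: set of doc_ids}
        self.usage = defaultdict(int)   # field -> number of queries served

    def _build(self, field):
        """Build an inverted index over one field of every document."""
        index = defaultdict(set)
        for doc_id, fields in self.documents.items():
            for term in fields.get(field, "").lower().split():
                index[term].add(doc_id)
        return index

    def _acquire(self, field):
        """Return the index for a field, building it and evicting if needed."""
        if field not in self.indexes:
            if len(self.indexes) >= self.capacity:
                # Evict the pooled index that queries have used least.
                coldest = min(self.indexes, key=lambda f: self.usage[f])
                del self.indexes[coldest]
            self.indexes[field] = self._build(field)
        self.usage[field] += 1          # record query feedback
        return self.indexes[field]

    def search(self, field, term):
        """Return the sorted ids of documents whose field contains the term."""
        return sorted(self._acquire(field).get(term.lower(), set()))


# Illustrative documents; the field names and texts are made up.
docs = {
    1: {"title": "gear pump", "abstract": "a pump with helical gear", "claims": "gear housing"},
    2: {"title": "valve seal", "abstract": "seal for pump valve", "claims": "valve body"},
}

pool = IndexPool(docs, capacity=2)
title_hits = pool.search("title", "pump")
abstract_hits = pool.search("abstract", "pump")
seal_hits = pool.search("abstract", "seal")
claims_hits = pool.search("claims", "gear")   # pool is full, so the title index is evicted
pooled_fields = sorted(pool.indexes)
```

Indexes are built lazily on first use, and the usage counters act as a crude form of query feedback: the fields users query most stay in the pool, while rarely used indexes are dropped to bound memory. A fuller model would also reorganize the structure of each index, which this sketch leaves out.
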
Building on the patent classification and retrieval research and on the engineering semantic network, this thesis proposes a way to resolve multiple conflicts in innovative design. Innovative design is the process of resolving conflicts that arise in real production. The engineering semantic network is used to understand and analyze design conflicts at the engineering-semantic level, so that multi-conflict design problems can be solved. For massive patent information, high-dimensional time series data mining is used to analyze the distribution patterns of patents, achieving cross-domain, systematic, multi-modal matching between inventive principles and patent instances.
