Research on Writer Recognition Based on Chinese Handwriting Documents and Information Fusion
Course: Signal and Information Processing
Keywords: Writer Recognition; Information Fusion; Multiple Classifier Combination; Text-independent; Text-dependent; Data Mining
Computer-based writer recognition (WR) has long been a difficult problem in pattern recognition. Although many approaches have been devised over more than 40 years of research, none has yet reached the accuracy and reliability required for practical application. Each method has its own advantages, disadvantages, and range of applicability. The trend in WR is therefore to use information fusion to combine different methods organically into an efficient and stable WR system. This dissertation presents a WR solution based on information fusion, using multi-level, multi-method fusion tailored to the characteristics of Chinese handwritten documents. In addition, to meet the requirements of practical application, software implementing this WR system has been developed, integrating current software-development, network, and database technologies. The main results are as follows:

1. Practical preprocessing methods for WR are presented. Clean, normalized handwriting images are the basis for a high WR accuracy rate. The dissertation studies in depth and solves the problems of removing background noise such as dots and lines, including from heavily polluted handwriting images, and of extracting and normalizing sample characters. The resulting preprocessing techniques have proved reliable in processing thousands of handwriting images.
2. A new handwriting texture recognition method based on stable frequency features is presented. The method builds several texture images from one handwriting image, each by randomly arranging the single-character images extracted from it, and then extracts the frequency features of each texture image using the Fast Fourier Transform (FFT).
Fusing these features yields stable frequency features, from which distances are estimated; the final step is classification and recognition. The method eliminates the random fluctuation of feature values caused by differences in character content and position. Experiments show that it significantly increases the identification rate on a handwriting database with a large number of samples.
3. A decomposition model of the factors affecting WR is built for the first time. The dissertation classifies the factors that affect feature distance into two categories: differences in handwriting style, which depend on the writer (the Writing Factor), and differences in character structure, which depend on the content (the Character Factor). The Writing Factor is the basis of WR, whereas the Character Factor is a negative factor that limits accuracy. Analysis of variance shows a clear difference between the two factors, and on this basis the two-factor decomposition model is established.
4. A text-independent WR classifier built on a text-dependent method is proposed for the first time. Following the two-factor decomposition model, new information is obtained through data mining, and the Character Factor, the negative factor, is separated from the feature distance. This classifier simplifies the identification process and, using a single recognition method, markedly increases the accuracy rate on a large handwriting database.
5. A scheme based on the combination of multiple classifiers is proposed. Combining several WR methods into a multi-classifier increases the accuracy rate in application and effectively solves practical problems.
6. A few-character WR method based on information fusion is presented for the first time. Because the test documents contain few characters, the samples share almost no identical characters.
Using information obtained through data mining, the dissertation weights the text-dependent handwriting features to obtain features approximating text-independent ones, and then applies the text-independent classifier, fusing the information for classification and identification. This method provides a solution to few-character WR.
7. WR software systems have been designed, implemented, and successfully deployed. Building on the theoretical research and technical innovations, three WR-related software systems have been designed and implemented: the Pre-processing For Character Images Software System, the Automatic Recognition and Search For Network-based Handwriting Software System, and the Automatic Office File-checking Software System. These systems are in use in many cities and provinces, including Wuhan, Nanjing, Shanghai, Beijing, Guangzhou, Inner Mongolia, Hainan, Jilin, Liaoning, and Tianjin. They have helped solve many criminal cases involving handwriting, producing significant economic and social benefits. The Automatic Office File-checking Software System also won the third prize for Scientific and Technological Achievements of the Ministry of Public Security.
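
Item 1 lists the removal of background dots among the preprocessing steps but does not say how it is done. A common approach, sketched here only as an assumption, is to discard connected foreground components smaller than a size threshold. The function name `remove_small_components` and the parameter `min_size` are hypothetical, the input is assumed to be an already binarized image, and removal of ruled lines is not covered.

```python
import numpy as np
from collections import deque

def remove_small_components(binary, min_size):
    """Keep only 8-connected foreground components with at least min_size pixels.
    Hypothetical helper; assumes `binary` is a 2-D boolean array (True = ink)."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if not binary[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill collects one connected component.
            comp = []
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            # Small components (isolated dots) are treated as noise and dropped.
            if len(comp) >= min_size:
                for y, x in comp:
                    out[y, x] = True
    return out

img = np.zeros((8, 8), dtype=bool)
img[2:5, 1:5] = True   # a 3x4 "stroke" (12 pixels)
img[6, 7] = True       # an isolated noise dot
clean = remove_small_components(img, min_size=3)
print(int(clean.sum()), bool(clean[6, 7]))  # → 12 False
```

The threshold trades off noise removal against losing genuine small strokes such as dots in characters; in practice it would be tuned to the character size after normalization.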
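
Item 2 outlines a pipeline: random textures built from single-character images, FFT frequency features for each texture, fusion, distance estimation, and classification. The sketch below follows that pipeline under stated assumptions: characters are already extracted and normalized to equal-size arrays, the frequency feature is the normalized power in concentric rings of the 2-D spectrum, fusion is a plain average over textures, and classification is nearest neighbour by Euclidean distance. These choices, and all function names, are illustrative and hypothetical, not the dissertation's; the two "writers" in the usage example are synthetic stripe patterns.

```python
import numpy as np

def build_texture(chars, grid, rng):
    """Tile randomly chosen equal-size character images into a grid x grid texture."""
    idx = rng.integers(0, len(chars), size=grid * grid)
    rows = [np.hstack([chars[i] for i in idx[r * grid:(r + 1) * grid]]) for r in range(grid)]
    return np.vstack(rows)

def ring_energy(texture, n_rings):
    """Normalized power of the centred 2-D FFT spectrum in concentric frequency rings."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(texture - texture.mean()))) ** 2
    h, w = spec.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)
    edges = np.linspace(0, r.max() + 1e-9, n_rings + 1)
    feats = np.array([spec[(r >= lo) & (r < hi)].sum() for lo, hi in zip(edges[:-1], edges[1:])])
    return feats / (feats.sum() + 1e-12)

def stable_features(chars, n_textures=8, grid=4, n_rings=16, seed=0):
    """Fuse features from several random textures by averaging them."""
    rng = np.random.default_rng(seed)
    feats = [ring_energy(build_texture(chars, grid, rng), n_rings) for _ in range(n_textures)]
    return np.mean(feats, axis=0)

def identify(query_chars, references):
    """Nearest-neighbour writer identification by Euclidean feature distance."""
    q = stable_features(query_chars)
    dists = {wr: float(np.linalg.norm(q - stable_features(c))) for wr, c in references.items()}
    return min(dists, key=dists.get), dists

# Synthetic "writers": diagonal stripes of different periods plus pixel noise.
rng = np.random.default_rng(1)
def fake_chars(period, n=20):
    yy, xx = np.indices((16, 16))
    base = ((xx + yy) % period < period // 2).astype(float)
    return [base + 0.3 * rng.standard_normal((16, 16)) for _ in range(n)]

refs = {"A": fake_chars(4), "B": fake_chars(8)}
query = fake_chars(4)
writer, dists = identify(query, refs)
print(writer)  # → A
```

Averaging over several random arrangements is what damps the dependence on which characters appear and where they sit, which is the fluctuation item 2 says the method removes.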
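
Items 3 and 4 describe splitting a feature distance into a Writing Factor and a Character Factor and then removing the latter. The abstract does not give the model's form, so the sketch below assumes a simple additive two-way model, D[p, c] = mu + W[p] + C[c] + noise, over a training table of distances for writer pairs p and characters c. It computes the main-effect sums of squares used in an analysis of variance and subtracts the estimated character offsets from a query's per-character distances. The model form and all names are hypothetical.

```python
import numpy as np

def factor_sums_of_squares(D):
    """Main-effect sums of squares: writer pairs are rows, characters are columns."""
    grand = D.mean()
    ss_writer = D.shape[1] * ((D.mean(axis=1) - grand) ** 2).sum()
    ss_char = D.shape[0] * ((D.mean(axis=0) - grand) ** 2).sum()
    return float(ss_writer), float(ss_char)

def character_offsets(D):
    """Estimate the Character Factor C[c] as column mean minus grand mean."""
    return D.mean(axis=0) - D.mean()

def corrected_distance(d_query, char_ids, offsets):
    """Subtract each character's offset, then average over the characters compared."""
    return float(np.mean([d - offsets[c] for d, c in zip(d_query, char_ids)]))

# Synthetic, noise-free training table: 3 writer pairs x 3 characters.
mu, W, C = 5.0, np.array([-1.0, 0.0, 1.0]), np.array([2.0, -0.5, -1.5])
D = mu + W[:, None] + C[None, :]
print(factor_sums_of_squares(D))  # writer vs character sums of squares
print(character_offsets(D))       # recovers C
print(corrected_distance([7.0, 3.5], [0, 2], character_offsets(D)))  # → 5.0
```

After correction, the two raw distances 7.0 and 3.5, which differ only because different characters were compared, both map to the same writer-level distance, which is the effect item 4 attributes to separating the Character Factor.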
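
Item 5 does not specify how the classifiers are combined. One widely used rule for fusing classifiers that each return a ranked list of candidate writers is the Borda count, sketched below purely as an illustration; using it in place of the dissertation's own rule is an assumption, and `borda_fuse` and the classifier labels in the comments are hypothetical.

```python
from collections import defaultdict

def borda_fuse(rankings):
    """Fuse ranked candidate lists (best first), one per classifier, by Borda count.
    A candidate at position i in a list of length n receives n - i points."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, writer in enumerate(ranking):
            scores[writer] += n - pos
    # Highest total first; ties broken by name so the output is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))

rankings = [
    ["w3", "w1", "w2"],  # e.g. a texture-based classifier
    ["w1", "w3", "w2"],  # e.g. a text-independent classifier
    ["w1", "w2", "w3"],  # e.g. a text-dependent classifier
]
print(borda_fuse(rankings))  # → ['w1', 'w3', 'w2']
```

Rank-level fusion like this needs no normalization of scores across classifiers, which helps when the classifiers produce distances on different scales; score-level rules such as a weighted sum are the usual alternative.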