
Research on New Technology in Data Mining Field

Author WangTong
Tutor HeZuoLian
School Tianjin University
Course Applied Computer Technology
Keywords User dimension, Web log encoding, Inverse-Apriori algorithm, AprioriAll, Page dwell time, Cross-entropy function criterion, Activation function, Biometrics
CLC TP311.13
Type PhD thesis
Year 2007
Downloads 1905
Quotes 1

With the development of the Internet, online shopping, e-government, and online information retrieval have become increasingly common, and the strong demand for network services has in turn driven the growth of the network. Faced with the huge volume of online data and the large number of websites, however, people often do not know where to start when choosing a network service or retrieving information. How to adapt network services to the individual needs of different users has therefore become an urgent concern for network service providers. The key to meeting those individual needs is discovering users' access patterns, which is one of the goals of Web data mining. Web data mining can be divided into three types: Web usage mining, Web structure mining, and Web content mining. Mining frequent user access sequences is the main method for discovering users' access patterns and is an important task of Web usage mining. Web usage mining discovers knowledge from Web logs or visitor behavior, and can reveal the intrinsic relationships among the behaviors of different users. The mining results can be used to improve the design of a Web site and the way services are provided, so as to meet the needs of different users. Based on an in-depth study of the design characteristics of OLTP and OLAP databases, of existing Web log mining algorithms, and of the related background, this thesis improves the original AprioriAll algorithm. During Web log mining, the Web log data is sliced along the user dimension, so that mining is carried out not only over all users as a whole but also independently over the behavior of each individual user, and the mined results can serve the needs of individual users. This improvement also enables incremental Web log mining, which makes dynamic Web log mining possible.
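The user-dimension slicing described above can be illustrated with a minimal Python sketch. It is not the thesis's improved AprioriAll algorithm: it only groups log records by user and counts the support of contiguous page sequences of a fixed length. The record layout `(user, session, timestamp, page)` and the function names `slice_by_user` and `frequent_sequences` are assumptions made for this illustration.

```python
from collections import defaultdict

def slice_by_user(log):
    """Group (user, session, timestamp, page) records into per-user lists of
    time-ordered page sequences, one sequence per session (assumed layout)."""
    sessions = defaultdict(list)
    for user, session, _ts, page in sorted(log, key=lambda r: (r[0], r[1], r[2])):
        sessions[(user, session)].append(page)
    per_user = defaultdict(list)
    for (user, _session), pages in sessions.items():
        per_user[user].append(pages)
    return dict(per_user)

def frequent_sequences(sequences, length, min_support):
    """Return the contiguous page sequences of the given length that occur in
    at least min_support of the input sequences, with their support counts."""
    counts = defaultdict(int)
    for seq in sequences:
        # A set ensures each sequence contributes at most once per pattern.
        found = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        for pattern in found:
            counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# Hypothetical log used only for illustration.
log = [
    ("u1", "s1", 1, "A"), ("u1", "s1", 2, "B"), ("u1", "s1", 3, "C"),
    ("u1", "s2", 1, "A"), ("u1", "s2", 2, "B"),
    ("u2", "s3", 1, "B"), ("u2", "s3", 2, "C"),
    ("u2", "s4", 1, "A"), ("u2", "s4", 2, "C"),
]
slices = slice_by_user(log)

# Mining all users as a whole: pool every session.
all_sessions = [seq for seqs in slices.values() for seq in seqs]
global_patterns = frequent_sequences(all_sessions, length=2, min_support=2)

# Mining one user's slice independently.
u1_patterns = frequent_sequences(slices["u1"], length=2, min_support=2)
```

Because each user's slice is stored separately, newly arrived log records for one user only require that user's slice to be mined again, which is one plausible reading of how slicing supports incremental mining.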
Experimental results show that, compared with the original algorithm, the improved algorithm reduces both the size of the candidate set generated during mining and the number of database scans, improving time and space efficiency. To address the large amount of memory and storage that Web transactions occupy during mining, as well as the drawbacks of Apriori-like algorithms, which generate large numbers of candidate sets and scan the database frequently, this thesis presents a Web transaction encoding technique and an inverse-Apriori algorithm. The Web transaction encoding technique represents each Web transaction as a number, compressing the Web transaction database and reducing memory usage. The inverse-Apriori algorithm works in the reverse direction to obtain users' maximal frequent access sequences and, on that basis, discover association rules, avoiding the cumbersome step-by-step generation of candidate frequent itemsets that Apriori-like algorithms require. By analyzing user browsing behavior and the site's responses to user requests, the thesis also proposes a Web log mining method based on the dwell time of users' Web page accesses. Dwell time reflects user browsing behavior. By setting an interval of dwell-time values for page accesses during mining, the objects to be mined can be selected and the scope of mining reduced, which improves the algorithm's ability to interact with the user. A new algorithm is built on this idea. It first preprocesses the Web log to obtain a set of Web access records with dwell times; then, subject to the dwell-time limits, it builds a dwell-time frequent access sequence tree that stores and compresses the database of dwell-time records and records the support count of each Web page.
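The preprocessing step and the dwell-time interval described above can be sketched as follows. This is a simplified illustration, not the thesis's algorithm: it assumes that a page's dwell time is the gap until the same user's next request, it drops each user's last page because its dwell time cannot be observed from the log, it ignores session boundaries, and it does not build the frequent access sequence tree. The names `add_dwell_times` and `within_dwell_interval` are hypothetical.

```python
from collections import defaultdict

def add_dwell_times(log):
    """Turn (user, timestamp_seconds, page) records into (user, page, dwell)
    records, where dwell is the time until the same user's next request."""
    by_user = defaultdict(list)
    for user, ts, page in log:
        by_user[user].append((ts, page))
    records = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        # Pair every request with the following one; the last has no pair.
        for (ts, page), (next_ts, _next_page) in zip(hits, hits[1:]):
            records.append((user, page, next_ts - ts))
    return records

def within_dwell_interval(records, low, high):
    """Keep only records whose dwell time lies in the closed interval [low, high]."""
    return [r for r in records if low <= r[2] <= high]

# Hypothetical log used only for illustration.
log = [
    ("u1", 0, "/home"), ("u1", 5, "/list"), ("u1", 65, "/item"),
    ("u2", 10, "/home"), ("u2", 12, "/help"),
]
with_dwell = add_dwell_times(log)
selected = within_dwell_interval(with_dwell, low=3, high=120)
```

Choosing the interval is where the user steers the mining: a lower bound filters out pages that were only passed through, and an upper bound filters out gaps that more likely reflect an abandoned visit than reading time.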
Finally, taking the dwell-time frequent access sequence tree as the object of mining and applying the minimum support constraint, the algorithm traverses the tree depth-first and finds the frequent access sequences, with dwell times, of users visiting the site. Comparative experiments show that the algorithm mines Web logs efficiently. Fuzzy neural networks are another active topic in data mining research. Based on the maximum likelihood principle, this thesis derives a cross-entropy criterion for a fuzzy neural network classification algorithm and also constructs a new activation function. Compared with the BP algorithm based on the sum-of-squared-errors criterion, the fuzzy neural network classification algorithm based on the cross-entropy criterion and the new activation function learns faster without destabilizing the learning process and is less prone to falling into local minima. The advantage of the new activation function is that it can actually reach the values 0 and 1, and that the slope of its curve can be adjusted according to the total error function, which accelerates convergence and improves the efficiency and dynamic performance of the algorithm. Lastly, ideas from biometric technology are introduced to solve the user identification problem in Web mining, and an iris recognition system based on a hidden Markov model is proposed. The system requires only the iris orientation field as its input parameter and does not need many iris details. Compared with conventional methods, it is insensitive to noise and distortion in the iris image, which makes it robust; in addition, the matching method simplifies preprocessing and is efficient.
Accurately identifying users overcomes the stateless nature of existing Web systems, so that Web log data can be sliced along the user dimension and each individual's behavior mined independently, producing results that serve the needs of individual users. Once this vision is implemented, incremental Web mining also becomes achievable, making dynamic Web log mining possible.
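The abstract's claim that the cross-entropy criterion learns faster than the sum-of-squared-errors criterion can be illustrated with a small Python sketch. It uses a single output unit with the standard logistic sigmoid, not the thesis's fuzzy neural network or its new activation function, and the standard binary cross-entropy, which is the negative log-likelihood of a Bernoulli output. All function names here are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Standard logistic sigmoid (stand-in for the thesis's activation function)."""
    return 1.0 / (1.0 + math.exp(-z))

def sse_loss(z, t):
    """Squared-error loss 0.5 * (y - t)^2 for output y = sigmoid(z), target t."""
    return 0.5 * (sigmoid(z) - t) ** 2

def ce_loss(z, t):
    """Binary cross-entropy -[t ln y + (1 - t) ln(1 - y)] for y = sigmoid(z)."""
    y = sigmoid(z)
    return -(t * math.log(y) + (1.0 - t) * math.log(1.0 - y))

def sse_grad(z, t):
    """Derivative of sse_loss with respect to z: (y - t) * y * (1 - y)."""
    y = sigmoid(z)
    return (y - t) * y * (1.0 - y)

def ce_grad(z, t):
    """Derivative of ce_loss with respect to z: the sigmoid slope cancels, leaving y - t."""
    return sigmoid(z) - t

# A confidently wrong output: target 1, pre-activation far on the negative side.
z, t = -6.0, 1.0
ratio = abs(ce_grad(z, t)) / abs(sse_grad(z, t))
```

With squared error, the gradient is multiplied by the sigmoid slope `y * (1 - y)`, which is close to zero when the unit is saturated, so a badly wrong output is corrected slowly. With cross-entropy that factor cancels, so the gradient stays proportional to the error itself.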
