Research and Application on the Data Mining Algorithm Based on Decision Tree
|School||Wuhan University of Technology|
|Course||Applied Computer Technology|
|Keywords||Data Mining; Decision Tree; ID3; Over-Fitting; Weighted Simplification Entropy|
Data mining is the process of extracting hidden and potentially useful information from large volumes of data. It is a relatively new data analysis technology and is widely applied in banking and finance, insurance, government, education, transportation, national defense, and other fields.
Data classification is one of the central tasks in data mining. Many classification methods exist, including decision tree induction, association-rule-based classification techniques, Bayesian classification and Bayesian belief networks, genetic algorithms, neural networks, and rough sets. Among these, the instance-based decision tree classification algorithm is widely used because it yields explicit rules, requires relatively little computation, exposes the attributes that matter most to a decision, and achieves high classification accuracy. According to related statistics, the decision tree is currently one of the most popular data mining algorithms.
Most existing decision tree algorithms nevertheless show problems when applied to real tasks, notably multi-value bias (a preference for attributes with many distinct values) and low computational efficiency. Improving the performance of decision trees so that they better meet the requirements of practical applications is therefore of both theoretical and practical significance.
This thesis studies these knowledge discovery issues in depth. Its purpose is to explore how decision tree algorithms in data mining can be optimized and combined so that they can be applied to real tasks.
The main contents are as follows.
First, the thesis introduces the basic theory of data mining and classification technology at a high level, with particular emphasis on the analysis and comparison of decision tree algorithms such as ID3, C4.5, and CART.
Second, it analyzes in detail several common problems that arise when classifying with decision trees: missing attribute values, the handling of continuous attributes, and over-fitting of the tree to the training data set. All of these problems reduce classification accuracy, so a sound strategy must be adopted while the decision tree is being constructed.
Third, the decision tree algorithm is optimized. Methods are proposed for handling missing attribute values and multi-value bias, together with a principle for selecting the splitting attribute. The concept of weighted simplification entropy is introduced, and a new algorithm based on ID3 is proposed. The improved algorithm performs well in comparison with the widely used ID3 algorithm.
Finally, the decision tree algorithm is applied to data mining in the device management system of a textile mill, where it provides a scientific and accurate basis for decision support.
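The multi-value bias mentioned above can be illustrated with a minimal Python sketch. It computes the standard ID3 information gain and, for contrast, the gain ratio used by C4.5. It does not implement the weighted simplification entropy proposed in the thesis, which this abstract does not define. The device records, the attribute names `id` and `load`, and the class labels are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the class labels by the value each row takes on `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def information_gain(rows, labels, attr):
    """ID3 criterion: entropy before the split minus weighted entropy after."""
    n = len(labels)
    groups = partition(rows, labels, attr)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    """C4.5 criterion: information gain divided by the split information."""
    n = len(labels)
    groups = partition(rows, labels, attr)
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    if split_info == 0:
        return 0.0
    return information_gain(rows, labels, attr) / split_info


# Hypothetical records: `id` is unique per row, `load` has two values.
rows = [
    {"id": 1, "load": "high"}, {"id": 2, "load": "high"},
    {"id": 3, "load": "high"}, {"id": 4, "load": "high"},
    {"id": 5, "load": "low"}, {"id": 6, "load": "low"},
    {"id": 7, "load": "low"}, {"id": 8, "load": "low"},
]
labels = ["fault", "fault", "fault", "fault", "ok", "ok", "ok", "fault"]

for attr in ("id", "load"):
    print(attr, information_gain(rows, labels, attr), gain_ratio(rows, labels, attr))
```

On this toy data, information gain prefers the `id` attribute, because every one-row partition is trivially pure, even though `id` cannot generalize to new records. The gain ratio penalizes the many-valued split and prefers `load` instead.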