Research and Application on Decision Tree in Data Mining
|Course||Applied Computer Technology|
|Keywords||KDD (Knowledge Discovery in Databases); DM (Data Mining); Decision Tree; Information Gain; Information Entropy; Attribute-value Pairs|
With the rapid development of database technology and the wide use of DBMSs, people hold more and more data. These huge volumes of data contain abundant knowledge. Current database technology can perform many functions efficiently, such as querying and computing statistics. However, it cannot discover the relationships and rules among the data, nor can it predict future trends from them. Databases hold large amounts of data, but few techniques can extract knowledge from them, so the current situation is one of "too much data, too little knowledge". In response, KDD (Knowledge Discovery in Databases) and its core technique, DM (Data Mining), have emerged.

The decision tree algorithm is one of the core algorithms of DM. It is often used to build predictive models, and it can purposefully divide large amounts of data into different classes, helping users discover valuable and potentially useful information. Among decision tree algorithms, the best known is ID3, presented by Quinlan in 1986. ID3 is not an incremental algorithm, and it uses information entropy as the criterion for selecting attributes. Its disadvantage is that it tends to favor attributes with many distinct values, although such attributes are not always the best choice. To solve this problem, we present a new approach based on the ID3 algorithm, the information gain of attribute-value pairs in two levels, to optimize the decision tree. We compared it with decision trees built by other algorithms on the same example, and the comparison shows that the tree built by the two-level attribute-value pair information gain algorithm is better.
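The following is a minimal Python sketch of the entropy-based selection criterion that ID3 applies at each node. It also shows the bias toward many-valued attributes described above. The four-row data set and the attribute names `id` and `windy` are hypothetical, chosen only for illustration. The sketch does not implement the two-level attribute-value pair method, whose details are not given in this abstract.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(S) = -sum p_i * log2(p_i) over the class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """ID3 criterion: Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    total = len(labels)
    # Partition the class labels by the value each row takes on `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Return the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: `id` is unique per row (useless for prediction),
# while `windy` carries some real information about the class.
rows = [
    {"id": 1, "windy": False},
    {"id": 2, "windy": False},
    {"id": 3, "windy": True},
    {"id": 4, "windy": False},
]
labels = ["yes", "yes", "no", "no"]

for a in ["windy", "id"]:
    print(f"Gain({a}) = {information_gain(rows, labels, a):.3f}")
# Gain(windy) = 0.311
# Gain(id) = 1.000
print(best_attribute(rows, labels, ["windy", "id"]))  # id
```

Because every `id` value isolates a single row, each partition is pure, and the gain reaches its maximum of H(S). ID3 therefore picks `id` even though the attribute cannot generalize to new records. This is the weakness the proposed optimization targets.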
We also ran tests comparing our optimized algorithm with the ID3 algorithm on the data set provided by FAMn, and carried out experiments on standard data sets from the UCI repository. The results show that the two-level attribute-value pair information gain algorithm indeed performs better than the ID3 algorithm.