Parallel Reducts and Decision in Various Levels of Granularity
|School||Zhejiang Normal University|
|Course||Computer Software and Theory|
|Keywords||F-Rough Sets; F-Attribute Significance; Parallel Reducts; F-Simplified Attribute Significance; Rough Sets|
With the development of information technology, traditional database systems can no longer cope with rapidly growing volumes of data, a phenomenon known as big data. With the advent of cloud computing, big data has attracted widespread attention, and mining and using it effectively and efficiently has become a new challenge for people in many fields. Handling big data is an inevitable trend in the development of data mining.
Granular computing, a new theory in the field of artificial intelligence, is a powerful tool for mining big data and solving complex problems. Its ideas have been widely used in machine learning, data mining, and related areas. The main models of granular computing are rough sets, fuzzy sets, quotient spaces, and the cloud model.
As one of the main models of granular computing, rough sets can deal effectively and efficiently with imprecise, inconsistent, and incomplete information and knowledge. However, most rough set methods are designed for static data. How to analyze and handle incremental, massive, dynamic, and multi-source data is both a hot topic and a difficult problem in data mining. Before the concept of parallel reducts was proposed, the main rough set methods for incremental, massive, and dynamic data were dynamic reducts and rule induction from multiple decision tables. The concept of parallel reducts, proposed by Dayong Deng, brings fresh ideas to data mining and opens a new research area. It extends rough set theory from a single information system (or decision system) to multiple ones, and it matches the way people habitually solve problems. Parallel reduct theory fully embodies the idea of granular computing. In this paper, we investigate parallel reducts at various levels of granularity.
The main contributions of this paper are as follows:
(1) A new rough set model, called F-rough sets, is proposed, and the concept of parallel reducts is redefined under this model. F-rough sets are designed for families of information subsystems and decision subsystems. It is the first rough set model that can deal with incremental, massive, dynamic, and multi-source data.
(2) Two measures in the algebra view, called F-attribute significance and F-simplified attribute significance, are defined. Both are defined for a family of decision subsystems. F-attribute significance unifies attribute significance in a single decision subsystem with attribute significance in multiple decision subsystems: when the family contains only one decision subsystem, the F-attribute significance is simply the attribute significance in that subsystem. F-simplified attribute significance simplifies the computation of F-attribute significance. It does not require measuring how much the positive region changes in each decision subsystem, only determining whether it changes.
(3) Parallel reduct algorithms based on these two measures are proposed. Compared with the PRMAS algorithm (parallel reducts based on the matrix of attribute significance), the two algorithms have advantages in time complexity, space complexity, reduct length, time efficiency, classification accuracy, and dynamic time efficiency. Moreover, the parallel reduct algorithm based on F-simplified attribute significance greatly improves time efficiency.
(4) F-attribute significance is also investigated in the information view. Its definition likewise unifies attribute significance in a single decision subsystem with attribute significance in multiple decision subsystems. The algorithm frameworks from (3) in the algebra view can also be applied in the information view.
(5) Three strategies for classification and decision making with a family of decision subsystems are proposed.
This is the first time that classification and decision making have been extended to a family of decision subsystems in rough set theory. The three strategies draw on ideas from classical statistics and machine learning, and they may offer heuristic insight to other researchers in the field of rough set theory.
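The two attribute significance measures summarized above can be illustrated with a minimal Python sketch. The abstract does not give the formal definitions, so everything here is an assumption made for illustration: the function names are hypothetical, the family-level measure is taken to be the sum of the classical positive-region significance over the subsystems, and the simplified measure is taken to be the number of subsystems whose positive region changes. The only properties taken from the text are that the family measure reduces to the ordinary significance when the family has one member, and that the simplified measure only needs to know whether a positive region changes.

```python
from collections import defaultdict


def positive_region(rows, attrs, decision):
    """Indices of rows whose equivalence class under `attrs` has a single decision value."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in blocks.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos


def significance(rows, base, attr, decision):
    """Classical significance: growth of the positive region when `attr` joins `base`."""
    before = positive_region(rows, base, decision)
    after = positive_region(rows, base + [attr], decision)
    return (len(after) - len(before)) / len(rows)


def family_significance(family, base, attr, decision):
    """ASSUMED form of F-attribute significance: sum over the decision subsystems."""
    return sum(significance(rows, base, attr, decision) for rows in family)


def family_simplified_significance(family, base, attr, decision):
    """ASSUMED form of the simplified measure: count subsystems whose positive region changes."""
    return sum(
        positive_region(rows, base + [attr], decision)
        != positive_region(rows, base, decision)
        for rows in family
    )


# Two toy decision subsystems with condition attributes a, b and decision d.
t1 = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 1},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 1},
]
t2 = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 1, "b": 1, "d": 1},
]

print(family_significance([t1, t2], ["a"], "b", "d"))             # 0.5
print(family_simplified_significance([t1, t2], ["a"], "b", "d"))  # 1
```

In this toy family, adding `b` to `{a}` enlarges the positive region only in `t1`, so the simplified measure is 1. For clarity the sketch compares full positive regions in both functions; it does not attempt to reproduce the efficiency gain reported for the simplified measure.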