Research and Implementation of Production Systems for Massive Rules
|School||Zhejiang University of Technology|
|Course||Computer Software and Theory|
|Keywords||distributed rule matching MapReduce Rete multiple rule firing|
Business rules developed from production rules and rule engines emerged in industrialization of production systems, have been used by more and more enterprises to describe and maintain business knowledge, construct enterprise information systems. Rule engine technologies can separate business rules from business logics in applications, and decouple the frequent changes of business rules from information systems, which greatly enhance the intelligence, robustness and adaptability of application systems, including expert systems and EISs. With the expansion of the extent and complexity of domain problems, the scale of knowledge in knowledge base of expert systems and rule base of EISs becomes more and more large; its structure has also become more complex.In order to meet the processing challenge of massive rules corresponding to the massive knowledge, it needs production systems can provide faster and more accurate matching and processing abilities. The ideas of distributed computing and parallel computing are used to construct more efficient production system architecture, combined with a more efficient algorithm to speed up pattern matching, which is an inevitable trend to efficiently process massive rules. This paper analyzes all kinds of existing promotion research for production systems, proposes a distributed and parallel matching approach for massive rules based on MapReduce programming model and improved Rete algorithm, realizes multiple rule firing through interference analysis method, and develops production systems prototype for massive rules.Main contributions in this thesis include the following aspects:Firstly, a double hash filter approach for alpha network, combined with beta node indexing, is proposed to improve Rete algorithm, with the purpose of speeding up pattern matching process for each computing unit in cluster environment. Introducing fact type node to Rete network, a hash map about’fact type-fact type node’is built in root node, and hash maps about’attribute constraints-alpha node’are also constructed in fact type node. This kind of double hash mechanism is used to speed up the filtration of facts in alpha network. Meanwhile, hash tables with the indexes calculated through fact objects, are built in memories of beta nodes, to avoid unnecessary iteration in the join operations of beta node.Secondly, a distributed rule matching method based on MapReduce programming model is proposed, to meet the matching requirements of massive rules. On the basis of analyzing the structure of production rules and its decomposition principles in cluster environment, the decomposition approach based on pattern elements is adopted. The usage of CPU and memory as well as the network status are considered as factors to describe the load of worker unit, the complexity of map task is measured through the complexity of fact type and its attributes, a improved Rete is used to construct pattern matching network in each map worker. Changes in working memory are expressed as tokens, and token sets are effectively dispatched based on the information of rule decomposition. Map process is used to uniformly describe the matching process of token sets and sub rule sets, the rule matching results are merged and collected in the reduce process.Thirdly, a multiple rule firings method based on interference analysis is proposed, to solve performance bottleneck of select phase causing by traditional conflict resolution strategies. Against the working memory consistency problem aroused by this new method, the dependency relations between rules are analyzed detailed, and then the formal definition of interference is presented. A method of the access request control mechanism is used to exclude interference and select compatible rule instantiations. Based on guaranteeing the serialization of multiple rules, more rules are selected and acted, achieving more parallelism for select phase in the production cycle.Finally, on the basis of studying the above key technologies, the overall architecture of the production system for massive rules is constructed, and the corresponding prototype system is developed. A performance test is made for this system. The experimental results show that parallel rule matching and multiple rule firings methods can speed up the rule processing ability of production system, it can meet the processing requirements of massive rules with a certain extent.