Research on Learning the Dimensionality of Hidden Variables in Bayesian Networks and Its Application
|School||Hefei University of Technology|
|Course||Computer Software and Theory|
|Keywords||Bayesian Networks; Latent Variables; Dimensionality; Markov Blanket; Simulated Annealing|
In the past decades, a great deal of research has focused on learning Bayesian Networks from observational data. The real world, however, is complex and changeable: much knowledge is hidden and not easily observed, and research on Bayesian Networks faces the same problem. Hidden variables are not observed, but they can simplify the structure of a network, summarize the information among variables, and optimize the model. Learning Bayesian Network models with hidden variables has therefore become an important research field. The dimensionality of a latent variable has a significant effect on the representation quality and the complexity of the model, and methods for learning latent variable dimensionality can be applied to the reconstruction of gene networks. Discovering the dimensionality of hidden variables is thus not only a challenging research problem but also one of scientific significance and high application value.

In this dissertation, we propose a novel method to learn the dimensionality of hidden variables. Learning the dimensionality of hidden variables in Bayesian Networks involves two aspects: the first is to learn the dimensionality of a single hidden variable; the second is to determine the dimensionalities of multiple hidden variables. For these two aspects, the work carried out in this dissertation is as follows.

Firstly, we use the Markov blanket of the latent variable to construct a local network, discarding the variables in the model that are conditionally independent of the latent variable. We then score the local network instead of the original network, so the running time of scoring is much lower. We use a state-clustering method to score the network for each candidate dimensionality of the latent variable, and introduce a simulated annealing strategy to avoid local optima and improve the accuracy of the method.
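The single-variable procedure above can be sketched in Python. This is a minimal, illustrative sketch, not the dissertation's actual implementation: it assumes binary observed variables in the Markov blanket, a naive local model in which each blanket variable depends only on the latent variable, and a BIC-style score; the function names (`local_score`, `ssa`, `best_cardinality`) and all parameter values are hypothetical.

```python
import math
import random

def local_score(assign, data, k):
    """BIC-style score of a hypothetical local model: each observed
    blanket variable depends only on the latent variable H."""
    n, m = len(data), len(data[0])
    ll = 0.0
    # log-likelihood of the latent state distribution P(H)
    counts_h = [assign.count(s) for s in range(k)]
    for c in counts_h:
        if c:
            ll += c * math.log(c / n)
    # log-likelihood of P(X_j | H), add-one smoothing, binary X_j
    for j in range(m):
        for s in range(k):
            rows = [data[i][j] for i in range(n) if assign[i] == s]
            if not rows:
                continue
            ones = sum(rows)
            for cnt in (ones, len(rows) - ones):
                p = (cnt + 1) / (len(rows) + 2)
                ll += cnt * math.log(p)
    params = (k - 1) + k * m            # free parameters of the local model
    return ll - 0.5 * params * math.log(n)

def ssa(data, k, iters=1500, t0=1.0, cool=0.998, seed=0):
    """Simulated annealing over state assignments for a fixed
    candidate dimensionality k; returns (best score, best assignment)."""
    rng = random.Random(seed)
    assign = [rng.randrange(k) for _ in data]
    best = cur = local_score(assign, data, k)
    best_assign = assign[:]
    t = t0
    for _ in range(iters):
        i, s = rng.randrange(len(data)), rng.randrange(k)
        old = assign[i]
        if s == old:
            continue
        assign[i] = s                   # propose moving one instance
        new = local_score(assign, data, k)
        if new >= cur or rng.random() < math.exp((new - cur) / t):
            cur = new                   # accept (always if better, else by temperature)
            if cur > best:
                best, best_assign = cur, assign[:]
        else:
            assign[i] = old             # reject, undo the move
        t *= cool                       # cool the temperature
    return best, best_assign

def best_cardinality(data, k_max=4, restarts=2):
    """Score each candidate dimensionality and pick the best one."""
    scores = {}
    for k in range(1, k_max + 1):
        scores[k] = max(ssa(data, k, seed=r)[0] for r in range(restarts))
    return max(scores, key=scores.get), scores
```

On well-separated data the annealing phase lets the search escape poor initial assignments, and the BIC penalty discourages inflating the latent variable's cardinality beyond what the data support.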
Based on these two stages, we choose the dimensionality of the latent variable that gives the network the best score.

Secondly, Bayesian Networks may contain several hidden variables with complex relationships, especially when the hidden variables are not independent of each other. We propose the MSSA algorithm to learn the dimensionalities of multiple hidden variables. We push all hidden variables onto a queue. At each iteration, we fix the number of states and the state assignments to instances for all hidden variables except the one at the head of the queue, and apply the SSA algorithm with respect to that variable. We then push onto the queue the hidden variables that are not independent of it. At the next iteration, we select another variable and repeat the procedure, continuing until no hidden variable changes its dimensionality. These new methods have excellent learning performance and can handle complex networks well. Extensive experiments validate the effectiveness of our methods against other algorithms.

Finally, we combine the MSSA algorithm with the SEM algorithm to learn gene networks that contain latent variables from real gene data, and we demonstrate that our method can effectively learn such networks.
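The queue-based procedure for multiple hidden variables can be sketched as follows. This is a schematic of the MSSA iteration order only, assuming a `learn_dim` callable that stands in for the single-variable SSA step and a `dependents` map listing, for each hidden variable, the hidden variables not independent of it; both names are hypothetical, and termination assumes `learn_dim` eventually stabilizes.

```python
from collections import deque

def mssa(hidden_vars, dependents, learn_dim):
    """Queue-based coordinate update over multiple hidden variables.

    hidden_vars: iterable of hidden-variable names
    dependents:  dict mapping a hidden variable to the hidden variables
                 that are not independent of it
    learn_dim:   callable (var, current_dims) -> best dimensionality for
                 `var` with all other dimensionalities held fixed
                 (stands in for the single-variable SSA step)
    """
    dims = {h: 2 for h in hidden_vars}   # start every latent variable binary
    queue = deque(hidden_vars)
    in_queue = set(hidden_vars)
    while queue:                          # stop when no variable changes
        h = queue.popleft()
        in_queue.discard(h)
        new_dim = learn_dim(h, dims)      # all other variables held fixed
        if new_dim != dims[h]:
            dims[h] = new_dim
            # re-examine hidden variables that are not independent of h
            for g in dependents.get(h, ()):
                if g not in in_queue:
                    queue.append(g)
                    in_queue.add(g)
    return dims
```

Re-enqueuing only the variables that are not independent of the one just updated keeps the iteration local, and the loop exits exactly when a full pass leaves every dimensionality unchanged, matching the stopping condition described above.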