Font Size: a A A

Research And Implementation Of Factor Reserve Inference Algorithm For Bayesian Networks

Posted on:2008-02-05Degree:MasterType:Thesis
Country:ChinaCandidate:Y X HuangFull Text:PDF
GTID:2178360212996183Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
VE and CTP are the most widely used exact inference algorithms for Bayesian Networks (BNs), and they both have their unique advantages: VE allows one to remove nodes irrelevant to a given query while CTP allows computation sharing among multiple queries. However, their disadvantages are apparent too: VE has to eliminate all the hidden nodes for each single query, this will cause a large amount of redundant computation; when it comes to CTP, the complexity of initialization is very high. In our experiment, we prove that they do hold these properties written above. Still, we find that the effectiveness of VE is strongly influenced by the location of query and evidence nodes as well as the amount of evidence nodes. That is, in a pruned network, provided other factors which may influence the effectiveness of VE are fixed, the closer the query nodes are to the root nodes, the closer the evidence nodes are to the leaf nodes, and the larger amount of evidence nodes there are in the network, the more effective VE is. As to CTP, when there are fewer query nodes in the network, the effectiveness of upward and downward propagation algorithm performed in CTP is relatively low. What is more, when the multiple query nodes do not belong to any cluster in the junction tree, it is impossible for CTP to answer the joint posterior probability directly.In fact, there are such BNs in which the query and evidence nodes are restricted in a subset of the set of the nodes. That is, the set of the nodes in such a BN is divided into the set of query nodes Q, the set of evidence nodes E, and the set of hidden nodes H. Therefore, it is very natural for us to only care about the posterior probability of several nodes in Q given the observation of several nodes in E in such BNs. In 2004, Liao etc proposed an algorithm called FTI which constructs a factor tree to conserve the intermediate computation in order to simplify the process of upcoming queries. More specifically, each f node in a factor tree reserves a factor, and in each upcoming query FTI uses message passing algorithm (upward message passing algorithm only) to propagate evidences, at the same time eliminates each non-query variable in the factor and computes the product of all the factors where the posterior probability lies. Liao proved that the performance of FTI dealing with multiple queries in such BNs is the best among the three. However, the picture is not always rosy. In our experiment, we find that the correctness of FTI can not be guaranteed in some circumstances. More specifically, for some BNs, the result of factor tree construction algorithm performed in FTI is not a factor tree but a factor graph instead which means there is at least one loop. This is absolutely a disaster for FTI because message passing algorithm can only be applied on a tree rather than a graph. When it comes to this, FTI can not achieve the right posterior probability. We modify FTI for some extent, and prove that the algorithm modified does perform better than VE and CTP.According to the analysis written above, we propose FR algorithm. Similar to FTI, FR is proposed to deal with multiple queries in a BN in which the query and evidence nodes are restricted in a subset of the set of the nodes. FR includes the advantages of VE, CTP and FTI, that is, FR allows one to remove the irrelevant nodes as well as share computations among multiple queries. At the same time, FR excludes the disadvantages of the three, that is, there is no need for FR to eliminate hidden nodes in every single query and to construct the complicated structures such as junction tree as well as factor graph. The process of FR is similar to VE. However, for each single query, VE incorporates evidences in the first place, then eliminates all the non-query variable to compute the posterior probability. While FR reserves a factor through elimination of all the hidden variables in the first query, after that, FR only needs to incorporate evidences on the factor and eliminate all the non-query variables to compute the posterior probability in the upcoming queries. Therefore, we can see that, both FTI and FR reserve intermediate computation to simplify the process of upcoming queries, however, what FTI reserves are multiple factors, which means FTI has to compute the product of all the factors reserved before. When it comes to FR, it is another picture. What FR reserves includes the product of all the factors reserved in FTI, therefore, there is no use to compute the product in the upcoming queries anymore! In fact, FTI is an algorithm which lies between VE and FR, for VE does not reserve any intermediate computation, FTI reserves partial computation, while FR reserves them all. It is natural for VE to hold the lowest complexity (time complexity) for the first query: firstly, VE incorporates evidences before the elimination of non-query nodes therefore the size of factor in the elimination process becomes smaller; secondly, VE does not reserve any intermediate computation. CTP holds the highest complexity, because it has to construct a junction tree. However, it is very hard to compare FTI with FR to determine which holds the lower complexity for the first query. We divide the first query into two steps: the initialization step and the query step. Therefore, the time spending for the first query equals to the summation of the time spending on the initialization and the query step. In the initialization step, if we only take into account the complexity of intermediate computation, the complexity of FTI is obviously lower than that of FR. That is because what FR reserves include the product of all the factors reserved in FTI. However, when the hidden nodes are quite centralized, the initialization of FTI converts to that of FR. In some circumstances, all the factors of hidden nodes conglomerate into one single factor (as shown in our experiment), this will reduce the difference between the two. In addition, FTI has to construct a factor graph which makes FTI more complex. In the query step, FR absolutely holds the lowest complexity: firstly, what FR reserves are much more integrated than what FTI reserves; secondly, FTI has to apply message passing algorithm on the factor graph which makes FTI more complex. According to the analysis written above, we can draw this conclusion: generally speaking, the complexity of FTI is lower than that of FR in the initialization step, while higher than that of FR in the query step. What's more, the lower complexity FTI holds in the initialization step compared with FR, the higher complexity FTI holds in the query step. In order to compare the complexity between the two, the location and amount of the hidden nodes as well as the location of query nodes have to be taken into account. In the upcoming queries, FR holds the lowest complexity, however it is hard to compare the other three. CTP uses upward and downward message passing algorithm, so it can answer any query (maybe not joint posterior probability) in a linear time after the two propagations. Therefore, the more query nodes there are in the network, the more effective CTP is. FTI reserves intermediate computation which lowers the complexity, however, it has to apply message passing algorithm on a factor graph, which at the same time risen the complexity. In order to compare the complexity of the three, we have to consider the location of the query nodes (which strongly influences VE and partially influences FTI) and the amount of the query nodes (which strongly influences CTP), the location and amount of evidence nodes (which strongly influence VE) as well as the location and amount of hidden nodes (which strongly influence FTI and FR) : when the amount of hidden nodes is relatively large, the complexity of FTI is lower than that of VE and CTP, but higher than that of FR.We use Alarm Network and Asia Network to analyze the complexity of the four. In order to minimize the effect of error, we repeat each experiment for 10 times, then compute the average number. In Alarm Network, we firstly analyze the effectiveness of single query and multiple queries using CTP. Result shows that the effectiveness of multiple queries is higher than that of a single query. Secondly, we analyze the effectiveness of the four. Result shows that the complexity of FR is the lowest of the four. In Asia Network, we firstly analyze the effectiveness of the four when the location of hidden nodes differs. Result shows that it has no influence on VE and CTP at all, but it does influence FTI and FR. For FTI, the curve changes from a line to a broken line, at the same time its effectiveness drops. For FR, provided other factors are fixed, the nearer the hidden nodes are to the leaf nodes, the more effective FR is. Secondly, we analyze the effectiveness of the four when the amount of hidden nodes differs. Result shows that it has no influence on VE and CTP too, but it does influence FTI and FR, and what's more FR is the mostly affected: provided other factors are fixed, the larger amount of hidden nodes there are in the network, the more effective FTI and FR are. Finally, we analyze the effectiveness of the four when the location and amount of the query and evidence nodes differs. Result shows that FTI and FR are affected in a small way, while VE and CTP are strongly affected. The more closer the query nodes are to the root nodes, the more closer the evidence nodes are to the leaf nodes, and the larger amount of evidence nodes there are in the network, the more effective VE is. When it comes to CTP, the larger amount of query nodes there are in the network, the more effective CTP is. To sum up, we claim that when the query and evidence nodes are restricted in a subset of the set of the nodes, compared with VE and CTP as well as FTI, FR is the most effective exact inference algorithm to deal with multiple queries in such BNs. With the amount of hidden nodes becomes larger as well as the location of hidden nodes moves lower, the effectiveness of FR will be more and more prominent.
Keywords/Search Tags:VE, CTP, FTI, FR, factor
PDF Full Text Request
Related items