Font Size: a A A

Research Of Constraint Satisfaction Problem And Its Application In Protein Structure Prediction

Posted on:2016-07-13Degree:DoctorType:Dissertation
Country:ChinaCandidate:H B LiFull Text:PDF
GTID:1220330467493950Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Constraint programming was derived from1965. It is an important problem in computerscience and a classical foundation of artificial intelligence. As a central problem of constraintprogramming, the constraint satisfaction problem (CSP) became a hot issue since1985. Manyfamous problems in artificial intelligence can be modeled by CSP, such as Configuration,Scheduling, Vehicle Routing, and so on. Besides these classical AI problems, constraintprogramming has widely applications in Operation Research and Bioinformatics.The central issue in CSP is solving a CSP. The backtracking search is a complete centralalgorithm to solve CSPs. However, the na ve backtracking is inefficient, so people usuallyintegrate constraint propagation into backtracking search. Constraint propagation removesinconsistent values and reduces the searching space to improve the searching efficiency.Besides constraint propagation, the order in which variables are assigned is crucial to theefficiency. The variable ordering heuristics and constraint propagation are the two mostimportant factors to the efficiency of a backtracking algorithm to solve a CSP. Although thereexist some successful constraint propagation techniques and variable ordering heuristics,designing some specific methods for some problems with distinct features may bringimprovements over efficiency.The thesis first introduces the basic definitions and concepts of constraint satisfactionproblems, together with a basic frame of a backtracking search strategy which is used to solveCSPs. The popular constraint propagation techniques and variable ordering heuristics arereviewed. How to integrate constraint propagation and variable ordering heuristics in thebasic frame in described in detail. The features of different constraint propagation andheuristics are analysed. In addition, the constraint satisfaction problem is applied to animportant problem in Bioinformatics: Protein Structure prediction. The research focuses onconstraint propagation and variable ordering heuristics, more specifically, they include:(1) The maintaining arc consistency algorithm, maintains arc consistency duringbacktracking search, is the most popular method to solve CSPs. In the past20years, most ofthe research of CSP focused on the procedure that searches supports for values, which ignoredthe frame of arc consistency algorithms. The author proved that the revisions for the arcs thatpoint to assigned variables are useless and proposed a method for avoiding such redundantrevisions in maintaining arc consistency. This method improves the basic frame ofcoarse-grained maintaining arc consistency algorithms.(2) Table constraints theoretically represent all relations of constraints. Recently, thespecific constraint propagation algorithms on table constraints attract many researchers in thisfield. There are two kinds of table constraints: positive table constraints and negative tableconstraints. They can be equivalently converted into each other. People usually consider spacecost to determine whether use positive table constraints or negative table constraints. The simple tabular reduction (STR) algorithm is an efficient generalized arc consistency algorithmfor positive table constraints, which is easy to implement. Using STR in maintaininggeneralized arc consistency to solve CSPs is more efficient than other algorithms. However,the existing STR algorithms work only on positive table constraints. Some practical problemscontain negative table constraints, so a new STR algorithm which works on negative tableconstraints, named STR-N, is proposed. The new STR algorithm is also efficient and easy toimplement. Besides the basic STR-N algorithm, two improved version are proposed. TheSTR-N2avoids some redundant operations in validity checks. The STR-NIC saves someuseless iterations of the tables.(3) Arc consistency (AC) is the most popular constraint propagation technique to be usedto solve CSPs. Some recent works show that max-restricted path consistency (maxRPC) alsocan be used in search. Although maxRPC has a more powerful ability to filter values than AC,it is more time consuming, so it is less efficient than AC when being used in search. Themajor difference between these two is whether finding PC-witness. The author proposed aprobabilistic max-restricted path consistency (PmaxRPC) with a method to calculate theprobability that a PC-witness exists. According to this probability, PmaxRPC determineswhether finding a PC-witness. PmaxRPC is a tradeoff strategy of AC and maxRPC. Theexperiments show that PmaxRPC is more efficient than both AC and maxRPC while solvingsome CSP instances with tight constraints.(4) The degree-based heuristics is a classical variable ordering heuristic. Its improvedversions, dynamic degree and weighted degree was the most efficient heuristics. Combiningthe weighted-degree heuristic and min domain size heuristic, the dom/wdeg variable orderingheuristic is one of the most popular heuristics, which is efficient on most binary CSP instances.Besides, it is easy to implement. After reviewing the most popular heuristics, a new variableordering heuristic that integrates constraint tightness into dom/wdeg is proposed. Comparedwith dom/wdeg, the new heuristic further actualizes the “fail-first principle”. Two practicalmethods to dynamically calculate constraint tightness on binary and non-binary constraintsare also proposed. The new heuristics improves the dom/wdeg heuristic.(5) In protein structure prediction, the template-based methods usually generate moreaccurate results. Every template-based method faces the problem that how to select templates.Most alignments of templates do not cover the whole sequence, so people need to find a set ofalignments that the combination covers the whole sequence. The author proposes a method tomodel the combination problem with CSP. This method can find all possible combinations ofalignments that cover the whole sequence and ensure that each combination contains noredundant cover. According to the feature of these CSP instances, the appropriate heuristicand generalized arc consistency algorithm are selected to solve these instances. The methodwhich is used to compute explanations of conflicts in interactive constraint satisfactionproblems is applied to solve the over-constraint problem in this specific CSP. This part couldbe a foundation of some future research, such as scoring each alignment and model the newproblem by a constraint optimization problem. Besides, based on this CSP model, users candesign their own constraints or solving strategy according to their specific demands.
Keywords/Search Tags:constraint satisfaction problem, solving CSP, constraint propagation, variable orderingheuristic, protein structure prediction, template combination
PDF Full Text Request
Related items