Font Size: a A A

Gravitation Field Algorithm And Its Application In Bioinformatics

Posted on:2014-06-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:M ZhengFull Text:PDF
GTID:1260330425465109Subject:Bioinformatics
Abstract/Summary:PDF Full Text Request
The research on optimal algorithms is the important topic nowadays. Searchingoptima is one of the most challenging tasks from available experimental data or givenfunctions. Optimal algorithms can be defined as below. Given a certain problem,which is a function with N variables generally, the task is to find a best solution. If Nis greater than one, the number of solutions for given function should be also greaterone, even infinite. The computer methods which are used to find the greatest orsmallest value of the criteria for given function is the optimal algorithms. The criteriaare defined for the best value for given function. The optimal algorithms will beapplied for many fields, and can be used for resolving certain problems, such as therationalization of global logistics route choice, the optimization of routinearrangement, the minimization of production cost, and so on.The history of the research on optimal algorithms is long. Although the task is tofind the optimal solution, the algorithm will be not convergent in global solutionspace for large scale data, will approximate the global optimal value. There are manycategories in the optimal algorithms, such as analytical method, direct method,mathematics method and various heuristic search algorithms. The best analysismethod for large scale data is heuristic search algorithms. So the direction of researchis the focus issue. Heuristic search algorithms include simulated annealing algorithm,genetic algorithm, particle swarm optimization algorithm, and so on. But all optimalproblems can’t be resolved by these algorithms, especially multi-extrema problem.The precision is low and the running time is long. So a new heuristic algorithm mustbe proposed to recover these drawbacks. A novel algorithm named gravitation filedalgorithm, which is derived from Solar Nebular Disk Model in astronomy, is proposedin this paper. And this new algorithm is applied in many fields in Bioinformatics asbelow: ⑴Solar Nebular Disk Model describes the formation of planet: dark nebulas inuniverse assemble together in many ways to stars, and the dusts will be ejected fromthe star, and assemble together in the gravitation field to the planets. The model isabstracted by mathematics and merged by innovation contents into gravitation fieldalgorithm. Gravitation field algorithm includes four steps which are initialization ofdusts, division of dusts, movement of dusts, and abstraction of dusts. In the stage ofdusts initialization, the first thing to be considered is the formation and the dimensionof the dusts. The distance between two points for example, the vector which containstwo point identifications is used as the dust in gravitation field algorithm. Thedeterminant value of given matrix for another example, the matrix is used as the dustin gravitation field algorithm. Then every dimension component of given dust isevaluated randomly in solution space. The division strategy is one core problem ingravitation field algorithm. And the many methods are used. When the dimensionalityis one, average method and random method can be used as the strategies. Averagemethod is that the value range of each group is equal and the data are continuous inone group. Random method is that the value range of each group is not equal and thedata are continuous in one group. When the dimensionality is two, greatest commondivisor method and random method can be used as the strategies. Greatest commondivisor method is that the area value will be divided as two greatest common divisors.Every child area will be seen as one group. Random method considers only onedimensionality as the criteria for division, which is the same as one dimension randommethod, another dimensionality will be not considered as the criteria. When thedimensionality is greater than2, random method and extend random method are usedas strategies. Random method considers only one dimensionality as the criteria fordivision, which is the same as one dimension random method, other dimensionalitieswill be not considered as the criteria. Extend random method will divide every dustinto any group. And these data are not continuous. The number of dusts in each groupwill be not equal. Extend random method can be also used in one or two dimensiondivision strategies. The movement of dusts is another core in gravitation fieldalgorithm. Mass function value of each dust will be calculated and be sorted to decide which dust is the centre dust. The surrounding dusts will move to the centre dust, andthe pace is defined as the two points distance multiply the1/10of the Fibonaccinumbers. In the movement process, every surrounding dust will be in the rotationfactor influence. Rotation is the force that is from the centre dust to the surroundingdusts. Rotation factor is the probability of the rotation happens. Rotation factor isbigger when the distance is smaller. Abstraction of dusts is that when the two dustsdistance is small enough, the surrounding dusts will be deleted. If the terminationcondition is reached, the centre dust and the corresponding mass value will be theresult of gravitation field algorithm. Or all the centre dusts will be as the surroundingdusts and be divided again. The novel algorithm is tested by global extreme andmulti-extremes. And the results are compared with other algorithms. The efficiency ofgravitation field algorithm is proofed.⑵Gravitation field algorithm had been applied to the gene expressionclustering algorithms. The data formation used in clustering algorithms is discrete.Gravitation field algorithm should be modified. Firstly, the distance between twogenes is used as the mass function. Secondly, in the stage of dusts initialization, thevector which contains two point identifications is used as the dust. Finally, in thestage of dusts movement, the movement direction is defined by the relationshipbetween two corresponding elements of centre dust and surrounding dust. The step isonly one number which is different from the continuous data. The genes pair ismarked as used. Because used pair can’t produce unexpected value like continuousdata. So the used pair can’t be calculated again. Clustering algorithms are testedthrough hierarchical clustering and un-hierarchical clustering. And the results arecompared with other algorithms results. The efficiency of gravitation field algorithmis proofed.⑶Gravitation field algorithm had been applied to the construction of generegulatory network. The differential equation model is used as math model. And thevalue range will be defined with singular value decomposition. The gene expressionmatrix can be decomposed to3matrixes in the generalized inverse matrix definition.And the special solution will be calculated through this method. The general solution will be calculated after that. In gravitation field algorithm, the least square formula isused as mass function. In the stage of dusts initialization, the matrix is used as dust toevaluate randomly. And the value will be tested by the general solution. If the value isnot reasonable, the dust will be assigned randomly again. In the stage of movement, N×T corresponding elements in centre dust and surrounding dust will be compared. Ifthe two values are not equal, the element of surrounding dust will move to the elementof centre dust. The new value will be tested again, if the value is not reasonable, thedust will move again. Or the movement operator will run again. The constructionalgorithm will be tested through the simulated data and real data. And the efficiencyof gravitation field algorithm in construction of gene regulatory network is proofed.⑷Gravitation field algorithm is used for simulating gene expression data.Simulated gene regulatory network was constructed through the reconnection method.The candidate father node will be calculated and the node will be selected byprobability r. If the node is not selected, the ancestor of candidate father node will beselected by probability1-r. That is, this method can strengthen the centre control node.The gene expression data is simulated by gravitation field algorithm. The solutionspace is defined through singular value decomposition. Gene expression matrix isused as dust, and assigned randomly. In the stage of dusts movement, the element ofsurrounding dust will move to the element of centre dust. Log-log figure is used toverify the accuracy of reconnection method. Three construction software package areused to verify the efficiency of the gravitation field algorithm. The efficiency ofconnection method and gravitation field algorithm is proofed through theexperiments.In summary, gravtitation field algorithm proposed in this paper is a novel fastrunning-time, high efficiency heuristic search algorithm. this algorithm can be appliedto many fields in Bioinformatcs, including gene expression clustering, construction ofgene regulatory network, gene expression simulation, and so on. The efficiency ishigh. And this algorithm can be applied to other field, and the future of this algorithmwill be better.
Keywords/Search Tags:Gravatition field algorithm, Solar Nebular Disk Model, optimization operation, global convergence, gene expression clustering, singular value decomposition, generegulatory netwok, simulation of gene expression
PDF Full Text Request
Related items