
Nettree For Mining Distinguishing Sequence Patterns Based On Density And Gap Constraint

Posted on: 2017-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q S Wei
GTID: 2428330596957420
Subject: Control Science and Engineering
Abstract/Summary:
With the development of computer science and the progress of human civilization, networks have grown rapidly, and large volumes of data are transmitted over them. Finding the useful information hidden in such data has become a key research problem, and data mining, which extracts useful information from millions of records, emerged to address it. Sequence pattern mining, an important branch of data mining, discovers all patterns that satisfy a given frequency constraint, and it is widely used in bio-medicine, information retrieval, and other fields. Distinguishing sequence patterns have been widely applied in commodity recommendation, user behavior analysis, power supply, and other fields. Mining contrast sequence patterns with density and gap constraints aims to find patterns that are frequent in the positive class but infrequent in the negative class. Compared with the traditional contrast sequence pattern mining problem, adding a density constraint makes it easier to find the distribution of special factors in biological sequences and is more conducive to discovering new mutation factors. Therefore, this thesis mainly studies the mining of contrast patterns under density and gap constraints. The research contents and related work are as follows:

1. We use the nettree structure to construct the MDSP algorithm. With only one scan of the sequence, the algorithm calculates the number of occurrences of every super-pattern of the current pattern. It then generates the candidate pattern tree in a breadth-first manner, thereby mining all contrast patterns.
2. We theoretically analyze the space complexity and time complexity of the MDSP algorithm.
3. We carry out extensive comparative experiments between MDSP and the similar gd-DSPMiner algorithm on real DNA and protein sequence data sets, and analyze the results from two aspects: the number of contrast patterns mined and the running speed.

The experimental results show that MDSP mines more distinguishing sequence patterns than the existing algorithm, and that it runs fast on protein sequence databases, which have a larger character set.
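The Python sketch below illustrates, under simplifying assumptions, the two ideas named in point 1: counting all one-character super-patterns of the current pattern from a single pass over the last nettree level, and growing candidates breadth-first. It is not the thesis's MDSP implementation. All function names and parameters (`leaf_counts`, `extension_counts`, `class_counts`, `mine_contrast`, `min_pos`, `max_neg`, `max_len`) are hypothetical. A gap `(lo, hi)` is assumed to mean lo..hi wildcard positions between consecutive pattern characters; support is assumed to be the raw number of occurrences (no one-off condition) summed over a class; and the density constraint is left out because the abstract does not define it. The only pruning is dropping candidates with no positive occurrences (safe, since a super-pattern cannot occur where its prefix does not) plus an assumed length bound.

```python
from collections import defaultdict

# Hypothetical sketch: the density constraint is NOT modeled here.
# A gap (lo, hi) means lo..hi wildcard positions between consecutive pattern characters.


def leaf_counts(seq, pattern, gap):
    """Compute the last level of a nettree-like structure for `pattern` in `seq`.

    Returns {position: n}, where n is the number of occurrences of `pattern`
    (root-to-leaf paths) that end at that position of `seq`.
    """
    lo, hi = gap
    # Level 1: every position matching the first pattern character is a root.
    level = {i: 1 for i, c in enumerate(seq) if c == pattern[0]}
    for ch in pattern[1:]:
        nxt = defaultdict(int)
        for i, n in level.items():
            # Children must lie lo..hi wildcards after the parent position.
            for k in range(i + lo + 1, min(i + hi + 2, len(seq))):
                if seq[k] == ch:
                    nxt[k] += n  # accumulate path counts from every parent
        level = dict(nxt)
    return level


def extension_counts(seq, pattern, gap):
    """Count occurrences of every super-pattern pattern + c
    in a single pass over the leaves of the current pattern."""
    lo, hi = gap
    counts = defaultdict(int)
    if not pattern:  # the empty pattern extends to single characters
        for ch in seq:
            counts[ch] += 1
        return counts
    for i, n in leaf_counts(seq, pattern, gap).items():
        for k in range(i + lo + 1, min(i + hi + 2, len(seq))):
            counts[pattern + seq[k]] += n
    return counts


def class_counts(seqs, pattern, gap):
    """Sum super-pattern occurrence counts over all sequences of one class."""
    total = defaultdict(int)
    for s in seqs:
        for q, n in extension_counts(s, pattern, gap).items():
            total[q] += n
    return total


def mine_contrast(pos, neg, gap, min_pos, max_neg, max_len):
    """Breadth-first candidate generation; keep patterns with >= min_pos
    occurrences in `pos` and <= max_neg occurrences in `neg`."""
    result, frontier = [], [""]
    while frontier:
        next_frontier = []
        for p in frontier:
            if len(p) == max_len:
                continue
            pos_ext = class_counts(pos, p, gap)
            neg_ext = class_counts(neg, p, gap)
            # Only candidates that occur in the positive class are expanded.
            for q in sorted(pos_ext):
                if pos_ext[q] >= min_pos and neg_ext.get(q, 0) <= max_neg:
                    result.append(q)
                next_frontier.append(q)
        frontier = next_frontier
    return result
```

For example, with positive sequences `["ABAB", "AB"]`, negative sequence `["BBBA"]`, gap `(0, 1)`, `min_pos=2`, `max_neg=0` and `max_len=2`, `mine_contrast` returns `["AB"]`: "AB" occurs three times in the positive class and never in the negative class. Because raw occurrence counts under a gap constraint are not anti-monotone (a super-pattern can occur more often than its prefix), this sketch does not prune on the `min_pos` threshold itself.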
Keywords/Search Tags: sequence pattern mining, distinguishing sequence, frequent pattern, density constraint, nettree