Application And Research Of Top Scoring Pairs Method For Time Series Gene Expression Data Classification

Posted on: 2018-10-09 Degree: Master Type: Thesis
Country: China Candidate: K M Wu Full Text: PDF
GTID: 2310330515969829 Subject: Engineering
Abstract/Summary:
Extracting valuable information from large volumes of bioinformatics data is attracting growing research attention. Time series gene expression data are gene expression measurements collected at different time points during biological growth. Because such data carry a large amount of biological information tied to specific time periods, time series data mining has become an active research topic, and classifying time series gene data is a significant task. Gene expression data typically have a small sample size and high dimensionality. When traditional machine learning methods are applied to them, they suffer from the curse of dimensionality, and the complex models they derive from the data are hard to interpret in a bioinformatics setting. The Top Scoring Pairs (TSP) method uses only a few features to predict the class of a sample and has achieved good results. To make full use of the advantages of TSP, this thesis presents a Dynamic Top Scoring Pairs (DTSP) algorithm. Unlike the simple value comparison used for static data, the key issue for time series data is how to incorporate temporal information into the TSP framework.

The main work and innovations of this thesis are as follows. The thesis proposes the DTSP algorithm, which uses the idea of trend to process time series gene data. DTSP considers not only the difference between adjacent time points (the trend) but also the variance, which improves accuracy. The prediction model likewise uses trend-based rules. Experiments on a time series data set compare DTSP with the support vector machine and the K nearest neighbor algorithm (a horizontal comparison), and compare the three algorithms that DTSP comprises with one another (a longitudinal comparison). The experimental results show that the improved classifier achieves good classification results and selects the best classification features. The selected feature pairs offer new ideas for bioinformatics research.

Finally, this thesis designs a gene classification system based on the DTSP algorithm, following the theory of dynamic top scoring pairs. The system implements DTSP for classifying time series gene data and is also compatible with traditional machine learning algorithms. It provides an interface for secondary development, which helps users integrate their own gene classification algorithms. Besides classifying time series gene data, the system displays the top scoring pairs, which provide a reference for further medical and biological research. The system is flexible and has high practical value.
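The pairwise comparison idea behind TSP can be illustrated with a minimal sketch. The code below is not the thesis's DTSP implementation: it scores feature pairs with the classic TSP rule (the absolute difference between the two classes in how often feature i is smaller than feature j) and, as an assumption made only for this illustration, summarizes each gene's time series by the mean of its adjacent-time-point differences. The variance term mentioned in the abstract is not reproduced. All function names are hypothetical, and class labels are assumed to be 0 and 1.

```python
from itertools import combinations

import numpy as np


def trend_features(series):
    """Summarize each gene's time series by its mean adjacent difference.

    series has shape (samples, genes, timepoints). This trend summary is an
    assumption for illustration, not the thesis's definition.
    """
    return np.diff(np.asarray(series, dtype=float), axis=2).mean(axis=2)


def tsp_scores(X, y):
    """Classic TSP score for every feature pair (i, j):
    |P(X_i < X_j | y = 0) - P(X_i < X_j | y = 1)|.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = {}
    for i, j in combinations(range(X.shape[1]), 2):
        less = X[:, i] < X[:, j]
        scores[(i, j)] = abs(less[y == 0].mean() - less[y == 1].mean())
    return scores


def top_pair(X, y):
    """Return the highest-scoring pair (first one wins on ties)."""
    scores = tsp_scores(X, y)
    return max(scores, key=scores.get)


def predict(x, pair, X, y):
    """Classify one sample x using the ordering rule learned for `pair`."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    i, j = pair
    less = X[:, i] < X[:, j]
    # Class in which "feature i < feature j" occurs more often.
    class_if_less = int(less[y == 1].mean() > less[y == 0].mean())
    return class_if_less if x[i] < x[j] else 1 - class_if_less
```

A typical call sequence would be `F = trend_features(series)`, then `pair = top_pair(F, y)`, then `predict(F[k], pair, F, y)` for a sample `k`. Because the decision depends on a single ordering between two features, the resulting rule stays easy to read, which is the interpretability advantage the abstract attributes to TSP.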
Keywords/Search Tags: Gene Classification, Time Series Gene, Top Scoring Pairs, Machine Learning