Study Of Pattern Discovery And Classification Models For Single And Multiple Sourced Biological Data

Posted on: 2018-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Yang
GTID: 1314330566954686
Subject: Computer Science and Technology
Abstract/Summary:
The emergence of various observation tools in the biomedical field enables researchers to observe a target longitudinally across scales, from biomedical images down to gene data. Researchers can also obtain lateral, multi-modal information at the same scale by using different observation tools. Multiple sourced data therefore greatly enrich the description of the target. However, these data are heterogeneous, noisy, high-dimensional and massive, owing to differences in resolution, observation scale and so on, and current integration approaches have yet to address these challenges. Because multiple sourced, heterogeneous biomedical data provide different information about the target, they are expected to reveal information that cannot be found from single sourced data. Pattern discovery and classification from single sourced and multiple sourced biological data has been a frontier of bioinformatics in recent years and is expected to lead to further scientific breakthroughs; research in this area aims at understanding complex biological systems.

Pattern discovery and classification from single sourced and multiple sourced biological data is the main subject of this dissertation. We propose a non-negative matrix factorization model with multiple constraints for finding hidden information in single sourced biological data and for identifying its block characteristics. We propose a deep-learning-based model for the early diagnosis of breast cancer from breast phenotype images, and construct a discrimination classifier to assess the accuracy of microcalcifications and breast masses, either in isolation or in combination, for classifying breast lesions. Considering the different scales and units of different data types, we propose a correlated pattern discovery model based on high-order matching for multiple sourced biological data. In this dissertation, we have conducted research on these
three aspects. The main contents and contributions are as follows:

(1) We propose a non-negative matrix factorization model with multiple constraints for discovering the hidden diagonal block structure of biological data. To expose this structure, we impose the -norm on the feature matrix and the total variation norm on each column of the loading matrix. An efficient numerical algorithm based on the alternating direction method of multipliers is proposed for optimizing the model. Compared with several benchmark models, the proposed method performs robustly and effectively on simulated data, and experimental results on two real biological datasets demonstrate its effectiveness. The model can effectively discover the hidden diagonal block structure in a dataset and thus directly obtain the block characteristics of the data.

(2) We propose a breast cancer early diagnosis model based on deep learning. Using raw images directly may introduce a large bias due to image deformation, non-uniform background illumination, and uneven imaging angle and position, and such problems may degrade classification performance. To alleviate them, this study uses as input various types of features that are widely used in research on breast lesions, instead of the original images. A discrimination classifier is constructed to assess the accuracy of microcalcifications and breast masses, either in isolation or in combination, for classifying breast lesions. The discriminative accuracy of our deep learning model was compared with that of benchmark models. Overall, deep learning based on large datasets was superior to standard methods for discriminating microcalcifications. Accuracy increased when a combinatorial approach was adopted to detect microcalcifications and masses simultaneously. This may have clinical value for the early detection and treatment of breast cancer.

(3) We propose a correlated pattern discovery model
based on multiple sourced biological data. Emerging multi-dimensional genomic data pose new challenges for data analysis. Finding correlated patterns in multiple sourced biological data is useful for understanding potential interactions among multi-modal genomic data. Multi-dimensional genomic data contain multiple genomic data types with different scales and units, so they cannot simply be aggregated for analysis. To address this issue, we propose a correlated pattern discovery model that incorporates prior knowledge. A tensor similarity is used to measure the correlation of common patterns, and the prior knowledge is expressed as constraints in the model. An efficient numerical solution is designed and analyzed. The proposed method is shown to perform robustly and effectively on both simulated data and real biological data. We conducted experiments on five real cancer datasets to reveal cancer subtypes, and a survival analysis of the subtypes found confirmed the effectiveness of the model. This method can help doctors realize personalized diagnosis and treatment of cancer and other diseases.
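
As a rough illustration of the factorization idea behind contribution (1), the sketch below runs plain non-negative matrix factorization on a synthetic matrix with a diagonal block structure. It is not the dissertation's model: it uses the standard multiplicative updates for the Frobenius loss, with no sparsity or total variation constraints and no alternating direction method of multipliers. The function names, block sizes, noise level and rank are all assumptions chosen for the example.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Plain NMF, X ~ W @ H, using multiplicative updates (Frobenius loss).

    Illustrative only: unconstrained NMF, not the constrained model
    described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    errors = []
    for _ in range(n_iter):
        # Element-wise updates; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors


def make_block_diagonal(block_sizes=(10, 15, 20), noise=0.05, seed=0):
    """Synthetic non-negative matrix: dense diagonal blocks plus weak noise."""
    rng = np.random.default_rng(seed)
    n = sum(block_sizes)
    X = noise * rng.random((n, n))
    start = 0
    for size in block_sizes:
        X[start:start + size, start:start + size] += 1.0
        start += size
    return X


X = make_block_diagonal()
W, H, errors = nmf_multiplicative(X, rank=3)

# Assign each row to the component with the largest loading in W.
labels = W.argmax(axis=1)
```

On data this clean, the row assignments in `labels` will often line up with the three blocks, but plain NMF can settle in a poor local minimum, so that outcome is not guaranteed. The abstract motivates its additional constraints by the goal of exposing hidden block structure.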
Keywords/Search Tags: Non-negative matrix factorization, Deep learning, Classification, Multiple sourced biological data, Correlated pattern