
Methods And Applications Of Omics Big Data Analysis Towards The Diagnosis And Treatment Of Complex Diseases

Posted on: 2020-07-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Yin
GTID: 1484306548492554
Subject: Computer Science and Technology
Abstract/Summary:
Complex diseases such as cancer seriously threaten human health. At present, such a disease is divided into classes according to a few clinical characteristics of patients, and each class is then treated with a specific approach. However, the same treatment can produce different responses in different individuals because of complex molecular interactions and regulatory processes. Complex diseases are often caused by the interplay of genetic factors, environmental factors, lifestyle and other factors, and do not follow Mendel's laws of inheritance. Therefore, family history and genetic information can only indicate an individual's probability of developing the disease; they do not mean that he or she will definitely become ill, which makes the diagnosis and treatment of complex diseases more difficult. With the development of next-generation sequencing technology, the cost of sequencing has fallen faster than Moore's law would predict. At present, sequencing the whole genome of an adult costs about $1000, which makes genomic data much easier to acquire. As a result, genomic, transcriptomic, proteomic and other omics data are being generated in large quantities. This explosion of omics big data makes it possible for researchers to diagnose complex diseases more comprehensively and accurately based on the actual internal state of each patient, and to tailor treatment on that basis. However, the diagnosis and treatment of complex diseases are still based on traditional clinical characteristics and clinicians' experience, without using information from patients' multi-omics data, especially the information specific to each patient. The emergence of the concept of precision medicine is driving research into individualized medicine. The expectation is that clinicians will be able to identify more suitable treatment targets or mechanism-of-action pathways for each patient according to his or her internal state, and thereby tailor the most appropriate therapy.

However, research on methods for the diagnosis and treatment of complex diseases based on large-scale omics data is still lacking, so there is an urgent need for robust and efficient machine learning methods that fit the characteristics of biological data and meet the needs of clinical diagnosis and treatment of complex diseases. For three important aspects of the diagnosis and treatment of complex diseases, namely disease subtyping, target identification and drug repositioning, this work proposes corresponding big data analysis methods that address five related problems step by step. Specifically:

1. Disease diagnosis: we propose a multi-modal similarity matrix joint factorization method for cancer subtyping. By introducing similarity matrix construction and a group sparse constraint, we design an objective function with a clear interpretation, propose an optimization algorithm to solve it, and prove that under this algorithm the objective function converges to a local minimum. Evaluations on simulated data and on a variety of cancer datasets show that the method outperforms existing methods, and an analysis of subtyping-related features supports its reliability. We further analyze the performance of the method in both single-modality and multi-modality data analysis.
2. Target identification: we propose a method based on the multifaceted confidence of omics data. It jointly considers the differential expression of candidate genes, changes in DNA methylation level, the impact of candidate genes on patient survival and prognosis, gene functional analysis and drug target information, so as to increase confidence in the predicted targets and reduce false positives, which would otherwise lower the success rate of subsequent experimental validation. Applying this method to the four breast cancer subtypes, we identify a total of 11 high-confidence subtype-specific drug targets.
3. Target identification: we establish a unified framework for cancer subtyping and subtype-specific target identification by proposing a new non-negative matrix tri-factorization model. Orthogonal and sparse constraints are introduced to incorporate prior biological knowledge and improve the interpretability of the model. The model yields better subtyping results on liver cancer data, and the subtype-specific targets are analyzed using KEGG signaling pathway data for gene functional annotation and DrugBank data for druggability.
4. Drug repositioning: radiotherapy resistance occurs in breast cancer and other cancers, and the eIF4G1 protein is known to be over-expressed in breast cancer cells and to repair DNA damage caused by ionizing radiation. On this basis, we carry out drug repositioning for breast cancer radiosensitizers using large-scale cell response data. In vitro and in vivo experiments show that bosutinib significantly suppresses tumor growth, decreases tumor volume, improves overall survival and induces apoptosis of tumor cells, so it can be used as a radiosensitizer in breast cancer radiotherapy.
5. Drug repositioning: current drug repositioning studies use only a few genes with significant expression changes and ignore most gene features. In this work, we carry out a repositioning study of anti-HBV drugs by proposing a non-negative matrix factorization method with orthogonal constraints that can exploit whole-genome expression signatures, together with a procedure for computing signatures suited to this method. In vitro experiments show that sitagliptin significantly inhibits hepatitis B virus replication and the expression of related proteins. Sitagliptin is approved by the FDA for the treatment of diabetes, so it can be used directly in clinical trials.
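To illustrate the general idea behind the multi-modal subtyping method in item 1 (this is not the thesis's objective, which also includes a group sparse constraint whose exact form is not given here), the sketch below builds one Gaussian-kernel similarity matrix per modality and fits a single shared non-negative factor H so that H H^T approximates all of them; each sample is then assigned to the subtype whose column of H has the largest loading. The kernel width `sigma`, the damped multiplicative update borrowed from symmetric NMF, and all function names are assumptions made for this example.

```python
import math
import random

def gaussian_similarity(features, sigma=1.0):
    """Gaussian-kernel similarity matrix for one modality (rows = samples)."""
    n = len(features)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                      / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def joint_symmetric_nmf(sims, k, iters=500, seed=0, eps=1e-12):
    """Fit one non-negative H (n x k) so that H H^T approximates every S_m.

    Hypothetical simplification: minimises sum_m ||S_m - H H^T||_F^2 with the
    damped symmetric-NMF update H <- H * (1/2 + 1/2 * (S_bar H) / (H H^T H)),
    where S_bar is the mean similarity matrix. No sparsity penalty is used.
    """
    n = len(sims[0])
    s_bar = [[sum(s[i][j] for s in sims) / len(sims) for j in range(n)]
             for i in range(n)]
    rng = random.Random(seed)
    h = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        num = matmul(s_bar, h)                    # S_bar H,   n x k
        den = matmul(h, matmul(transpose(h), h))  # H (H^T H), n x k
        h = [[h[i][c] * (0.5 + 0.5 * num[i][c] / (den[i][c] + eps))
              for c in range(k)] for i in range(n)]
    return h

def assign_subtypes(h):
    """Assign each sample to the column (subtype) with its largest loading."""
    return [max(range(len(row)), key=row.__getitem__) for row in h]
```

Because the shared-H loss sum_m ||S_m - H H^T||^2 equals M ||S_bar - H H^T||^2 plus a constant, this simplified version reduces to factorizing the mean similarity matrix. Modality-specific weights or factors and the group-sparse penalty are what would let a real multi-modal model go beyond simple averaging.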
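The multifaceted-confidence approach in item 2 amounts to requiring several independent lines of evidence to agree before a gene is kept as a candidate target, which is how it reduces false positives. A minimal sketch, assuming each candidate gene already has a precomputed evidence record: the `CandidateGene` fields, the thresholds (|log2 fold change| of at least 1, p-values of at most 0.05, methylation change of at least 0.1) and the rule that expression and methylation must change in opposite directions are all illustrative assumptions, not the criteria used in the thesis.

```python
from dataclasses import dataclass

@dataclass
class CandidateGene:
    """Hypothetical evidence record for one candidate gene in one subtype."""
    symbol: str
    log2_fold_change: float     # subtype vs. normal tissue expression
    de_adj_p: float             # adjusted p-value for differential expression
    methylation_delta: float    # change in promoter methylation (beta value)
    survival_p: float           # p-value for association with patient survival
    in_relevant_pathway: bool   # functional annotation hit (e.g. a KEGG pathway)
    is_known_drug_target: bool  # listed as a target in a drug database

def evidence_flags(g, lfc_min=1.0, p_max=0.05, meth_min=0.1):
    """Return one True/False flag per line of evidence (thresholds are assumptions)."""
    return {
        "differential_expression": abs(g.log2_fold_change) >= lfc_min
                                   and g.de_adj_p <= p_max,
        # assumed rule: expression and methylation move in opposite directions
        "methylation": g.methylation_delta * g.log2_fold_change < 0
                       and abs(g.methylation_delta) >= meth_min,
        "survival": g.survival_p <= p_max,
        "function": g.in_relevant_pathway,
        "druggability": g.is_known_drug_target,
    }

def high_confidence_targets(genes, min_evidence=5, **thresholds):
    """Keep genes supported by at least `min_evidence` facets, best-supported first."""
    kept = []
    for g in genes:
        support = sum(evidence_flags(g, **thresholds).values())
        if support >= min_evidence:
            kept.append((support, g.symbol))
    kept.sort(key=lambda t: (-t[0], t[1]))  # more support first, then by symbol
    return [symbol for _, symbol in kept]
```

With the default `min_evidence=5`, a gene must pass every facet; lowering it trades confidence for recall.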
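Item 5 contrasts signatures restricted to a few significantly changed genes with whole-genome expression signatures. The thesis's orthogonal NMF method is not specified in this abstract, so the sketch below swaps in a much simpler and widely used signature-reversal heuristic instead: score each drug by the Pearson correlation between its expression signature and the disease signature over all genes, and rank the most negatively correlated (signature-reversing) drugs first. The function names and toy signatures are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_by_reversal(disease_sig, drug_sigs):
    """Rank drugs from strongest reversal (most negative r) to strongest mimicry.

    disease_sig: per-gene log fold changes (disease vs. control) over ALL genes.
    drug_sigs:   drug name -> per-gene log fold changes (treated vs. control),
                 in the same gene order as disease_sig.
    """
    if any(len(sig) != len(disease_sig) for sig in drug_sigs.values()):
        raise ValueError("all signatures must cover the same genes in the same order")
    scored = [(name, pearson(disease_sig, sig)) for name, sig in drug_sigs.items()]
    return sorted(scored, key=lambda item: item[1])
```

Because every gene contributes to the correlation, no differential-expression cutoff is needed; this is the property item 5 emphasizes, although the thesis pursues it with matrix factorization rather than correlation.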
Keywords/Search Tags: Diagnosis and treatment of complex diseases, big omics data analysis, cancer subtyping, drug target identification, drug repositioning