Distributed Parallel Machine Learning Algorithms And Their Application In The Biomedical Field

Posted on:2019-10-28

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J G Chen

Full Text:PDF

GTID:1364330545472898

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of information technologies such as the Internet, the Internet of Things, and sensor networks, large-scale datasets have grown explosively in various application fields. In the era of big data, the problem of efficiently and accurately extracting valuable knowledge from these datasets has attracted increasing attention in both academia and industry. Efficient machine learning and data mining technologies are urgently needed for big data processing. At the same time, parallel computing and distributed computing resources provide efficient computing power for machine learning technologies. This dissertation studies distributed parallel machine learning algorithms, including parallel classification, clustering, graph mining, and deep learning algorithms. In addition, the proposed algorithms are applied in the fields of medicine, bioinformatics, and biomedicine, providing a scientific basis for medical diagnosis and for exploring the laws of life and biological activity. The main contributions and innovations of this dissertation are as follows:

(1) We study a distributed parallel classification algorithm and its application in hospital queuing recommendation. We propose a Parallel Random Forest (PRF) classification algorithm based on the Apache Spark cloud platform. The parallel solution of PRF is designed from the perspectives of data parallelism and task parallelism. In terms of data parallelism, vertical data partitioning and data multiplexing methods are proposed to reduce data communication costs among different machines. In terms of task parallelism, a two-layer parallel training method is proposed, in which training is performed in parallel both across the decision trees in the PRF model and across the nodes within each tree. In addition, the proposed PRF algorithm is applied to the Hospital Queuing-Recommendation (HQR) system, where PRF is used to train a model of patients' treatment time consumption. Then, according to the trained model and the current queuing situation of each treatment project, the HQR system can provide intelligent treatment route planning for each patient.

(2) We study a parallel clustering algorithm and its application in disease diagnosis and treatment recommendation. We propose an Adaptive Domain Density-peak Clustering (ADDC) algorithm. Firstly, to address the problem of sparse cluster loss on datasets with varying-density distribution (VDD), we propose an adaptive domain density measurement method. Secondly, to address the problem of cluster fragmentation on datasets with multiple domain-density maximums (MDDM), we propose a cluster self-merging method. In addition, the proposed ADDC algorithm is applied to a disease diagnosis and treatment recommendation system. From massive historical disease treatment datasets, we can effectively identify disease symptom clusters that have multiple symptoms and multiple etiologies. Then, association rules between the disease symptom clusters and their corresponding treatments are analyzed. The system can automatically identify a patient's current disease symptoms from his or her inspection report and recommend the corresponding treatment plans.

(3) We study a parallel deep learning algorithm in distributed computing environments and its application in colon cancer cell nucleus detection and classification. Based on distributed computing, a Bi-layer Parallel Training architecture of Convolutional Neural Network (BPT-CNN) is proposed to improve CNN training performance. In the outer parallel training, strategies such as data parallelism, asynchronous weight updating, and dynamic data migration are proposed to address the problems of data communication, task synchronization, and workload balancing in distributed parallel computing. In the inner parallel training, the training process of each CNN sub-network is further accelerated on each machine. In addition, the proposed BPT-CNN algorithm is applied to the diagnosis of pathological images, and a deep learning-based colon cancer cell nucleus detection and classification algorithm is proposed. It can effectively detect and classify cancer cell nuclei of different forms in pathological slice images.

(4) We study a parallel graph mining algorithm and its application in the Protein-Protein Interaction (PPI) network. Firstly, we integrate the original PPI network and the Gene Expression Datasets (GED) to construct a Weighted PPI (WPPI) network model, in which we consider both the protein topology of the PPI network and the genetic relationships of proteins in specific biological processes. In addition, a Multi-source Learning-based Protein Community Detection (MLPCD) algorithm is proposed for WPPI networks. Moreover, the detected protein communities are compared with known protein complexes and function modules. Gene Ontology annotations are used to assess the functional enrichment of these communities. Experimental results show that the MLPCD algorithm is superior to related algorithms in terms of accuracy and performance.

The work of this dissertation has rich theoretical value and great practical significance. In the era of big data in particular, it makes full use of distributed computing and parallel computing resources to improve the performance of scalable parallel machine learning algorithms. We then explore the application of these algorithms to the field of biomedicine, laying a solid foundation for applications in other practical fields.
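
As a rough illustration of the density-peak idea that the ADDC algorithm in item (2) takes its name from, the following Python sketch computes the two per-point quantities commonly used in basic density-peak clustering: a local density (here, a simple count of neighbours within a cutoff) and the distance to the nearest point of higher density. This is not the ADDC algorithm itself; the adaptive domain density measurement and the cluster self-merging step are not described in enough detail in this abstract to reproduce. The function name `density_peak_scores`, the `cutoff` parameter, and the sample points are assumptions made for illustration only.

```python
import math


def density_peak_scores(points, cutoff):
    """Return (rho, delta) lists for basic density-peak clustering.

    rho[i]   : number of other points closer to point i than `cutoff`
    delta[i] : distance from point i to the nearest point with higher rho
               (for points with no denser point, the largest distance
               from point i is used instead)
    Illustrative sketch only; ties in rho are not broken.
    """
    n = len(points)
    # Full pairwise Euclidean distance matrix (fine for small inputs).
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical toy data: a dense group, a sparser pair, and an isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
rho, delta = density_peak_scores(points, cutoff=0.5)

for p, r, d in zip(points, rho, delta):
    print(p, r, round(d, 3))
```

In the basic method, points that combine a high density with a large distance to any denser point are taken as candidate cluster centres. With a single global cutoff, the sparser pair in the toy data receives a lower density than the dense group, which hints at why a fixed density measure can lose sparse clusters on varying-density data.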

Keywords/Search Tags:

Distributed computing, Parallel computing, Machine learning, Big data, Biomedicine

Related items

1	Research On Optimization Of Eclat Algorithm Based On Cloud Computing And Medical Big Data
2	The Research Of Virtual Heart Parallel Computing Method Based On Cluster System
3	Matrix Computation And Its Application In Simulation Of Hemodynamics Based Cloud Computing Platform
4	The Construction And Evaluation Of Parallel Biological Computing System And Genomic Clustering And Function Annotation Of Metastasis-related Genes
5	Research And Practice Of Medical Imaging Cloud Services Platform
6	Fast Data Processing Methods For Super Resolution Localization Microscopy
7	Diagnosis And Prognosis Of Prostate Cancer Based On Intelligent Image Computing
8	Research On Parallel Processing Methods For Big Data In Medical&Healthcare
9	Research Of The Reliable Platform Of Distributed Machine Learning For Medical Data Based On Blockchain
10	Parallel Computing Method Research Of The Open NMR Magnet Design