In the era of intelligent data, the rapid development of medical technology and the growing popularity of medical IoT applications have generated massive amounts of medical data. At the same time, medical practice increasingly depends on data analytics, which requires discovering valuable information in vast amounts of medical treatment data. It is therefore especially important to design data mining methods suited to massive medical treatment data. Membrane computing, also known as P systems, is a class of intelligent computing models with non-determinism and maximal parallelism, abstracted from the organization and structure of living cells, and it is especially suitable for processing massive data. This paper therefore proposes membrane clustering algorithms and membrane association rule algorithms, and applies them to discovering high-risk patterns of stroke pathogenic factors and to grouping stroke patients. The main work of this paper includes:

1. A mutual k-nearest neighbor graph clustering algorithm based on a cell-like P system, named PICP-MkNNG, is proposed. To identify clusters of arbitrary shape and uneven density without presetting the number of clusters, this paper combines the mutual k-nearest neighbors (MkNN) clustering algorithm with graph theory to propose the Mutual k-Nearest Neighbors Graph (MkNNG) clustering algorithm. In MkNNG, mutual k-nearest neighbors are relatively close to each other and are connected one by one to form a connected subgraph. Nodes in the same connected subgraph are relatively close together and are therefore assigned to the same cluster, while nodes in different connected subgraphs are relatively distant and therefore belong to different clusters. To improve the efficiency of MkNNG and to process massive data, this paper exploits the non-determinism and high parallelism of P systems to design a cell-like P system, named the PICP system, which contains multiple promoters and inhibitors. Based
on PICP, a novel PICP-MkNNG clustering algorithm is proposed, which uses membrane rules to solve the clustering problem. Experiments and analysis show that PICP-MkNNG obtains good clustering quality on data of different sizes and shapes without a preset number of clusters, and also has very high computational efficiency.

2. A shared nearest neighbor graph clustering algorithm based on a tissue-like P system, named PITP-SNNG, is proposed. To distinguish clusters that lie close to each other, an improved algorithm based on MkNNG, named the Shared Nearest Neighbor Graph (SNNG) clustering algorithm, is proposed. In SNNG, mutual k-nearest neighbors belong to the same cluster only when they share a certain number of common neighbors. To improve the efficiency of SNNG, a tissue-like P system with multiple promoters and inhibitors, called the PITP system, is designed, and on this basis a novel PITP-SNNG clustering algorithm is proposed. Experiments and analysis show that PITP-SNNG obtains good clustering results with high efficiency.

3. An improved maximum frequent set algorithm based on a tissue-like P system, named PITP-PS, is proposed. Pincer-Search is an efficient algorithm for discovering the maximum frequent set. To improve the efficiency of this search, we combine P system theory with Pincer-Search to design an evolution-communication tissue-like P system with multiple promoters and inhibitors, named the PITP system. Based on PITP, the PITP-PS algorithm is proposed. Examples and analysis illustrate the efficiency of this algorithm in finding the maximum frequent set.

4. An improved association rule algorithm based on a tissue-like P system, named PITP-MSApriori, is proposed. MS-Apriori is a frequent itemset mining algorithm suited to databases in which transaction frequencies are inconsistent. To improve its search efficiency on large datasets, this paper combines it with the P
system and proposes the PITP-MSApriori algorithm. Examples and analysis show that this algorithm not only handles sparse items when transaction frequencies are unequal, but also offers high parallelism and is suitable for processing large datasets.

5. A maximum pattern subspace clustering algorithm based on a tissue-like P system, named PITP-MFIS, is proposed. Subspace clustering is an effective approach to clustering high-dimensional data. The membrane computing model contains two parts: subsystem Ⅰ and subsystem Ⅱ. Subsystem Ⅰ obtains clusters in each one-dimensional space using the PITP-SNNG algorithm. Subsystem Ⅱ takes the result of subsystem Ⅰ as input and combines the subspace discovery method with the maximum frequent itemset discovery method, using the PITP-PS algorithm to find the relevant subspaces and clusters that satisfy the preset conditions. Experiments and analysis on artificial datasets and UCI datasets show that PITP-MFIS not only achieves good clustering quality on high-dimensional data, but also has high computational efficiency and is suitable for processing large-scale, high-dimensional datasets.

6. Different combinations of stroke risk factors are obtained using the PITP-PS algorithm, and hospitalized stroke patients are divided into communities using the PITP-MFIS algorithm. This study investigates 1254 stroke patients hospitalized in one hospital from January to December 2018. Based on electronic medical records, combined with imaging data and relevant test data, we collect information on the diagnosis and treatment of these patients. First, we preprocess the data and convert the transaction records in the database into objects suitable for P system processing. Then, different combinations of stroke risk factors are obtained using the PITP-PS algorithm, and different divisions of the patient community are obtained using the PITP-MFIS
algorithm. Finally, the results are analyzed and discussed.
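
The graph constructions described in items 1 and 2 can be illustrated with a short sequential sketch in Python. This is only a minimal illustration under stated assumptions, not the membrane-based PICP-MkNNG or PITP-SNNG algorithms: it assumes brute-force Euclidean distances and uses a union-find structure in place of P system rules, and the names knn_sets, graph_clusters, and min_shared are hypothetical.

```python
import math


def knn_sets(points, k):
    """Return, for each point, the index set of its k nearest neighbors.

    Assumption: brute-force Euclidean distance; ties keep index order.
    """
    n = len(points)
    neighbors = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def graph_clusters(points, k, min_shared=0):
    """Label points by connected components of a mutual k-NN graph.

    An edge i-j exists only if i and j are in each other's k-NN sets
    and they have at least `min_shared` neighbors in common.
    min_shared=0 follows the MkNNG idea; min_shared>0 the SNNG idea.
    """
    n = len(points)
    nbrs = knn_sets(points, k)
    parent = list(range(n))  # union-find forest over point indices

    def find(x):
        # Find the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in nbrs[i]:
            mutual = i < j and i in nbrs[j]
            if mutual and len(nbrs[i] & nbrs[j]) >= min_shared:
                parent[find(i)] = find(j)  # merge the two components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical toy data: two well-separated squares of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
print(graph_clusters(points, k=2))                # mutual k-NN graph only
print(graph_clusters(points, k=3, min_shared=2))  # with shared-neighbor test
```

With min_shared=0 the sketch links every pair of mutual k-nearest neighbors, so the number of clusters is never preset and falls out of the connected components. A positive min_shared additionally requires that many common neighbors before two points are linked, which is the mechanism item 2 relies on to separate clusters that lie close together.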