
Research On Hyperlipidemia Risk Prediction Based On Ensemble Algorithm LightGBM

Posted on: 2022-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Hu
GTID: 2494306722964169
Subject: Applied Statistics
Abstract/Summary:
Hyperlipidemia is a chronic disease of increasing prevalence, and dyslipidemia often leads to further complications, with a considerable impact on public health. As big data and artificial intelligence spread through medical technology, preventive medicine has become a current research focus in smart medicine. From the perspective of prevention, this thesis applies data mining to physical examination data in order to predict hyperlipidemia accurately and to explore the factors associated with its occurrence. The main contributions are as follows:

(1) For the text fields in the physical examination data, this thesis proposes a feature extraction method based on the TextCNN algorithm. Convolution kernels of sizes 3/4/5 are used for long texts, and kernels of sizes 1/2/3 are used for medium-length texts, which makes the extracted semantic information more accurate. The experiments also use pre-trained word vectors, providing a sound data basis for accurate prediction by the subsequent model.

(2) For the high-dimensional physical examination data, this thesis proposes the LightGBM-RFECV algorithm for feature selection. The algorithm uses LightGBM as the base model and builds models iteratively to select the best features. Experiments show that the selected feature subset improves computational efficiency by 41.5% without reducing model accuracy.

(3) The LightGBM algorithm has many hyperparameters. To find hyperparameters that improve prediction accuracy, this thesis proposes a LightGBM algorithm tuned by adaptive particle swarm optimization (APSO_LightGBM). Experiments show that adaptive particle swarm optimization outperforms ordinary particle swarm optimization. The experiments also tune the LightGBM hyperparameters with grid search and random search; the results show that adaptive particle swarm optimization finds better hyperparameters in less time than either. To further verify its performance, APSO_LightGBM is compared with multiple linear regression, support vector regression, random forest, and LightGBM. The experimental results show that the APSO_LightGBM model has the highest prediction accuracy and is significantly better than the other algorithms.

(4) Finally, to explore the factors related to the disease, the feature importances of the model are analyzed. Compared against existing medical research, the important features identified by the model are mostly consistent with medical findings, which further supports the effectiveness of the model.

The verification shows that the proposed hyperlipidemia prediction model can accurately predict hyperlipidemia from medical physical examination data. Moreover, mining the related factors supports early prevention of the disease, which is of great significance in helping the medical community with preventive diagnosis and reducing incidence.
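The hyperparameter search described in contribution (3) can be illustrated with a minimal particle swarm sketch. The abstract does not state which adaptation rule the thesis uses, so the code below assumes one common variant, an inertia weight that decreases linearly over the iterations. The function name `apso_minimize`, the toy objective, and all numeric settings are hypothetical; in practice the objective would be a cross-validated error of a LightGBM model, which is replaced here by a simple quadratic so the example runs with the standard library alone.

```python
import random


def apso_minimize(objective, bounds, n_particles=20, n_iters=80,
                  w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a particle swarm.

    Assumption: "adaptive" is modeled as a linearly decreasing inertia
    weight; the thesis may use a different adaptation rule.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iters):
        # Large inertia early (exploration), small inertia late (refinement).
        w = w_max - (w_max - w_min) * t / max(1, n_iters - 1)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical stand-in for a cross-validated model error."""
    learning_rate, num_leaves = params
    return (learning_rate - 0.1) ** 2 + ((num_leaves - 31) / 100) ** 2


# Search ranges are illustrative only.
search_bounds = [(0.01, 0.3), (8, 128)]
best, best_val = apso_minimize(toy_validation_error, search_bounds)
print(best, best_val)
```

Unlike grid search, the swarm evaluates only the points its particles visit, which is one plausible reason the abstract reports lower search time; the sketch does not reproduce that comparison.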
Keywords/Search Tags: Hyperlipidemia prediction, Data mining, Particle swarm optimization, Ensemble algorithm, APSO_LightGBM model