
Medical Data Processing And Analysis Based On Machine Learning Algorithm

Posted on: 2019-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Wang
GTID: 2404330545497829
Subject: Computer Science and Technology
Abstract/Summary:
With the development of medical information systems, medical institutions have accumulated large amounts of data. These include physical examination data, clinical electronic medical records, and medical images, as well as data from other parts of the healthcare system, such as medical insurance data, all of which hold considerable application and commercial value. Based on physical examination data and health insurance data obtained from medical data institutions, this thesis designs different algorithmic schemes for different data characteristics and carries out predictive analysis. The main work is as follows:

1. The physical examination and diagnostic data provided by the hospital are preprocessed through data cleaning, data integration, data conversion, and data reduction. Because the feature matrix contains a large number of missing values, a method that fills them by matrix completion is proposed, and the completed data set is used for prediction. Experimental results show that the proposed matrix completion method effectively improves the prediction model for fatty liver disease.

2. Prediction algorithms for two diseases are studied and proposed. For lymphocyte hyperplasia, the condition with the largest number of patients in the diagnostic data, a prediction model trained with a decision tree algorithm achieves an accuracy of 98.20%. For fatty liver disease, a logistic regression model on the reduced data set achieves an accuracy of 87.75%. After analyzing the characteristics of the data set and applying a series of optimizations to the original data, including dimensionality reduction, feature selection, removal of missing data, and filling of missing values, the prediction accuracy rises to 90.90%. Finally, a method based on gcForest is proposed and applied to the same fatty liver disease prediction task, improving accuracy over the previous results.

3. Two types of time series analysis models are studied and proposed. Using the medical insurance data, the daily number of admissions at one hospital in one city is selected for time series analysis, with the aim of predicting that hospital's future daily admissions. After comparing the predictions of commonly used time series analysis methods, periodic trend analysis and holiday effects are added, and a large-scale time series analysis based on the Prophet system is proposed. The city's weather and air quality data are then combined with daily hospitalization counts for correlation analysis, which yields several association rules with relatively high confidence.
Keywords/Search Tags: Disease prediction, Matrix completion, Correlation analysis
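
The abstract does not say which matrix completion algorithm is used to fill the missing physical examination values. As a rough illustration only, the sketch below uses one common variant, iterative truncated-SVD imputation, written with NumPy. The function name `complete_matrix`, the parameters `rank`, `n_iters`, and `tol`, and the toy data are all assumptions made for this example and are not taken from the thesis.

```python
import numpy as np


def complete_matrix(X, rank=2, n_iters=100, tol=1e-6):
    """Fill NaN entries of X using a low-rank (truncated SVD) approximation.

    Hypothetical helper for illustration; not the thesis's actual method.
    Assumes every column has at least one observed (non-NaN) value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: replace each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iters):
        # Best rank-`rank` approximation of the current filled matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Keep observed entries fixed; update only the missing ones.
        updated = np.where(missing, low_rank, X)

        # Stop once the relative change between iterations is small.
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break

    return filled


if __name__ == "__main__":
    # Toy feature matrix (rows: patients, columns: exam features) with gaps.
    exam = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
    exam[0, 0] = np.nan
    exam[2, 1] = np.nan
    print(complete_matrix(exam, rank=1))
```

The completed matrix could then be passed to a downstream classifier such as logistic regression in place of the original feature matrix. The choice of `rank` is a tuning decision that would normally be made by validation on held-out observed entries.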