With the continuous development of computer technology and the improvement of medical information databases, artificial intelligence is increasingly used in health care, and computer-aided diagnosis is making medical diagnosis more and more convenient. In this work, a variety of machine learning methods were used to explore clinical test data, with the aim of establishing a computer-aided diagnosis system for the diagnosis and early warning of malignant diseases that, from the perspective of clinical needs, helps doctors quickly distinguish malignant diseases from other diseases. The successful construction of such a system not only reveals the deep correlations between the diseases studied here and clinical indicators, but also helps to explore the correlations between other types of diseases and clinical indicators. The first chapter introduces the current clinical application of routine tests and their value in disease diagnosis, describes the current clinical application and shortcomings of tissue biopsy and liquid biopsy, and summarizes the development and research status of computer-aided diagnosis. Finally, it introduces in detail the machine learning modeling workflow and algorithm principles mainly involved in this work, together with the related evaluation metrics. In the second chapter, a recognition system for gastric cancer was established within the H2O framework by integrating various blood biochemical indicators through deep learning. First, based on sample data from 2951 cases, 60 indicators covering age, gender, routine blood tests and blood biochemistry were collected as comprehensive indicators to establish the recognition model. Then, the number of indicators was reduced through feature selection to simplify the model. Finally, 33 indicators were used to construct the final recognition model. The performance of the model was evaluated by the ten-fold cross-validation
technique. The sensitivity, specificity, accuracy and area under the receiver operating characteristic curve on the cross-validation set were 85.44%, 83.82%, 84.54% and 0.9165, respectively. Deep learning that integrates blood biochemical indicators will bring new insights into the comprehensive understanding of gastric cancer pathology, as well as the prevention, screening, diagnosis and prognosis of gastric cancer. In the third chapter, to further validate the feasibility of identifying malignant diseases from clinical test data, we attempted to identify patients with liver cancer, a high-incidence malignancy, among patients with various liver diseases, patients with other cancers, and healthy people. To further broaden the scope of clinical test data, this study included urine test indicators in addition to routine blood and blood biochemistry indicators. First, based on data from 3091 samples, 59 clinical test indicators were taken as the initial features to build a deep forest discrimination model for liver cancer samples, and the 21 most critical features were obtained through feature importance ranking and screening to build the final model. On this basis, an easy-to-use visual discrimination tool was constructed to identify hepatocellular carcinoma from clinical test indicators. The deep forest discrimination model for liver cancer patients achieved an area under the receiver operating characteristic curve of 0.9244, an accuracy of 84.49%, a specificity of 84.18%, and a sensitivity of 84.69%. The model trained on clinical test data with the deep forest algorithm can thus effectively assist in identifying liver cancer patient samples. In the fourth chapter, to extend the above methods to screening for other malignant diseases, we attempted to classify and identify ten types of cancer. This study included a total of 9150 samples covering ten types
of cancer patients and 3579 non-cancer samples. The range of clinical test indicators was further expanded to include routine blood tests, blood biochemistry, routine urine tests, routine stool tests, thyroid function tests, and tumor marker tests. The collected data first underwent data cleaning and feature engineering. Then, a binary classification model was established to distinguish cancer from non-cancer samples, along with a multi-class classification model to distinguish cancer types. A Blending model integrating seven tree-based models was built, and feature importance ranking and feature selection were performed using the Robust Rank Aggregation method. The binary classification model showed stable predictive performance on the test set, with an area under the receiver operating characteristic curve of 0.8929, a sensitivity of 0.8583, a specificity of 0.7258, and an accuracy of 0.8256. The accuracy of the multi-class classification model on the test set reached 0.6905. These results indicate that the Blending model established in this work can effectively assist in identifying patients with ten types of cancer. |
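
The four evaluation metrics reported for each model above (sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve) can be illustrated with a minimal Python sketch. It is not the code used in this work: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, with label 1 standing for a cancer sample and label 0 for a non-cancer sample.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp), treating label 1 as the positive (cancer) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true positives that the model detects."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true negatives that the model correctly rejects."""
    return tn / (tn + fp)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four cancer and four non-cancer samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, fn, tn, fp), auc(y_true, scores))
```

Sensitivity, specificity and accuracy depend on the chosen decision threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is why the two kinds of figure are reported side by side.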