| Type 2 diabetes (T2D) is an increasingly prevalent disease characterized by insulin resistance. Nearly one adult in ten worldwide has T2D. The efficacy and adverse effects of hypoglycemic drugs vary widely among patients, partly because of genetic diversity. Some T2D patients have key drug-metabolizing cytochrome P450 enzymes, such as CYP2C19 and CYP3A4, genotyped to guide medication. Clinicians design medication schemes based on a comprehensive evaluation of each patient's clinical and genetic features. However, because T2D is highly heterogeneous and can be treated with multiple drugs of differing efficacy and adverse effects, designing optimal personalized medication schemes is very challenging for clinicians. To study approaches to personalized T2D medication, we collected clinical information and CYP2C19/CYP3A4 polymorphisms of T2D patients and used machine learning techniques to predict medication schemes from these data. The research comprised three parts. First, we collected and analyzed the clinical data and medication schemes of hospitalized T2D patients to build a dataset. Second, we proposed a novel algorithmic model, trained it and five other models on the clinical dataset, and evaluated the performance of each. Third, we collected another dataset to further validate the model; we also included CYP2C19/CYP3A4 polymorphisms when predicting medication schemes. In the first part, we screened patients hospitalized in the endocrinology department of a top-tier hospital during 2010-2013 to enroll T2D patients. Following suggestions from clinicians, we selected 25 medication-related clinical indices for analysis and model building. The hypoglycemic drugs used included metformin, insulin, α-glucosidase inhibitors, sulfonylureas, glinides, DPP-4 inhibitors, thiazolidinediones, and GLP-1 receptor agonists. We analyzed the association between each drug class and the 25 clinical indices with logistic regression and found that different drugs are
associated with different clinical indices. To avoid losing any useful information, we therefore included all 25 clinical indices when building the machine learning models. In the second part, we used the collected data to build models that take T2D clinical data as input and predict hypoglycemic medication schemes. We proposed a novel model called weighted Rank-SVM (WRank-SVM), using 80% of the dataset as the training set and 10% as the test set. Comparing WRank-SVM with five other algorithms (Rank-SVM, Binary-SVM, ML-KNN, ML-BP, and ML-NB), we showed that WRank-SVM outperforms all of them in predicting hypoglycemic medication schemes. In performance validation, WRank-SVM achieved an Average Precision of 75.52% and an F1-Score of 61.94%. In the third part, we further validated the performance of the WRank-SVM model in predicting hypoglycemic medication schemes. We collected T2D clinical data from another top-tier hospital to form the clinical dataset. We also obtained the genotypes of key CYP3A4 and CYP2C19 SNPs and integrated this information with the clinical data to form the CYP3A4 dataset and the CYP2C19 dataset. Compared with its performance in the second part, the WRank-SVM model trained on the new clinical dataset performed much better, with lower Coverage and Hamming loss and an Average Precision of 92%, which supports the feasibility of the WRank-SVM model for predicting hypoglycemic medication schemes. Moreover, WRank-SVM trained on the CYP3A4 dataset outperformed the model trained on the CYP2C19 dataset, notably raising the Average Precision from 89.93% to 97.82%. WRank-SVM trained on the CYP3A4 dataset also outperformed the model trained on the clinical dataset, with Average Precisions of 97.82% and 92%, respectively. These results imply that including pharmacogenomic information in the training dataset can help improve the performance of the WRank-SVM model in predicting hypoglycemic medication schemes, and
different pharmacogenomic genes have different effects. T2D places a heavy burden on global health and the economy. The heterogeneity of T2D, together with the diversity of available drugs and their varied efficacy and toxicity, makes it very challenging for clinicians and pharmacists to design personalized medication schemes for T2D patients. In this research, we took advantage of the rapid development of machine learning techniques to study, for the first time, how to achieve personalized T2D medication based on clinical indices and genotype data, and we successfully obtained a well-performing prediction model. This study provides techniques and strategies for achieving personalized and precise T2D medication. |
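
The evaluation measures named above (Average Precision, Coverage, and Hamming loss) are standard multi-label ranking metrics. The sketch below shows how they are commonly computed, assuming the usual multi-label definitions; the abstract does not state the exact formulas used in the study. The function names, the toy scores, and the 0.5 decision threshold are illustrative assumptions, and every sample is assumed to have at least one prescribed drug class.

```python
from typing import List, Sequence


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank labels by score: rank 1 is the highest score (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    rank = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        rank[j] = position
    return rank


def hamming_loss(y_true, y_pred) -> float:
    """Fraction of (patient, drug class) slots that are predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total


def coverage(y_true, scores) -> float:
    """Average depth in the ranking needed to reach every true label, minus one."""
    depths = []
    for tr, sc in zip(y_true, scores):
        r = ranks(sc)
        depths.append(max(r[j] for j, t in enumerate(tr) if t) - 1)
    return sum(depths) / len(depths)


def average_precision(y_true, scores) -> float:
    """Mean, over true labels, of the share of true labels ranked at or above each."""
    per_sample = []
    for tr, sc in zip(y_true, scores):
        r = ranks(sc)
        relevant = [j for j, t in enumerate(tr) if t]
        precisions = [
            sum(1 for k in relevant if r[k] <= r[j]) / r[j] for j in relevant
        ]
        per_sample.append(sum(precisions) / len(precisions))
    return sum(per_sample) / len(per_sample)


# Toy data (hypothetical): columns stand for metformin, insulin, sulfonylureas.
y_true = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, 0.6, 0.4], [0.2, 0.7, 0.3]]
# Assumed decision rule: prescribe a drug class when its score exceeds 0.5.
y_pred = [[int(s > 0.5) for s in row] for row in scores]

print("Hamming loss:     ", hamming_loss(y_true, y_pred))
print("Coverage:         ", coverage(y_true, scores))
print("Average Precision:", average_precision(y_true, scores))
```

Lower Coverage and Hamming loss and higher Average Precision indicate better predictions, which is the direction of improvement reported for the second dataset.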