| Objective: This study aimed to transform raw ECG data into high-fidelity, high-throughput feature data that are convenient for data mining and screening, thereby improving the capacity for quantitative ECG analysis. In this thesis, Shenyang, a typical city in northern China, was selected as the study area to investigate the multi-dimensional features of ECG signals, verify the validity of those features, and carry out an interpretation analysis. Methods: An ECG database was built with SQL Server. A total of 455 features in 9 categories were then constructed: basic statistical features, time-domain features, signal factor features, high-order zero-crossing analysis, frequency-domain features, spatial-domain features, nonlinear features, adjacent potential co-occurrence matrix features, and macro features. Next, correlation analysis and Lasso regression were used for feature selection. Finally, support vector machine (SVM), random forest, K-nearest neighbor, AdaBoost, and logistic regression were used as classifiers. The experiments were designed to classify 7 kinds of cardiovascular disease: (1) ST-T change, (2) right bundle branch block, (3) atrial premature beats, (4) ventricular premature beats, (5) atrioventricular block, (6) left ventricular hypertrophy, and (7) myocardial infarction. Results: The database constructed in this study comprised 22 data tables and divided cardiovascular diseases into 12 categories and 491 subcategories for subsequent feature extraction from ECG signals. The feature selection and classification results for each disease were as follows. (1) For ST-T change, patient age, maximum aVL potential, and other features performed well; random forest was the best classifier, with an accuracy of up to 93.9%. (2) For right bundle branch block, lead V1 pulse factor, lead V1 kurtosis, and other features performed well; random forest was the best classifier, with an accuracy of 98.6%. (3) For atrial premature beats, lead I elongation and patient age performed well; random forest was the best classifier, with an accuracy of up to 99.2%. (4) For ventricular premature beats, lead I elongation, lead aVL length, and other features performed well; logistic regression was the best classifier, with an accuracy of up to 98.9%. (5) For atrioventricular block, lead V1 waveform factor, lead III energy, and other features performed well; logistic regression was the best classifier, with an accuracy of up to 96.2%. (6) For left ventricular hypertrophy, lead V4 entropy, lead V4 kurtosis factor, and other features performed well; random forest was the best classifier, with an accuracy of up to 99.4%. (7) For myocardial infarction, lead III energy, lead V3 equilibrium, and other features performed well; random forest was the best classifier, with an accuracy of up to 93.9%. Conclusions: In this study, multi-dimensional ECG features were extracted and macro features were introduced into machine learning. After screening and verification, the classifiers performed well, which confirmed the validity of the features. In addition, the medical and statistical significance of the features was analyzed and interpreted. The automatic ECG diagnosis results in this thesis provide a reference for follow-up research. |
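
Several of the features named in the results (energy, waveform factor, pulse factor, kurtosis) can be illustrated with a minimal sketch. The abstract does not state the exact formulas used in the thesis, so the definitions below are the commonly used textbook ones and should be read as assumptions. The function name `signal_factor_features` and the synthetic sine wave standing in for one ECG lead are hypothetical.

```python
import math


def signal_factor_features(x):
    """Compute a few time-domain descriptors for one lead (a list of samples).

    Assumed definitions (not confirmed by the source):
      energy          = sum of squared samples
      waveform_factor = RMS / mean absolute value
      pulse_factor    = peak absolute value / mean absolute value
      kurtosis        = fourth central moment / squared variance
    """
    n = len(x)
    if n == 0:
        raise ValueError("empty signal")
    mean = sum(x) / n
    abs_mean = sum(abs(v) for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    m2 = sum((v - mean) ** 2 for v in x) / n  # variance (population form)
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    nan = float("nan")
    return {
        "energy": sum(v * v for v in x),
        "waveform_factor": rms / abs_mean if abs_mean else nan,
        "pulse_factor": peak / abs_mean if abs_mean else nan,
        "kurtosis": m4 / (m2 * m2) if m2 else nan,
    }


# Hypothetical stand-in for one lead: a 5 Hz sine sampled at 500 Hz for 1 s.
fs = 500
lead = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]
features = signal_factor_features(lead)
print(features)
```

In a full pipeline, a function of this kind would be applied to every lead of every record, and the resulting feature table would then pass through correlation analysis and Lasso regression before classification.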