| With the development of medical technology, medical data have been accumulating rapidly. Demand from clinical diagnosis, teaching, and research has gradually made medical data analysis a focus of clinical medical research. Clinical data mainly include cross-sectional data, time series data, and longitudinal data. Cross-sectional data are obtained in a single survey and can be used to analyze the factors that influence a disease. Time series data are collected at different time points and reflect how the state or degree of a phenomenon changes over time. In contrast to these two kinds of data, the data collected from patients through clinical follow-up are typical longitudinal data. Longitudinal data combine cross-sectional data and time series data. By analyzing longitudinal data, we can examine not only how individual characteristics change over time but also how individuals differ from one another, which is of great importance in medical science. In this paper, after analyzing the characteristics of clinical longitudinal data, we proposed an improved clustering method and a feature selection method suited to such data. We then improved the hierarchical linear model and applied it to real clinical longitudinal data, where it proved highly effective. Finally, we summarized our research and discussed future work. The main contents of this paper are as follows: (1) According to doctors' long-term clinical experience, some patients show similar trends or the same symptoms. Clustering patients according to their medical data can provide doctors with diagnosis and treatment recommendations. In clinical medicine, all follow-up data of patients constitute a set of longitudinal data. 
To address the multidimensional nature of clinical longitudinal data, we clustered the data using a similarity measure based on the Extended Frobenius distance together with an improved K-means algorithm that is less sensitive to the initial values. We then conducted comparative experiments on longitudinal data from non-small cell lung cancer and gestational hypertension. The experimental results showed that this method can effectively cluster longitudinal data and that the improved algorithm performed better. The clustering algorithm is therefore effective and feasible for analyses that aim to cluster longitudinal data. (2) In the analysis of clinical longitudinal data, high dimensionality makes modeling more difficult, so in practice we need to select the features with a relatively large impact on the disease as model inputs. In this paper, a variable selection algorithm based on the GMDH (group method of data handling) algorithm is proposed and applied to clinical longitudinal data for the first time. Its application to non-small cell lung cancer data showed that the method is suitable for variable selection in clinical longitudinal data. (3) In view of the characteristics of longitudinal data, we first analyzed the advantages and limitations of hierarchical linear models for processing clinical longitudinal data. Based on the characteristics of medical longitudinal data, we then proposed a method that first clusters the time-varying variables in the longitudinal data and then performs the hierarchical linear model analysis. 
We then applied the method to longitudinal data on non-small cell lung cancer (NSCLC) and gestational hypertension and obtained the trend of each disease over time as well as the differences between individuals, which has medical significance and can provide guidance for doctors. Based on comprehensive experimental analysis, we concluded that the hierarchical linear model is applicable and feasible for medical longitudinal data. |
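
As a rough illustration of the hierarchical linear model referred to in (3), a standard two-level growth model for repeated follow-up measurements might be written as below. This is a generic textbook form given only as a sketch; the symbols ($Y_{ti}$, $T_{ti}$, $X_i$, and the coefficients) are notation assumed here for illustration and are not taken from the paper, and the sketch does not include the clustering of time-varying variables that the proposed method performs beforehand.

```latex
\begin{align}
% Level 1: measurement t within patient i
Y_{ti} &= \pi_{0i} + \pi_{1i}\, T_{ti} + e_{ti},
  & e_{ti} &\sim N(0, \sigma^2) \\
% Level 2: differences between patients
\pi_{0i} &= \beta_{00} + \beta_{01} X_i + r_{0i} \\
\pi_{1i} &= \beta_{10} + \beta_{11} X_i + r_{1i}
\end{align}
```

Under these assumptions, $Y_{ti}$ is the outcome of patient $i$ at follow-up $t$, $T_{ti}$ is the time of that follow-up, and $X_i$ is a patient-level covariate. The slope $\pi_{1i}$ describes the individual trend over time, while the random effects $r_{0i}$ and $r_{1i}$ capture the differences between individuals.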