| The stock market is a barometer of the national economy and an important reflection of national economic development. Understanding the financial and stock markets may therefore be an effective way to understand national economic development. However, financial and stock markets change continuously, which makes them relatively difficult to understand. The main factors that influence them include relevant state policies, financial news, and investor sentiment, among others. Although these underlying factors cannot easily be understood or measured directly, they are reflected in related online news. It is therefore reasonable to employ text-based methods to study the relationship between these factors and financial indices. From the perspective of Natural Language Processing, this thesis aims to find the correlation between lexical dynamics and changes in the stock market index; more specifically, it studies the "Term-Index" correlation. This correlation problem is formalized as two problems: a classification problem, which predicts the rise or fall of the stock market index, and a regression problem, which predicts the possibility of the index rising or falling. Each financial text is represented as a collection of words, and the vocabulary of daily financial text is constantly updated; this change is called Term Dynamic Characteristics. Using these term dynamic characteristics, we identify highly index-correlated terms (HICT). HICT words are identified by analyzing stock index information together with the frequency distribution of terms over the time series. Taking the HICT weights as features, we train classification and regression models. With these models we predict the rise and fall of the stock market index, and we apply regression analysis to the closing index.
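A minimal sketch of how HICT identification could work, assuming each trading day is a list of tokens from that day's financial news and the index information is a list of daily index changes. The abstract does not give the exact scoring rule, so this sketch substitutes one plausible choice: rank each term by the absolute Pearson correlation between its daily relative-frequency series and the daily index change, keep the top k, and use each HICT term's relative frequency as its feature weight. The function names (`select_hict`, `hict_features`, `term_frequency_series`) are hypothetical.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def term_frequency_series(daily_docs, vocab):
    """For each term, its relative frequency in each day's tokenized news text."""
    series = {t: [] for t in vocab}
    for tokens in daily_docs:
        counts = Counter(tokens)
        total = len(tokens) or 1
        for t in vocab:
            series[t].append(counts[t] / total)
    return series


def select_hict(daily_docs, index_changes, k):
    """Hypothetical HICT selection: keep the k terms whose frequency series
    correlates most strongly (in absolute value) with the daily index change."""
    vocab = sorted({t for tokens in daily_docs for t in tokens})
    series = term_frequency_series(daily_docs, vocab)
    scored = [(abs(pearson(series[t], index_changes)), t) for t in vocab]
    scored.sort(key=lambda p: (-p[0], p[1]))  # strongest first, ties by name
    return [t for _, t in scored[:k]]


def hict_features(tokens, hict):
    """One day's feature vector: the relative frequency of each HICT term."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[t] / total for t in hict]
```

Taking the absolute value keeps terms that move against the index (for example, a word that appears mostly on falling days) as well as terms that move with it, since both carry predictive signal.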
The accuracy of the classification model and the correlation of the regression results are then calculated to explore and verify the "Term-Index" correlation. The experiments use the AdaBoost algorithm to train the prediction model and the nearest-neighbor regression method to analyze the stock index. The experimental results of several HICT feature selection methods are compared, and they show that the proposed feature selection method performs best. To improve training efficiency and reduce model complexity, principal component analysis (PCA) is used to reduce the feature dimension. On the Shanghai Composite Index, the prediction accuracy reaches 72%, and the Pearson correlation coefficient of the index regression results is about 0.5. These results show that using Natural Language Processing techniques to analyze financial indices is feasible and effective, and further indicate a significant positive correlation between the dynamic characteristics of financial text and the stock market index. Finally, an error analysis of the model and directions for future research are discussed. |
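The modeling and evaluation steps can be sketched in plain Python, assuming each day is already a numeric HICT feature vector and rise/fall labels are encoded as +1/-1. This sketch uses discrete AdaBoost over single-feature decision stumps and k-nearest-neighbor regression with Euclidean distance, then scores them with accuracy and the Pearson correlation coefficient. The weak learner, k, and the number of boosting rounds are assumptions rather than settings from the thesis, and the PCA step is omitted for brevity; in practice, scikit-learn's `PCA`, `AdaBoostClassifier`, and `KNeighborsRegressor` cover the same steps.

```python
import math

# Assumed encoding: labels are +1 (index rises) / -1 (index falls),
# and each row of X is one day's HICT feature vector.


def stump_predict(row, j, thr, sign):
    """Predict `sign` when feature j reaches the threshold, otherwise `-sign`."""
    return sign if row[j] >= thr else -sign


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best decision stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost: each round reweights samples toward earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        raw_err, j, thr, sign = train_stump(X, y, w)
        if raw_err >= 0.5:
            break  # no stump beats chance on the current weights
        err = max(raw_err, 1e-10)  # keep the log finite when a stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        if raw_err == 0:
            break  # training data already separated perfectly
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps (+1 = rise, -1 = fall)."""
    score = sum(a * stump_predict(row, j, thr, s) for a, j, thr, s in model)
    return 1 if score >= 0 else -1


def knn_regress(X_train, y_train, row, k=3):
    """Mean target of the k training rows nearest to `row` (Euclidean distance)."""
    nearest = sorted((math.dist(r, row), t) for r, t in zip(X_train, y_train))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def accuracy(y_true, y_pred):
    """Fraction of correctly predicted rise/fall labels."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def pearson(xs, ys):
    """Pearson correlation between predicted and actual index values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)
```

Under these assumptions, the reported 72% would correspond to `accuracy` over held-out days, and the coefficient of about 0.5 to `pearson` between `knn_regress` outputs and the actual closing index.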