| With the boom of the Internet and online social platforms, the number of Internet users keeps growing, and online payment and e-commerce have gained real momentum. Meanwhile, the auto industry has continued to slump, and many car companies hope to use the momentum of the Internet to slow this trend. This thesis therefore analyzes the behavior of automobile users. By analyzing the comments that automobile consumers post on social networks and automobile portal websites, we can identify the main orientation of public opinion, monitor the development trend of China’s auto industry, and make adjustments accordingly, so as to promote the development of the automotive industry. The main work of this thesis covers the following aspects. First, statistical analysis of user behavior data, which includes comment text data and forum data: I compiled statistics on car sales over time and on the basic information of car users. Second, data preprocessing: text unrelated to the topics of this study is filtered out, and the remaining text is processed to extract keywords. Third, three machine learning models, namely the classification algorithms Naive Bayes and SVM and the clustering algorithm k-means, are used to monitor the direction of public opinion on the forum. The training data is obtained by extracting keywords from the text and converting them into word vectors. Adding automotive industry terminology to the lexicon used for keyword extraction, together with negation words so that the polarity of a negated word is inverted, greatly improves the segmentation result. 
After comparing these three machine learning algorithms and weighing their pros and cons, I chose the Naive Bayes algorithm to monitor public opinion on the forum, and the results are presented. Finally, the LDA algorithm is used to analyze brand reputation: semantic keywords are extracted from each brand's forum, from which the quality of public opinion about that brand can be judged, and the results are displayed. |
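As a rough illustration of the Naive Bayes step with negation handling described above, the following minimal sketch trains a multinomial Naive Bayes classifier with add-one smoothing on a few invented comments. It is not the thesis's implementation: the `NEGATIONS` set, the `NOT_` token prefix, the whitespace tokenizer, and the four training sentences are all assumptions made for the example, and real forum text in Chinese would need a proper word segmenter with the domain lexicon.

```python
import math
from collections import Counter, defaultdict

# Hypothetical negation words; the lexicon used in the thesis is not given.
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase, split on whitespace, and fold a negation word into the
    token that follows it, e.g. "not good" becomes "NOT_good"."""
    tokens = []
    negate = False
    for word in text.lower().split():
        if word in NEGATIONS:
            negate = True
            continue
        tokens.append("NOT_" + word if negate else word)
        negate = False
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy comments, for illustration only.
texts = [
    "engine is good and quiet",
    "comfortable seats good handling",
    "engine is not good",
    "noisy cabin bad brakes",
]
labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("brakes are not good"))  # negative
print(model.predict("good handling"))        # positive
```

Folding the negation word into the following token keeps "good" and "not good" as separate features, which is one simple way to realize the polarity inversion mentioned in the abstract; other schemes, such as flipping a sentiment score, would serve the same purpose.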