| In the era of big data, the volume of information on the Internet has exploded, and people increasingly search for knowledge and browse news online. Obtaining the main information quickly and efficiently has therefore become a common demand. An abstract is a condensed summary of an article: it reflects the article's subject while greatly reducing the cost of obtaining its main information. With the development of computer technology, using computers to generate text summaries automatically has become a reality, and continuously improving the accuracy of automatic summarization has become an important research direction in natural language processing. This thesis presents an in-depth study of extractive automatic summarization based on machine learning methods. For feature extraction, it first summarizes text features based on statistics and rules; second, it integrates Chinese linguistic features, namely part-of-speech, dependency syntax, semantic role, and semantic dependency features; finally, it introduces deep-learning-based Word2vec word vector features. Each sentence in a text is converted into a 347-dimensional feature vector that serves as input to the machine learning model. Taking into account the form of the manually written summaries in the data set, six classic regression models are applied to these rich text features to extract summaries automatically. Compared with traditional methods, machine learning methods with rich feature sets improve the performance of automatic summarization. On this basis, the best-performing model was used to automatically extract abstracts of current affairs news, with good results. |
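
The regression-based extraction pipeline described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the six regression algorithms, so plain least-squares linear regression (fitted by gradient descent) stands in for them, and small toy feature vectors replace the 347-dimensional ones. The function names `fit_linear`, `predict`, and `extract_summary`, the feature values, and the relevance targets are all hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float],
               lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit least-squares linear regression by batch gradient descent.

    Stand-in for the (unnamed) regression models in the abstract.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted relevance score of one sentence feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def extract_summary(sentences: Sequence[str],
                    features: Sequence[Sequence[float]],
                    w: Sequence[float], b: float, k: int) -> List[str]:
    """Keep the k highest-scoring sentences, in their original order."""
    scores = [predict(w, b, x) for x in features]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    # Hypothetical 2-dimensional features (e.g. position score, keyword overlap)
    # with hypothetical relevance targets derived from a reference summary.
    train_X = [[1.0, 0.8], [0.2, 0.1], [0.6, 0.9], [0.1, 0.3]]
    train_y = [0.9, 0.1, 0.8, 0.2]
    w, b = fit_linear(train_X, train_y)

    doc = ["Sentence A.", "Sentence B.", "Sentence C.", "Sentence D."]
    doc_X = [[0.9, 0.7], [0.3, 0.2], [0.5, 0.8], [0.2, 0.1]]
    print(extract_summary(doc, doc_X, w, b, k=2))
```

Selected sentences are returned in document order rather than score order, which is one common choice for keeping an extractive summary readable; the source does not say how the thesis orders them.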