With the rapid development and popularization of the Internet, the amount of text on the web has grown explosively, producing vast quantities of news articles, comments, papers, and other documents and making text processing and information extraction increasingly difficult. Automatic text summarization condenses a long text into a short one that preserves the main content of the source, which not only helps people acquire information efficiently but also eases their reading burden and improves information utilization. To provide users with more accurate summaries, this thesis studies Chinese automatic text summarization from three aspects: feature extraction, feature selection, and data augmentation, significantly improving the accuracy and readability of the generated summaries. The main research contributions are as follows.

To address the insufficient semantic understanding and repeated summary words of traditional pointer-generator networks in automatic summarization tasks, this thesis investigates text feature extraction and feature selection in depth and proposes PACC (Pointer-generator network with Attention, Convolutional gated unit, and Coverage mechanism), a Chinese automatic text summarization model. The model first applies a self-attention mechanism to the source text to obtain more comprehensive contextual semantic features. It then introduces convolutional gated units to extract and select among these contextual features. Finally, it adds a coverage mechanism that discourages re-attending to the same words, alleviating word repetition in the generated sequences. Experimental results on the LCSTS summarization dataset show that PACC achieves higher ROUGE scores than the pointer-generator network, effectively improving performance and producing high-quality
summaries. Ablation experiments show that the self-attention mechanism and the convolutional gated unit each have a positive effect on the model, and that the best results are obtained when they are used together.

PACC incorporates an LSTM, which struggles with long-range dependencies and is difficult to parallelize; these issues limit the accuracy of the generated summaries and prolong training. The Transformer addresses both problems, and this thesis improves upon the Transformer to strengthen the model's ability to extract information from Chinese text and its robustness to noise. We propose CTCP (Convolutional neural network, Transformer, Contrastive learning, and Pointer mechanism), a Chinese automatic text summarization model. The model first expands the training data with several data augmentation methods, then passes the augmented data through a convolutional layer to extract 2-gram and 4-gram information from the Chinese text. A Transformer encoder then produces text features that incorporate global information. Next, self-supervised contrastive learning is used to strengthen the model's robustness to noise, thereby improving the quality of the contextual semantic features. Finally, a pointer mechanism is introduced in the decoder to handle out-of-vocabulary words in the generated summaries, further improving summary quality. Experimental results on the LCSTS summarization dataset show that CTCP achieves higher ROUGE scores. Ablation experiments show that all data augmentation methods except document rotation yield consistent gains; document rotation degrades performance because it destroys the document structure. The experiments also confirm that the convolutional layer and the contrastive learning module each contribute positively to the model, yielding more accurate summaries.
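The pointer mechanism used in both models resolves out-of-vocabulary words by mixing a generation distribution over the fixed vocabulary with a copy distribution over source positions. The abstract gives no implementation details, so the following is a minimal NumPy sketch under standard pointer-generator assumptions; all names (`pointer_distribution`, `p_gen`, the example ids) are illustrative, not taken from the thesis.

```python
import numpy as np

def pointer_distribution(p_vocab, attn, src_ids, p_gen):
    """Final word distribution for one pointer-generator decoding step.

    p_vocab: (V,) decoder softmax over the fixed vocabulary
    attn:    (T,) attention weights over the T source positions
    src_ids: (T,) ids of source tokens in an *extended* vocabulary,
             where OOV source words get ids >= V
    p_gen:   scalar generation probability in [0, 1]
    """
    v_ext = max(len(p_vocab), int(src_ids.max()) + 1)
    p_final = np.zeros(v_ext)
    p_final[: len(p_vocab)] = p_gen * p_vocab          # generate path
    np.add.at(p_final, src_ids, (1.0 - p_gen) * attn)  # copy path
    return p_final

# Toy step: vocabulary of 5 words; source token with extended id 6 is OOV.
p_vocab = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
attn = np.array([0.7, 0.3])
src_ids = np.array([2, 6])
p_final = pointer_distribution(p_vocab, attn, src_ids, p_gen=0.8)
```

Because both `p_vocab` and `attn` are normalized, the mixture is itself a valid distribution, and the OOV source word (id 6) receives nonzero probability purely through the copy path.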
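The self-supervised contrastive objective in CTCP treats two augmented (noisy) views of the same text as a positive pair and other texts in the batch as negatives, so the encoder learns features that are stable under noise. The thesis does not specify the loss, so this is a hedged sketch of the widely used InfoNCE formulation; the function name and temperature value are assumptions for illustration.

```python
import numpy as np

def info_nce_loss(z1, z2, tau=0.1):
    """InfoNCE contrastive loss sketch.

    z1, z2: (N, d) embeddings of two augmented views of the same N
    texts; row i of z1 pairs with row i of z2 (positive), and all
    other rows of z2 serve as in-batch negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                      # (N, N) scaled cosine sims
    sim -= sim.max(axis=1, keepdims=True)      # numerical stability
    log_softmax = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))      # pull positive pairs together

rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 16))
positive = anchor + 0.05 * rng.normal(size=(8, 16))   # mildly noisy view
loss_matched = info_nce_loss(anchor, positive)
loss_random = info_nce_loss(anchor, rng.normal(size=(8, 16)))
```

A low loss for the matched pair and a high loss for unrelated embeddings is exactly the noise-robustness signal the model optimizes: representations of noisy views of the same text are pulled together while different texts are pushed apart.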