Social media, online transactions and various organizations generate large amounts of data every moment. The TikTok dispute initiated by the United States has made people aware of the enormous value behind this explosive growth of data. Keyword extraction is an important technical means of obtaining the key information in massive data efficiently and quickly. In fields such as natural language processing and personalized recommendation, keyword extraction plays an irreplaceable role, so developing an accurate and fast keyword extraction algorithm is highly worthwhile. Most keyword extraction algorithms rely on statistical information about the words in an article, such as word frequency, and the topic information of the text is lost. Keywords extracted in this way cannot fully reflect the topic and core content of the document, and their accuracy is low, which greatly limits the usefulness of the extracted keywords. To address these problems and further improve keyword extraction, this paper adds topic information about words and proposes a keyword extraction algorithm based on a neural topic model (NTM). The main contributions are as follows:

(1) Unlike traditional machine learning algorithms, this paper treats keyword extraction as a sequence labeling problem and uses a deep learning model to determine the beginning, middle and end positions of keywords in a sentence. Taking the pre-trained model BERT as the base network, a BERT-CRF model is constructed: BERT learns the state features of the text sequence and produces state scores, which are fed directly into the CRF layer to obtain the keyword labels.

(2) The main structure of an article and the topic distribution of its keywords are important factors affecting extraction accuracy. To capture this information, a neural topic model (NTM) based on the Variational Autoencoder (VAE) is constructed, which generates a latent topic matrix from the term frequency matrix of the article. Many different network structures were tried during construction: beyond the basic framework consisting of an Encoder, a Decoder and a Generator, the model parameters were optimized within this framework to compress the NTM model. In addition, the reparameterization trick is used to optimize model training.

(3) The BERT-CRF model and the NTM model are fused: an Attention operation is performed between the latent variables output by the NTM and the word vectors output by BERT, introducing topic information into each word vector. This also attempts to address the OOV (Out of Vocabulary) problem and improves keyword extraction at the semantic level.

(4) Based on the keyword extraction model designed and studied in this paper, a user profile construction system was designed using data from the forum of one vehicle model from a car company in southern China. Covering the process from data collection to user feature analysis, the study examines the proportion of NVH content for different models in the reviews and counts the keywords of the related reviews. Finally, through keyword distribution and sentiment analysis, users' subjective attitudes towards the NVH performance of the target vehicle are obtained.
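To make the CRF step of task (1) concrete, the following is a minimal sketch of Viterbi decoding, the standard way a linear-chain CRF turns per-token state scores into the single best tag sequence. It assumes a hypothetical four-tag scheme (`O` outside, `B` begin, `I` middle, `E` end), hand-written emission scores standing in for BERT's output, and a hand-set transition matrix standing in for the learned CRF parameters; start/end transition scores and a single-token keyword tag, which real implementations often include, are omitted for brevity. The names `viterbi_decode`, `TRANSITIONS` and `EXAMPLE_EMISSIONS` are illustrative only.

```python
from typing import List, Sequence, Tuple

# Hypothetical tag set for illustration: Outside, Begin, Inside (middle), End of a keyword.
TAGS = ["O", "B", "I", "E"]
NEG = -1e4  # large penalty that effectively forbids a transition


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag index path and its total score.

    emissions[t][j]   -- score of tag j at token t (BERT's state scores in BERT-CRF)
    transitions[i][j] -- score of moving from tag i to tag j (the CRF's transition matrix)
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag at token 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag i from which to reach tag j at token t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Toy transition matrix (rows = from, cols = to, order O, B, I, E).
# It forbids O->I, O->E, B->O, B->B, I->O, I->B, E->I and E->E.
TRANSITIONS = [
    [0.0, 0.0, NEG, NEG],
    [NEG, NEG, 0.0, 0.0],
    [NEG, NEG, 0.0, 0.0],
    [0.0, 0.0, NEG, NEG],
]

# Made-up emission scores for the tokens "the cabin wind noise is"
EXAMPLE_EMISSIONS = [
    [2.0, 0.0, 0.0, 0.0],
    [0.5, 1.5, 0.0, 0.0],
    [0.5, 0.2, 1.0, 0.8],
    [0.5, 0.0, 0.5, 1.2],
    [2.0, 0.0, 0.0, 0.0],
]

if __name__ == "__main__":
    path, best = viterbi_decode(EXAMPLE_EMISSIONS, TRANSITIONS)
    print([TAGS[i] for i in path])  # ['O', 'B', 'I', 'E', 'O']
```

With these toy scores the decoder labels "cabin wind noise" as one keyword (`B I E`). The transition matrix is what lets the CRF reject label sequences such as `O I`, which independent per-token classification could still produce.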
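Task (2) builds the NTM on a VAE and uses the reparameterization trick. As a rough illustration, the following is a minimal sketch of one forward pass of a VAE-style topic model on a single term-frequency vector, assuming a Gaussian latent variable mapped to topic proportions by a softmax (a common neural-topic-model construction, not necessarily the one used in this paper). The linear encoder, the fixed topic-word matrix `beta` and all function names are hypothetical, the Generator component mentioned in the text is not modeled, and no training loop is shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def linear(x: Sequence[float], W: Sequence[Sequence[float]]) -> Vector:
    """Matrix-vector product; W holds one row of weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> Vector:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).

    Only eps is random, so z stays a differentiable function of mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def ntm_forward(bow: Sequence[float], W_mu, W_logvar, beta,
                rng: random.Random) -> Tuple[Vector, float]:
    """One forward pass of a toy NTM on a term-frequency vector.

    Returns the document-topic distribution theta and the negative ELBO
    (reconstruction loss + KL term) that training would minimize.
    """
    mu = linear(bow, W_mu)            # encoder: mean of the latent Gaussian
    log_var = linear(bow, W_logvar)   # encoder: log-variance
    z = reparameterize(mu, log_var, rng)
    theta = softmax(z)                # latent topic proportions for this document
    # Decoder: the word distribution is a theta-weighted mixture of topic-word distributions
    word_probs = [sum(theta[k] * beta[k][v] for k in range(len(theta)))
                  for v in range(len(bow))]
    recon = -sum(c * math.log(p) for c, p in zip(bow, word_probs) if c > 0)
    return theta, recon + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    bow = [3.0, 0.0, 1.0, 2.0]                               # term frequencies, 4-word vocabulary
    W_mu = [[0.2, 0.0, -0.1, 0.1], [-0.1, 0.3, 0.2, 0.0]]    # 2 topics x 4 words
    W_logvar = [[0.0] * 4, [0.0] * 4]
    beta = [[0.5, 0.1, 0.1, 0.3], [0.1, 0.4, 0.4, 0.1]]      # each row is a topic-word distribution
    theta, loss = ntm_forward(bow, W_mu, W_logvar, beta, random.Random(0))
    print(theta, loss)
```

In a trained model, `W_mu`, `W_logvar` and `beta` would be learned by minimizing the returned loss. Stacking `theta` over all documents gives a document-topic matrix, one possible reading of the "latent topic matrix" described in task (2).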
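The fusion in task (3) applies Attention between BERT's word vectors and the NTM's latent topic variables. As an illustration, the following is a minimal sketch of one plausible form of this step, scaled dot-product attention. It assumes each word vector acts as a query over a set of topic vectors (used as both keys and values), that both share the same dimension (real BERT outputs would first need a projection), and that fusion is done by concatenating each word vector with its topic context. These choices, and the name `topic_attention`, are assumptions for illustration rather than the paper's exact design.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topic_attention(word_vecs: Sequence[Sequence[float]],
                    topic_vecs: Sequence[Sequence[float]]) -> List[Vector]:
    """Fuse topic information into each word vector via scaled dot-product attention.

    word_vecs:  one vector per token (e.g. BERT outputs), used as queries
    topic_vecs: one vector per topic (e.g. NTM latent topic vectors), used as keys and values
    Returns each word vector concatenated with its attention-weighted topic context.
    """
    d = len(topic_vecs[0])
    scale = math.sqrt(d)
    fused: List[Vector] = []
    for h in word_vecs:
        scores = [sum(a * b for a, b in zip(h, t)) / scale for t in topic_vecs]
        weights = softmax(scores)  # how strongly this token attends to each topic
        context = [sum(w * t[j] for w, t in zip(weights, topic_vecs)) for j in range(d)]
        fused.append(list(h) + context)  # [word vector ; topic context]
    return fused


if __name__ == "__main__":
    words = [[1.0, 2.0], [0.0, 0.0], [10.0, 0.0]]  # three toy 2-d token vectors
    topics = [[1.0, 0.0], [0.0, 1.0]]              # two toy 2-d topic vectors
    for row in topic_attention(words, topics):
        print(row)
```

A token whose vector carries no information (the all-zero vector here) receives an even mix of topics. The intuition for OOV words is that even when the word embedding itself is weak, the token still carries a topic context derived from the document.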