
Research and Implementation of an Enterprise Document Knowledge Search System Based on Deep Learning

Posted on: 2024-08-23 | Degree: Master | Type: Thesis
Country: China | Candidate: C Si
GTID: 2568306944970449 | Subject: Computer Science and Technology
Abstract/Summary:
Before the rapid development of modern computer technology, paper documents were the main medium through which people transmitted information and preserved knowledge. With the rapid development of information technology, electronic documents, which are easier to transmit and store, have become an important medium for communication and collaboration, as well as important intellectual assets within enterprises. As digital transformation accelerates, the number of documents grows exponentially and electronic documents are used ever more frequently, so quickly and accurately locating a target file among massive numbers of documents, and giving users a high-quality document search experience, has become a problem that demands attention.

Building on a study of existing document search products, this paper proposes a novel enterprise document search framework, QianXun. On top of this framework, it implements an enterprise document knowledge search system and develops an automatic summarization model for long Chinese texts. Functionally, the system focuses on document management and document search. To provide a good search experience, it offers full-text retrieval, online preview of documents in various common formats, and knowledge-graph-enhanced search that helps users associate related information. In addition, the key information in each uploaded document is extracted to generate a short summary, which is returned to users on the search results page together with other key information such as the file name, helping users quickly locate target files.

The automatic summarization model combines extractive and generative (abstractive) summarization. First, the pre-trained RoBERTa model vectorizes the text, global average pooling reduces the dimensionality to yield sentence vectors, and an attention layer then learns the relationships between sentence vectors and fine-tunes them. Second, the sentence vectors are fed into an extractive summarization model built mainly on a DGCNN, which labels the key sentences used in the next stage of training. Finally, the pre-trained T5-PEGASUS model polishes the extracted sentences to produce the final summary.

The enterprise document knowledge search system designed and implemented in this paper provides complete document management and document search functions for users with different permission levels, together with a number of dedicated auxiliary search features that improve both the efficiency of document search and the user experience. Experimental results show that the proposed long-text Chinese summarization model achieves better results than the models it is compared with: its ROUGE-1 score reaches 70.26, an increase of 4.02%, and its ROUGE-L score reaches 67.15, an increase of 2.15%. Moreover, the improved vectorization method slightly improves the accuracy and precision of the extractive step and raises its recall by three percentage points.
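The abstract lists full-text retrieval among the system's functions but does not describe how it is implemented. Full-text search engines generally rest on an inverted index that maps each term to the documents containing it. The sketch below is a minimal, hypothetical illustration of that idea, not the thesis's implementation: it assumes overlapping character bigrams as index terms (a common way to index Chinese text without a word segmenter) and AND semantics for multi-term queries. `char_bigrams` and `InvertedIndex` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def char_bigrams(text: str) -> List[str]:
    """Split text into overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # Single characters are returned as-is; this sketch indexes only
        # bigrams, so one-character queries will not match anything.
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


class InvertedIndex:
    """Maps each bigram term to the set of document ids that contain it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in char_bigrams(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[str]:
        """Return ids of documents containing every bigram of the query."""
        terms = char_bigrams(query)
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("d1", "企业文档搜索")
    idx.add("d2", "文档摘要模型")
    print(idx.search("文档"))      # ['d1', 'd2']
    print(idx.search("文档搜索"))  # ['d1']
```

A production system would add relevance ranking (for example BM25) and a proper Chinese analyzer; the sketch only answers which documents contain every query bigram.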
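The first stage of the summarization model turns RoBERTa's per-token outputs into one vector per sentence with global average pooling and then refines those vectors with an attention layer. The sketch below illustrates both operations under stated assumptions: the token embeddings are tiny hypothetical stand-ins for RoBERTa outputs, and the attention is a single-head scaled dot-product self-attention with no learned projections (queries, keys and values are the sentence vectors themselves) plus a residual connection. The thesis's actual layer presumably has trainable parameters; `refine_with_self_attention` is an illustrative name.

```python
import math
from typing import List

Vector = List[float]


def global_average_pool(token_embeddings: List[Vector]) -> Vector:
    """Average one sentence's token embeddings into a single sentence vector."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_with_self_attention(sentences: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention over sentence vectors.

    Simplification (assumption): no learned Q/K/V projections.
    """
    dim = len(sentences[0])
    scale = math.sqrt(dim)
    refined = []
    for q in sentences:
        # Similarity of this sentence to every sentence in the document.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in sentences]
        weights = softmax(scores)
        # Attention-weighted sum of all sentence vectors.
        context = [sum(w * v[d] for w, v in zip(weights, sentences)) for d in range(dim)]
        # Residual connection keeps the original sentence information.
        refined.append([a + c for a, c in zip(q, context)])
    return refined


if __name__ == "__main__":
    # Hypothetical 2-dimensional token embeddings for a two-sentence document.
    doc = [
        [[1.0, 0.0], [3.0, 0.0]],              # sentence 1: two tokens
        [[0.0, 2.0], [0.0, 4.0], [0.0, 0.0]],  # sentence 2: three tokens
    ]
    sentence_vectors = [global_average_pool(s) for s in doc]
    print(sentence_vectors)  # [[2.0, 0.0], [0.0, 2.0]]
    for vec in refine_with_self_attention(sentence_vectors):
        print([round(x, 3) for x in vec])
```

Pooling is what the abstract calls dimensionality reduction: a sentence of any length collapses to a single hidden-size vector (768 dimensions for a base-size RoBERTa model), so sentences of different lengths become directly comparable inputs for the attention layer.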
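The abstract says only that the extractor uses a DGCNN (dilated gated convolutional neural network) "as the main body" and labels key sentences for the next stage. The sketch below is a hypothetical, simplified illustration rather than the thesis's model. It assumes the residual gated unit y = x(1 - s) + conv1(x)s with s = sigmoid(conv2(x)), a form commonly used in DGCNN-based extraction models, stacks three such units with dilation rates 1, 2 and 4 over the sequence of sentence vectors, and finishes with a per-sentence logistic score thresholded at 0.5. To keep the code short, each convolution tap is one scalar shared across channels, and every weight (`conv_taps`, `gate_taps`, `score_weights`, `bias`) is an untrained placeholder that a real model would learn.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dilated_conv(seq: List[Vector], taps: Sequence[float], dilation: int) -> List[Vector]:
    """Kernel-size-3 dilated 1-D convolution over the sentence axis, zero-padded.

    Simplification: each tap is one scalar shared by all channels.
    """
    n, dim = len(seq), len(seq[0])
    out = []
    for i in range(n):
        acc = [0.0] * dim
        for k, w in enumerate(taps):  # k = 0, 1, 2 -> offsets -dilation, 0, +dilation
            j = i + (k - 1) * dilation
            if 0 <= j < n:
                for c in range(dim):
                    acc[c] += w * seq[j][c]
        out.append(acc)
    return out


def gated_block(seq: List[Vector], conv_taps: Sequence[float],
                gate_taps: Sequence[float], dilation: int) -> List[Vector]:
    """Residual gated unit: y = x * (1 - s) + conv(x) * s, with s = sigmoid(gate_conv(x))."""
    h = dilated_conv(seq, conv_taps, dilation)
    g = dilated_conv(seq, gate_taps, dilation)
    out = []
    for xs, hs, gs in zip(seq, h, g):
        row = []
        for x, hv, gv in zip(xs, hs, gs):
            s = sigmoid(gv)
            row.append(x * (1.0 - s) + hv * s)
        out.append(row)
    return out


def label_key_sentences(sentence_vectors: List[Vector], score_weights: Sequence[float],
                        bias: float = 0.0, threshold: float = 0.5,
                        dilations: Sequence[int] = (1, 2, 4),
                        conv_taps: Sequence[float] = (0.25, 0.5, 0.25),  # placeholder weights
                        gate_taps: Sequence[float] = (0.0, 1.0, 0.0)     # placeholder weights
                        ) -> List[int]:
    """Return 1 for sentences scored as key sentences, 0 otherwise."""
    seq = sentence_vectors
    for d in dilations:
        seq = gated_block(seq, conv_taps, gate_taps, d)
    labels = []
    for vec in seq:
        p = sigmoid(sum(w * v for w, v in zip(score_weights, vec)) + bias)
        labels.append(1 if p > threshold else 0)
    return labels
```

The doubling dilations are the usual reason for this design: with kernel size 3, the three stacked units let each sentence's score depend on neighbours up to 1 + 2 + 4 = 7 positions away on either side, without the cost of a full attention pass.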
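The reported ROUGE-1 (70.26) and ROUGE-L (67.15) scores compare generated summaries with reference summaries. The abstract does not say which tokenization or F-measure was used, so the sketch below assumes character-level tokens (common for Chinese) and the F1 form of both metrics, scaled by 100 to match the reported numbers. ROUGE-1 counts overlapping unigrams regardless of order; ROUGE-L uses the longest common subsequence (LCS), so it also rewards preserving the reference's order. Function names are illustrative.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: float, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision and recall; 0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: Sequence[str], reference: Sequence[str], n: int = 1) -> float:
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: Sequence[str], reference: Sequence[str]) -> float:
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    # Character-level tokens (assumption: no word segmentation).
    candidate = list("文档搜索系统")
    reference = list("系统文档搜索")
    print(round(100 * rouge_n(candidate, reference, 1), 2))  # 100.0
    print(round(100 * rouge_l(candidate, reference), 2))     # 66.67
```

The example shows why the two metrics diverge: both strings contain the same six characters, so ROUGE-1 is perfect, but only the four-character run 文档搜索 appears in the same order in both, which caps ROUGE-L at 4/6.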
Keywords/Search Tags: Chinese long-text summarization, automatic summarization technology, knowledge graph, document retrieval