| To fully exploit the value of multi-source information in the field of media convergence, this paper constructs a media convergence knowledge graph and, on that basis, studies and improves both question-answering and retrieval algorithms, then designs and implements a question-answering and retrieval model. The improved models are applied to the media convergence setting by integrating them into a single system through intent recognition. The specific tasks and contributions are as follows: (1) The RoBEG model is proposed for Text2SQL question answering. Existing research in this field relies mainly on English string matching, which leads to low accuracy, a problem that is more pronounced in Chinese; this paper therefore proposes the RoBEG model, which targets Chinese. The model adopts a Chinese pre-training strategy with whole-word masking at the encoder, and tree-based decoding combined with an execution-guided approach at the decoder, improving retrieval efficiency. On the Chinese public dataset TableQA, the model's execution accuracy is 7.13% higher than the baseline, and ablation experiments further demonstrate the effectiveness of its encoder and decoder design. (2) An unsupervised encoding strategy combining semantic data augmentation and contrastive learning is proposed for retrieval. To address the scarcity of annotated Chinese corpora and the slow processing of sentence pairs by traditional Deep Structured Semantic Models, this paper adopts an unsupervised learning method that combines data augmentation with contrastive learning to perform retrieval. Existing data augmentation strategies usually focus on token-level changes, which suffer from low accuracy; this paper therefore proposes a semantic data augmentation strategy. Experimental results on multiple Chinese public
datasets, compared against recent work, show that the proposed strategy achieves the best Spearman coefficient on the main reference dataset Chinese-STS-B, with a 1.69% relative improvement over the baseline model and at least a 7.93% relative improvement over the supervised learning model. The results on multiple datasets demonstrate that this strategy effectively improves retrieval efficiency. (3) A question-answering and retrieval system based on a media convergence database is constructed. To provide data support for the research, we first constructed a media convergence knowledge graph containing 504,014 triples. We then integrated the question-answering and retrieval models into the system using a binary classification model that distinguishes question-answering intent from retrieval intent. We trained this classifier on annotated data, and its performance met the application requirements. Finally, we built a visual interface for the system for ease of use. In summary, this paper studies technologies related to question answering and information retrieval. It proposes the RoBEG model, based on whole-word-masking Chinese pre-training and execution-guided decoding, for Text2SQL question answering, and an unsupervised sentence-level encoding strategy that combines semantic data augmentation with contrastive learning for retrieval, contributing to algorithm research on Chinese question answering and retrieval. On this basis, a question-answering and retrieval system is built for the media convergence field; it meets users' needs to mine domain information and has practical value. |
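
The execution-guided idea named in contribution (1) can be illustrated with a minimal sketch. This is not the paper's decoder: it assumes only that a decoder has already produced candidate SQL strings ranked from most to least likely, and it keeps the first candidate that executes without error and returns a non-empty result. The function name `execution_guided_select`, the toy `articles` table, and the candidate queries are all hypothetical and chosen purely for illustration.

```python
import sqlite3


def execution_guided_select(candidates, conn):
    """Return (sql, rows) for the first candidate that runs and yields rows.

    Assumption: `candidates` is ordered from most to least likely by some
    upstream decoder. Candidates that raise an SQL error or return an
    empty result are skipped. Returns (None, []) if none qualifies.
    """
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # query is invalid against this schema: discard it
        if rows:
            return sql, rows
    return None, []


# Hypothetical toy table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, source TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [("Flood report", "TV", 1200), ("Election recap", "Web", 3400)],
)

# Hypothetical ranked decoder output.
candidates = [
    "SELECT title FROM articles WHERE sourse = 'Web'",    # misspelled column: error
    "SELECT title FROM articles WHERE source = 'Radio'",  # runs, but empty result
    "SELECT title FROM articles WHERE source = 'Web'",    # runs and returns a row
]

best_sql, best_rows = execution_guided_select(candidates, conn)
print(best_sql)
print(best_rows)
```

In this sketch the first two candidates are rejected by actually running them, so the third is selected even though it was ranked lowest. Treating an empty result as a failure is a heuristic; a production system might handle legitimately empty answers differently.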