
Research On Related Questions Retrieval Model For Stack Overflow

Posted on: 2023-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Qin
GTID: 2568307103994539
Subject: Computer technology
Abstract/Summary:
As one of the most popular developer forums, Stack Overflow is an important platform for programmers to share and discuss knowledge. Since its founding, the forum has accumulated a large number of high-quality historical questions and answers, many of which are semantically related to each other. When developers ask a question, the existing answers to related questions in the forum can provide targeted knowledge to resolve their queries and save the time otherwise spent waiting for a reply. In addition, reducing the number of duplicate questions also benefits forum maintenance.

In research on related question retrieval in the field of software engineering, feature-based methods cannot overcome the lexical gap and cannot fully extract the deep semantic information between the query and the candidate question. Deep learning methods predict semantic relevance from single vector representations of the query and the candidate question, which may lose the semantic interaction information between them. In addition, existing methods are insufficient at extracting global semantic features and mostly use a single type of neural network as the encoder. To address these problems, this paper proposes a Related Question Retrieval Model with Integral Fusion (RQRM-IF) and a Related Question Retrieval Model with Multiview Encoding (RQRM-ME) for the Stack Overflow forum.

RQRM-IF mainly comprises a text encoding framework, a local semantic feature extraction framework, and a global semantic feature extraction framework. In the text encoding framework, this paper trains word embeddings specific to the software engineering domain on the Stack Overflow data dump, which represent software engineering terms more completely and accurately, and extracts long-term dependency information from the text using a bidirectional LSTM. In the local semantic feature extraction framework, this paper extracts the semantic interaction information between the query and the candidate question through the attention mechanism, then deeply fuses this information with the original features. In the global semantic feature extraction framework, this paper proposes an integral fusion module that enhances the extraction of global semantic features, which complement the local features and improve retrieval performance.

RQRM-ME mainly consists of a text encoding framework, an attention branch, and a convolutional branch, where the two branches extract features in parallel from the output vectors of the text encoding framework. In the attention branch, this paper uses an improved attention mechanism and multi-way fusion to capture local semantic features. In the convolutional branch, this paper uses a multi-level convolutional structure as an encoder. Combined with the contextual feature vectors output by the text encoding framework, the multi-level convolutional structure applies convolution operations of different granularities to extract phrase-level and word-level patterns from the text. Finally, the model combines the matching features of the attention branch and the convolutional branch to predict semantic similarity.

Both RQRM-IF and RQRM-ME perform well on LinkSO, a related question retrieval dataset for Stack Overflow, where RQRM-ME achieves the best performance, with an MRR, nDCG@5, and nDCG@10 of 57.3%, 58.3%, and 63.1%, respectively. The experimental results show that the proposed models provide theoretical and algorithmic support for subsequent related question retrieval tasks in the field of software engineering.
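
The MRR and nDCG@k figures reported above are standard ranking metrics. The following is a minimal Python sketch of how they are commonly computed, assuming binary relevance labels and a linear gain in nDCG. The abstract does not state the exact formulation used, and the function names and toy rankings here are illustrative assumptions, not taken from the thesis or from the LinkSO tooling.

```python
import math
from typing import Sequence


def reciprocal_rank(relevance: Sequence[int]) -> float:
    """Return 1/rank of the first relevant candidate, or 0.0 if none is relevant.

    `relevance` holds the labels of the candidates in the order the model ranked them.
    """
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevance: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain, log2 discount)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:k], start=1)
    )


def ndcg_at_k(relevance: Sequence[int], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean, defined as 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


# Toy data (hypothetical): one ranked candidate list per query, 1 = related, 0 = not.
rankings = [
    [0, 1, 0, 0, 1],  # first related question at rank 2
    [1, 0, 0, 0, 0],  # first related question at rank 1
    [0, 0, 0, 0, 0],  # no related question retrieved
]

mrr = mean([reciprocal_rank(r) for r in rankings])
ndcg5 = mean([ndcg_at_k(r, 5) for r in rankings])
print(f"MRR = {mrr:.3f}, nDCG@5 = {ndcg5:.3f}")
```

Both metrics are averaged over queries, so a model that places one related question higher in the candidate list raises MRR directly, while nDCG@k also rewards placing every related question near the top.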
Keywords/Search Tags:Natural language processing, Related question retrieval, Semantic matching, Word embedding