Research On Digging And Application Of Semantically Related Words In Software

Posted on: 2018-08-18 | Degree: Master | Type: Thesis | Country: China | Candidate: W S Hu | GTID: 2518305966450384 | Subject: Software engineering

Abstract/Summary:

Code search is a common task in software development and maintenance. Developers often need to search code for programming tasks such as code study, code reuse, and bug localization. Existing code search tools are usually based on keyword text matching. As in traditional information retrieval, the inherent difficulty of keyword-based code search is the vocabulary mismatch between the user query and the code to be retrieved. To improve the accuracy of code search, queries need to be expanded with semantically related words. Relying on natural language resources such as English dictionaries and WordNet to expand code search queries is of limited use, because the semantics of words in software differ substantially from their semantics in general English. A number of techniques have been proposed to identify semantically related words in software, but most of them measure word similarity only by text similarity comparison or word co-occurrence statistics, which severely limits their effectiveness.

This paper designs SWordMap, a word-embedding-based method for learning semantically related words in software, and studies its application to code search. SWordMap obtains semantically related words for 19,332 words in software by training the neural network language model CBOW on Stack Overflow documents. To study how the obtained related words can be applied to code search, this paper designs two query expansion models, one for local code search and one for large-scale open-source code search, and implements them on top of the search engine Elasticsearch.

This paper designs four experiments to evaluate SWordMap: (1) the precision of the semantically related words obtained by SWordMap; (2) the improvement in concern location when SWordMap is used; (3) the improvement in local code search when SWordMap is used; and (4) the improvement in open-source code search when SWordMap is used. The experimental results show that SWordMap can effectively identify semantically related words in software and improves concern location performance and local code search accuracy, but yields only a limited improvement in open-source code search. Comparative experiments against previous work show that SWordMap identifies semantically related words in software more accurately and significantly improves concern location and local code search.

Keywords/Search Tags: code search, query expansion, semantically related words, SWordMap
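The abstract does not give the details of the query expansion models, so the following is only a minimal sketch of the general idea: look up the nearest neighbours of each query term in an embedding space and add them to the query with a lower weight. The toy vectors, the `content` field name, the similarity threshold, and the function names are all assumptions made for illustration; they are not taken from the thesis. The output is a plain dictionary in the shape of an Elasticsearch `bool`/`should` query, and no search client is called.

```python
import math

# Toy 3-dimensional vectors standing in for learned word embeddings.
# These values are invented for illustration only.
EMBEDDINGS = {
    "delete": [0.9, 0.1, 0.0],
    "remove": [0.8, 0.2, 0.1],
    "erase": [0.7, 0.1, 0.2],
    "button": [0.0, 0.9, 0.3],
    "widget": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_words(word, k=2, min_sim=0.8):
    """Return up to k (word, similarity) pairs with similarity >= min_sim."""
    if word not in EMBEDDINGS:
        return []
    scored = [
        (other, cosine(EMBEDDINGS[word], vec))
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored = [(w, s) for w, s in scored if s >= min_sim]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def expand_query(query, field="content", k=2, min_sim=0.8):
    """Build a bool/should query: original terms at boost 1.0,
    related words boosted by their similarity to the original term.
    The field name is a hypothetical index field."""
    should = []
    for term in query.lower().split():
        should.append({"match": {field: {"query": term, "boost": 1.0}}})
        for word, sim in related_words(term, k, min_sim):
            should.append({"match": {field: {"query": word, "boost": round(sim, 3)}}})
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    import json
    print(json.dumps(expand_query("delete button"), indent=2))
```

Weighting expanded terms below the original terms is one common design choice: it lets related words recover documents that use different vocabulary without letting them outrank exact matches.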