| Natural language code search plays an important role in software development. It allows programmers to write queries in natural language and obtain code fragments from the Internet or from source code repositories. However, choosing among existing code search methods and researching new techniques is often difficult, because (1) the implementations of existing code search methods and the data sets used to evaluate them are usually not public, so it is impossible to choose a method suited to the current scenario; (2) some code search methods rely on training data sets or auxiliary data sources, and without these it is impossible to faithfully implement the search algorithm and verify its effectiveness; (3) no empirical study has been found that extensively evaluates the effectiveness of various code search methods; and (4) there is still room for improvement in current code search methods. In this context, this paper constructs a benchmark data set, reproduces existing representative methods, conducts an empirical study, and proposes a new code search method based on the results of that study. The specific research work includes: 1. Benchmark data set construction. This paper designs and builds CosBench, a data set for evaluating code search. It consists of 1000 items, 52 code-independent queries, and 4 metrics for evaluating code search methods. 2. Reproduction of existing representative methods. This paper selects and reproduces six representative code search methods: four based on information retrieval and two based on deep learning. 3. Empirical study. This paper evaluates the six code search methods on CosBench. The results demonstrate the usability of the CosBench data set and reveal the strengths and weaknesses of each method. In addition, this paper finds that deep-learning-based code search is better suited to queries aimed at code reuse, while information-retrieval-based code search is better suited to queries aimed at error correction and API learning. 4. New method research. Based on the observations from the empirical study, this paper designs IntentCS, a hybrid code search method based on query intent understanding. The method identifies the intent of the input query and combines information retrieval and deep learning code search algorithms. Compared with the baseline methods, IntentCS achieves the best search performance. |
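
The idea of routing a query by its intent can be illustrated with a minimal sketch. The abstract does not say how IntentCS classifies intent or how it combines the two families of algorithms, so everything below is an assumption made for illustration: the keyword heuristic in `classify_intent`, the intent labels, and the `make_router` helper are hypothetical, and the two searchers are placeholders supplied by the caller. The routing table only mirrors the empirical finding stated above (deep learning for code reuse, information retrieval for error correction and API learning).

```python
from typing import Callable, Dict, List

# Hypothetical intent labels, named after the three query purposes in the abstract.
REUSE, BUG_FIX, API_LEARNING = "reuse", "bug_fix", "api_learning"

# A searcher takes a natural-language query and returns ranked code snippets.
Searcher = Callable[[str], List[str]]


def classify_intent(query: str) -> str:
    """Toy keyword heuristic; an assumption, not the paper's intent classifier."""
    q = query.lower()
    if any(word in q for word in ("error", "exception", "fix", "fail")):
        return BUG_FIX
    if any(word in q for word in ("api", "usage of", "how to use")):
        return API_LEARNING
    return REUSE


def make_router(ir_search: Searcher, dl_search: Searcher) -> Searcher:
    """Build a search function that dispatches on the classified intent."""
    # Routing follows the reported finding: DL for reuse, IR for the other two.
    routes: Dict[str, Searcher] = {
        REUSE: dl_search,
        BUG_FIX: ir_search,
        API_LEARNING: ir_search,
    }

    def search(query: str) -> List[str]:
        return routes[classify_intent(query)](query)

    return search
```

A caller would pass its own information-retrieval and deep-learning searchers to `make_router` and then use the returned function as a single entry point. A real system would likely replace the keyword heuristic with a trained classifier and might merge results from both back ends rather than pick one.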