The Design and Implementation of a Bug Localization System Based on Enhanced Semantic Retrieval

Posted on: 2022-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: E M Li
Full Text: PDF
GTID: 2518306725984199
Subject: Master of Engineering (field of software engineering)
Abstract/Summary:
Static bug localization techniques automatically locate potentially buggy source code for a given bug report by using historical bug reports, source code, and other artifacts generated during software development. For static method-level bug localization, a key challenge is to fully capture the functional semantics of methods and the problem semantics of a bug report. However, existing static bug localization techniques have shortcomings in semantic retrieval. On the one hand, the information retrieval techniques they use usually fail to account for the context of lexical terms, which leads to insufficient semantic representations of the textual content of bug reports. On the other hand, existing studies mainly use traditional information retrieval techniques to retrieve the semantics of source code. Treating method code as plain text misses the structural information of a method, which carries program functionality.

This system introduces a static method-level bug localization technique based on enhanced semantic retrieval, which retrieves the semantics of bug reports and method code in separate feature spaces. First, we preprocess bug reports and apply a word embedding technique from NLP to retrieve their semantics. Second, we use an AST-based code embedding technique to retrieve the semantics of methods. Then, we use a neural network to unify the two kinds of semantic representations and train a model for predicting buggy methods on a data set. Finally, the model can be used to predict potentially buggy methods for an incoming bug report. The system is divided into four modules: a data set construction module, a bug report retrieval module, a method code retrieval module, and a bug localization module.

We compare five typical word embedding models for representing bug reports and explore the usefulness of resampling and cost-sensitive strategies in handling the class imbalance problem. We evaluate the effectiveness of our system on five Java projects from the Defects4J data set. The results show that, on the whole, the word embedding model ELMo outperformed the other four models (word2vec, fastText, GloVe, BERT) in facilitating bug localization. Among the five strategies for addressing class imbalance, random over-sampling performed much better than the others (including random under-sampling, Focal Loss, etc.). In terms of MAP, MRR, and Hit@1, this system achieves much better results than the state-of-the-art baseline MULAB, with absolute increases of 6.2%-50.2%, 6.8%-50.5%, and 13.3%-50.0%, respectively. It also achieves much better results than the state-of-the-art baseline FineLocator, with absolute increases of 6.8%-48.4%, 5.6%-45.8%, and 6.3%-45.7%, respectively. In summary, the system can effectively help locate buggy methods in Java programs and facilitate software maintenance.
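For readers unfamiliar with the three ranking metrics reported above, the following is a minimal sketch of how MAP, MRR, and Hit@k are commonly computed for method-level bug localization. It is not taken from the thesis: the function names, the input layout (one ranked list of method identifiers plus the set of truly buggy methods per bug report), and the choice to normalize average precision by the number of buggy methods are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Set, Tuple

# Hypothetical input layout: for each bug report, a ranked list of method
# identifiers (best candidate first) and the set of truly buggy methods.
Query = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Return 1/rank of the first buggy method, or 0.0 if none is ranked."""
    for rank, method in enumerate(ranked, start=1):
        if method in buggy:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Mean of precision@rank over the ranks where a buggy method appears.

    Assumption: the sum is divided by the total number of buggy methods.
    """
    if not buggy:
        return 0.0
    hits = 0
    total = 0.0
    for rank, method in enumerate(ranked, start=1):
        if method in buggy:
            hits += 1
            total += hits / rank
    return total / len(buggy)


def hit_at_k(ranked: Sequence[str], buggy: Set[str], k: int) -> float:
    """Return 1.0 if any of the top-k ranked methods is buggy, else 0.0."""
    return 1.0 if any(m in buggy for m in ranked[:k]) else 0.0


def evaluate(queries: Sequence[Query], k: int = 1) -> Dict[str, float]:
    """Average the three metrics over all bug reports."""
    n = len(queries)
    if n == 0:
        return {"MAP": 0.0, "MRR": 0.0, f"Hit@{k}": 0.0}
    return {
        "MAP": sum(average_precision(r, b) for r, b in queries) / n,
        "MRR": sum(reciprocal_rank(r, b) for r, b in queries) / n,
        f"Hit@{k}": sum(hit_at_k(r, b, k) for r, b in queries) / n,
    }


if __name__ == "__main__":
    # Two toy bug reports with made-up method names.
    toy = [
        (["a", "b", "c"], {"b"}),
        (["x", "y", "z"], {"x", "z"}),
    ]
    print(evaluate(toy, k=1))
```

Under these definitions, an "absolute increase" in a metric is the plain difference between two systems' scores on the same project, not a ratio.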
Keywords/Search Tags: bug localization, word embedding, semantics retrieval, class imbalance