| Information retrieval and natural language processing techniques are among the most widely used methods for solving software engineering problems. A critical step in these techniques is stopword removal, which uses a stoplist to remove frequently occurring words that carry little semantic information. The stoplist therefore plays an important role in how well these techniques perform on software engineering problems. The generic English stoplists widely used in software engineering are outdated and lack domain relevance, while newer stoplists compiled by summarizing individual tasks are usually subjective and incomplete. To the best of our knowledge, there is currently no generic stoplist for the field of software engineering. To fill this gap and address the above problems, this thesis generates a generic stoplist for software engineering and analyzes it in detail. The main contributions of this thesis are as follows: (1) The current use of stoplists in software engineering is summarized. This thesis systematically searches and analyzes related studies from top journals and conferences, and classifies the stoplists that have appeared in the software engineering literature. (2) This thesis generates the first generic stoplist for software engineering. Based on crowd intelligence data from Stack Overflow, the thesis first analyzes and preprocesses the corpus, and then generates the stoplist by leveraging a word frequency and document frequency distribution algorithm. It also analyzes the differences between this corpus and a generic English corpus, as well as the content of the resulting stoplist. (3) The effect of the stoplist is analyzed on an unsupervised bug report summarization task. In this task, the software engineering stoplist generated in this thesis and a generic English stoplist were each used to preprocess the data set, with the generic English stoplist serving as the baseline for comparison. The same evaluation criteria, experimental platform, and experimental steps were used to rule out the influence of other factors. The results show that the software engineering stoplist generated in this thesis achieves the best results. (4) The effect of the stoplist is analyzed on an unsupervised task of recommending API-related tutorial fragments. In this task, the software engineering stoplist generated in this thesis and a generic English stoplist were each used to preprocess the data set, with the generic English stoplist serving as the baseline for comparison. The same evaluation criteria, experimental platform, and experimental steps were used to rule out the influence of other factors. The results show that the software engineering stoplist generated in this thesis achieves the best results. |
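
The abstract names word frequency and document frequency as the basis for stoplist generation but does not give the algorithm. The following is only a minimal sketch of that general idea, not the method used in the thesis. The function names `tokenize`, `build_stoplist`, and `remove_stopwords`, the parameters `top_k` and `min_df_ratio`, and the selection rule are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens are enough for this illustration.
    return re.findall(r"[a-z']+", text.lower())


def build_stoplist(documents, top_k=50, min_df_ratio=0.5):
    """Return candidate stopwords, most frequent first.

    A word is kept when it is among the top_k most frequent words in the
    corpus (word frequency) and also occurs in at least min_df_ratio of
    the documents (document frequency). Both thresholds are hypothetical.
    """
    term_freq = Counter()  # total occurrences of each word
    doc_freq = Counter()   # number of documents containing each word
    for doc in documents:
        tokens = tokenize(doc)
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    n_docs = len(documents)
    if n_docs == 0:
        return []

    return [
        word
        for word, _ in term_freq.most_common(top_k)
        if doc_freq[word] / n_docs >= min_df_ratio
    ]


def remove_stopwords(text, stoplist):
    # Preprocessing step: drop every token that appears in the stoplist.
    stop = set(stoplist)
    return [token for token in tokenize(text) if token not in stop]
```

In such a sketch, `documents` would be a list of post texts (for example, Stack Overflow questions after markup removal), and the returned list would be passed to `remove_stopwords` when preprocessing the data set of a downstream task such as bug report summarization.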