| Technical debt (TD) is a metaphor for situations in which developers trade long-term benefits (such as software maintainability) for short-term goals during software development. When developers deploy non-optimal solutions to meet urgent requirements and mark them in code comments, these admissions are called Self-Admitted Technical Debt (SATD). If SATD is not repaid in time, it may lead to serious vulnerabilities or even software crashes. Identifying SATD in the code comments of a large project is costly, and previous methods leave room for improvement. First, they struggle to distinguish the importance of features when extracting text features. Second, they do not consider label information when recognizing SATD types. This thesis addresses both issues, building on previous research. On the one hand, improved feature extraction and weight allocation raise accuracy in the SATD recognition task. On the other hand, introducing label information into the recognition of specific SATD categories enhances the model’s multi-class classification ability. The specific research content of this thesis is as follows. 1. SATD recognition that integrates Gated Recurrent Units (GRU) with an attention mechanism. Previous work treated SATD recognition as a binary text classification task that uses code comments to distinguish SATD from non-SATD. Most existing methods identify SATD by combining the Bag of Words (BOW) model with machine learning methods. However, these methods not only lose the semantic information of the text but also fail to distinguish the importance of features. Therefore, this thesis integrates a bidirectional GRU with an attention mechanism, which effectively improves the SATD feature extraction process. First, we preprocess the source code comments and use word embedding to extract word vectors that carry semantics; second, we feed the embeddings into the bidirectional GRU layer to extract high-level word features; third, we feed the word
features into the attention mechanism to generate feature weights, perform a weighted summation, and obtain sentence features; finally, the trained model is used to identify SATD in code comments. The experimental results show that this method is superior at identifying SATD while keeping time complexity manageable. 2. SATD type classification based on label embedding. There are many types of SATD, and merely identifying SATD in code comments does not determine the specific repayment method. Manual classification is still required, which indirectly increases the repayment cost. Therefore, classifying the specific type of SATD better meets practical needs. However, different types of SATD are not written in clearly distinct ways in code comments, which makes type classification more difficult. In addition, few SATD type classification methods exist, and their performance still needs improvement. Therefore, this thesis extends previous work by adding label embedding and label confusion to enhance the model’s ability to classify specific types of SATD. First, we construct a basic SATD classifier with multi-class capability, whose structure is similar to that of the SATD recognizer from the first study; second, we establish a label embedder that creates a vector for each SATD type label; third, we apply label confusion to the label vectors to learn the relationships between labels; finally, we combine the basic classifier with the label embedder to predict the specific category of SATD. The experimental results show that our method improves the accuracy of SATD type classification, which helps developers maintain software. Finally, we design a lightweight SATD detection tool that lets software developers quickly locate SATD. The tool is built with Django as the full-stack framework and an SQLite3 database for data storage. We combine the results of the two studies above into an online SATD classification system. The tool provides SATD
classification. As the stored data grows, it continuously iterates its online models, resulting in a high degree of automation. |
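
The attention step described in the first study (generate feature weights, then take a weighted sum of word features to obtain a sentence feature) can be illustrated with a minimal sketch. It is not the thesis implementation: the additive scoring form `tanh(W h + b) . context`, the function name `attention_pool`, and the toy numbers are all assumptions chosen for illustration, and the hidden states stand in for bidirectional GRU outputs that a real model would learn.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Matrix, W: Matrix, b: Vector,
                   context: Vector) -> Tuple[Vector, Vector]:
    """Collapse per-word features into one sentence vector.

    hidden  : T word features of size d (stand-ins for BiGRU outputs)
    W, b    : projection parameters (learned in a real model, fixed here)
    context : vector that scores each projected word (assumed form)
    """
    scores = []
    for h in hidden:
        # u_t = tanh(W h_t + b)
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        # score_t = u_t . context
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    weights = softmax(scores)  # one weight per word, summing to 1
    dim = len(hidden[0])
    # Sentence feature = weighted sum of the word features.
    sentence = [sum(a * h[i] for a, h in zip(weights, hidden))
                for i in range(dim)]
    return weights, sentence


# Toy example: three "words", two-dimensional features, identity projection.
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, sentence = attention_pool(hidden_states, identity,
                                   [0.0, 0.0], [1.0, 1.0])
print(weights)
print(sentence)
```

In this toy input the second word has the largest feature values, so it receives the largest weight and dominates the sentence vector; this is the sense in which attention distinguishes the importance of features, which a Bag of Words representation cannot do.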