
Research On API Documentation Mining Driven By Software Knowledge And Data

Posted on: 2022-09-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Wu
Full Text: PDF
GTID: 1488306497489874
Subject: Computer application technology
Abstract/Summary:
Reusing Application Programming Interfaces (APIs) helps developers speed up software development and improve software quality. As the scale and number of API libraries increase, developers (even experienced ones) often encounter unfamiliar or new APIs. API documentation, such as API specifications, API tutorials, and online forums, is a valuable API learning resource. To help developers use API documentation, researchers apply data mining techniques to find API-related knowledge (e.g., API names, API usage) in it. Since API documentation mining is driven by software knowledge and data, data quality plays an important role in it. Many approaches to API documentation mining have been presented, but several practical problems remain to be studied from the perspective of data quality:

(1) When generating API tags that are semantically related to a tutorial fragment, not all APIs in the fragment are related to the programming topic it describes. Some APIs are merely mentioned in the fragment, which may affect developers’ understanding of the fragment. Therefore, there is a semantic mismatch problem in the generation of API tags.

(2) A tutorial fragment is relevant to an API if it explains usage knowledge for that API; otherwise, it is irrelevant to the API. However, one API may appear in both relevant and irrelevant fragments. This irrelevant information problem may reduce the efficiency with which developers learn APIs.

(3) Different API documentation sources often explain API usage from different aspects. Retrieving API usage from several sources simultaneously helps developers learn and understand unfamiliar or new APIs. Nevertheless, how to make full use of different API documentation sources together has not been well studied. Furthermore, the irrelevant information problem must be addressed when combining different sources. Thus, there is an insufficient information problem in the retrieval of API usage.

This thesis studies these three unsolved problems in existing work on API documentation mining and proposes several models and approaches to address them. The main contributions are as follows:

(1) For the semantic mismatch problem in the generation of API tags, this thesis proposes ATTACK, a deep neural network-based API tag generation approach that generates API tags for tutorial fragments by using Stack Overflow (SO) Q&A pairs. ATTACK regards the generation of API tags for Q&A pairs and tutorial fragments as a neural translation problem, in which a Q&A pair or fragment is translated into a set of API tags. More specifically, ATTACK first automatically extracts API tags from Q&A pairs. It then trains a deep neural network with an attention mechanism to learn the semantic relatedness between a Q&A pair and an API tag set, taking into consideration both the textual descriptions and the code examples in the Q&A pair. Finally, a set of API tags for a tutorial fragment is generated by consulting the trained model.

(2) For the irrelevant information problem in the detection of relevant fragments, this thesis proposes SO2RT, a semi-supervised transfer learning based detection model that leverages SO posts to discover the tutorial fragments relevant to an API. The task of identifying the relevant fragments of APIs from labeled information (the relevance between SO Q&A pairs and APIs) and unlabeled information (tutorial fragments and APIs) is treated as a semi-supervised transfer learning task. SO2RT first automatically extracts the relevance (relevant or irrelevant) between Q&A pairs and APIs based on heuristic rules. It then trains a semi-supervised transfer learning based detection model, which transfers the API usage knowledge in SO Q&A pairs to tutorial fragments by utilizing the easy-to-extract relevance between Q&A pairs and APIs. Finally, the relevant fragments of APIs are discovered by consulting the trained model.

(3) For the insufficient information problem and the irrelevant information problem in the retrieval of API usage, this thesis proposes PLAN, an approach that retrieves API-related knowledge from both tutorials and SO based on natural language questions. To combine API tutorials and SO, PLAN extracts APIs from each tutorial fragment to generate <API, fragment> pairs and extracts APIs from Q&A pairs to build <API, Q&A> pairs. Combining these two types of learning resources yields the <API, KI> dataset, where each tutorial fragment or Q&A pair is a knowledge item (KI). To return a ranked list of <API, KI> pairs to developers, PLAN works in three main stages. In the first stage, PLAN maps a natural language question to potential APIs. In the second stage, PLAN uses a transfer deep metric learning based relevance identification (TDML) model to identify relevant <API, KI> pairs across the two resources simultaneously, which addresses the irrelevant information problem; the relevant <API, KI> pairs and the potential APIs are then used to select potential results. In the third stage, PLAN returns API-related knowledge based on the mutual similarity between the natural language question, the potential APIs, and the potential results.
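
The three PLAN stages described above can be illustrated with a small Python sketch. This is only a toy illustration under stated assumptions, not the thesis's implementation: the API vocabulary, the sample knowledge items, and all function names are hypothetical; the TDML model is replaced by a trivial cue-word check; and the final ranking uses plain Jaccard token overlap as a stand-in for the mutual similarity the abstract mentions.

```python
import re
from dataclasses import dataclass

# Hypothetical API vocabulary (assumption; not taken from the thesis).
KNOWN_APIS = {"ArrayList.add", "HashMap.put", "String.split"}
# Hypothetical cue words standing in for a learned relevance model.
USAGE_CUES = {"use", "call", "example"}


@dataclass(frozen=True)
class KnowledgeItem:
    source: str  # "tutorial" (fragment) or "so" (Q&A pair)
    text: str


def tokens(text):
    """Lowercase alphabetic tokens; 'HashMap.put' -> {'hashmap', 'put'}."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_pairs(items):
    """Combine both resources into <API, KI> pairs by literal API mention."""
    return [(api, ki) for ki in items
            for api in sorted(KNOWN_APIS) if api in ki.text]


def candidate_apis(question):
    """Stage 1: map a natural language question to potential APIs."""
    q = tokens(question)
    return {api for api in KNOWN_APIS if tokens(api) & q}


def is_relevant(api, ki):
    """Stage 2 placeholder for TDML: mention plus a usage cue word."""
    return api in ki.text and bool(USAGE_CUES & tokens(ki.text))


def retrieve(question, items):
    """Run the three stages and return ranked <API, KI> pairs."""
    apis = candidate_apis(question)
    pairs = [(a, k) for a, k in build_pairs(items)
             if a in apis and is_relevant(a, k)]
    q = tokens(question)

    def score(pair):
        # Stage 3 stand-in: question-to-API plus question-to-KI overlap.
        api, ki = pair
        return jaccard(q, tokens(api)) + jaccard(q, tokens(ki.text))

    return sorted(pairs, key=score, reverse=True)


items = [
    KnowledgeItem("tutorial", "Use HashMap.put to store a key and value in a HashMap."),
    KnowledgeItem("so", "You can call String.split to break a string into parts."),
    KnowledgeItem("tutorial", "HashMap.put appears in the log output."),
]
results = retrieve("How do I put a key into a HashMap?", items)
for api, ki in results:
    print(api, "|", ki.source, "|", ki.text)
```

In this toy run, the third item mentions HashMap.put without explaining its usage, so the stage 2 placeholder drops it; that is the kind of irrelevant <API, KI> pair the learned model is meant to filter out.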
Keywords/Search Tags: API documentation, API tutorial, Stack Overflow, deep neural network, semi-supervised learning, transfer learning, transfer deep metric learning