Research On API Documentation Mining Driven By Software Knowledge And Data

Posted on:2022-09-03

Degree:Doctor

Type:Dissertation

Country:China

Candidate:D Wu

Full Text:PDF

GTID:1488306497489874

Subject:Computer application technology

Abstract/Summary:

Reusing Application Programming Interfaces (APIs) helps developers speed up software development and improve software quality. As API libraries grow in scale and number, developers (even experienced ones) often encounter unfamiliar or new APIs. API documentation, such as API specifications, API tutorials, and online forums, is a valuable learning resource. To help developers make better use of it, researchers apply data mining techniques to extract API-related knowledge (e.g., API names, API usage) from API documentation. Because API documentation mining is driven by software knowledge and data, data quality plays an important role in it. Many approaches have been proposed, but several practical problems remain from the perspective of data quality.

(1) When generating API tags that are semantically related to a tutorial fragment, not all APIs in the fragment are related to the programming topic it describes. Some APIs are merely mentioned in the fragment, and tagging them may hinder developers' understanding of it. The generation of API tags therefore suffers from a semantic mismatch problem.

(2) A tutorial fragment is relevant to an API if it explains how to use that API, and irrelevant otherwise. However, one API may appear in both relevant and irrelevant fragments. This irrelevant information problem may reduce the efficiency with which developers learn APIs.

(3) Different API documentation sources often explain API usage from different aspects. Retrieving API usage from several sources simultaneously helps developers learn and understand unfamiliar or new APIs. Nevertheless, how to make full use of different sources together has not been well studied. Furthermore, the irrelevant information problem must be addressed when combining them. The retrieval of API usage therefore suffers from an insufficient information problem.

This thesis studies these three unsolved problems in existing work on API documentation mining and proposes several models and approaches to address them. The main research results are as follows:

(1) For the semantic mismatch problem in API tag generation, this thesis proposes ATTACK, a deep neural network-based approach that generates API tags for tutorial fragments by using Stack Overflow (SO) Q&A pairs. ATTACK treats the generation of API tags for Q&A pairs and tutorial fragments as a neural translation problem, in which a Q&A pair or fragment is translated into a set of API tags. More specifically, ATTACK first automatically extracts API tags from Q&A pairs. It then trains a deep neural network with an attention mechanism to learn the semantic relatedness between a Q&A pair and an API tag set, taking into account both the textual descriptions and the code examples in the Q&A pair. Finally, a set of API tags for a tutorial fragment is generated by querying the trained model.

(2) For the irrelevant information problem in the detection of relevant fragments, this thesis proposes SO2RT, a semi-supervised transfer learning-based detection model that leverages SO posts to discover the tutorial fragments relevant to an API. The task of identifying the relevant fragments of APIs from labeled information (relevance between SO Q&A pairs and APIs) and unlabeled information (tutorial fragments and APIs) is treated as a semi-supervised transfer learning task. SO2RT first automatically extracts relevance labels (relevant or irrelevant) between Q&A pairs and APIs using heuristic rules. It then trains a semi-supervised transfer learning-based detection model that transfers API usage knowledge from SO Q&A pairs to tutorial fragments by exploiting the easy-to-extract relevance between Q&A pairs and APIs. Finally, the relevant fragments of an API are discovered by querying the trained model.

(3) For the insufficient information problem and the irrelevant information problem in the retrieval of API usage, this thesis proposes PLAN, an approach that retrieves API-related knowledge from both tutorials and SO in response to natural language questions. To combine API tutorials and SO, PLAN extracts APIs from each tutorial fragment to form <API, fragment> pairs and extracts APIs from Q&A pairs to form <API, Q&A> pairs. Combining the two resources yields the <API, KI> datasets, where each tutorial fragment or Q&A pair is a knowledge item (KI). To return a ranked list of <API, KI> pairs to developers, PLAN works in three stages. In the first stage, PLAN maps a natural language question to potential APIs. In the second stage, PLAN uses a transfer deep metric learning-based relevance identification (TDML) model to identify relevant <API, KI> pairs across the two resources simultaneously, which addresses the irrelevant information problem; the relevant <API, KI> pairs and the potential APIs are then used to generate potential results. In the third stage, PLAN returns API-related knowledge based on the mutual similarity among the natural language question, the potential APIs, and the potential results.
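
As a rough illustration of the <API, KI> idea described for PLAN, the following Python sketch builds <API, KI> pairs from tutorial fragments and SO Q&A pairs and ranks them against a natural language question. It is not the thesis's implementation: the names `KnowledgeItem`, `extract_apis`, `build_pairs`, and `rank` are hypothetical, API extraction is reduced to a lookup in an assumed list of known API names, and a plain Jaccard token overlap stands in for the TDML relevance model and the mutual-similarity ranking, whose details the abstract does not give.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeItem:
    """A knowledge item (KI): one tutorial fragment or one SO Q&A pair."""
    source: str  # "tutorial" or "so"
    text: str


def tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_apis(text, known_apis):
    """Toy API extraction (assumption): keep known API names that occur as words."""
    words = {w.rstrip(".") for w in re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", text)}
    return {api for api in known_apis if api in words}


def build_pairs(items, known_apis):
    """Build <API, KI> pairs from both kinds of knowledge item."""
    return [(api, ki)
            for ki in items
            for api in sorted(extract_apis(ki.text, known_apis))]


def jaccard(a, b):
    """Jaccard similarity of two token sets (stand-in for a learned metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(question, pairs, top_k=3):
    """Rank <API, KI> pairs by token overlap with the question; drop zero scores."""
    q = tokens(question)
    scored = [(jaccard(q, tokens(ki.text)), api, ki) for api, ki in pairs]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(api, ki) for score, api, ki in scored[:top_k] if score > 0]


# Made-up example data, for illustration only.
known = {"ArrayList", "HashMap"}
items = [
    KnowledgeItem("tutorial",
                  "An ArrayList stores elements in insertion order and grows as needed."),
    KnowledgeItem("so",
                  "Q: How do I iterate over a HashMap? A: Loop over entrySet of the HashMap."),
]
pairs = build_pairs(items, known)
results = rank("how to iterate over a HashMap", pairs)
for api, ki in results:
    print(api, ki.source)
```

The sketch collapses PLAN's three stages into pair construction plus a single similarity ranking; in the approach described above, a learned model decides which <API, KI> pairs are relevant before any ranking takes place.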

Keywords/Search Tags:

API documentation, API tutorial, Stack Overflow, deep neural network, semi-supervised learning, transfer learning, transfer deep metric learning

Related items

1	Unlabeled Data Aided Deep Learning Techniques Researches
2	Research On Deep Learning-Based Representation Learning Algorithms
3	Research And Application Of Image Style Transfer Based On Deep Learning
4	Deep Transfer Learning For Cross-domain Aspect-based Sentiment Analysis
5	Reliable Semi-supervised Learning For Evolving Data Stream
6	Multiple Kernel Learning Improved By Bi-objective Functions And Its Application To Semi-supervised Learning And Transfer Learning
7	Research On Semi-supervised Classification Algorithm Based On Integrated Neural Network
8	Research On Semi-Supervised Deep Learning Methods For Fine-Grained Image Classification
9	Research And Application Of Transfer Learning Methods For Deep Convolutional Neural Networks
10	On The Deep Transfer Learning And Its Applications