
Research On Developer Expert Retrieval Technology For Open Source Collaboration

Posted on: 2024-03-31  Degree: Master  Type: Thesis
Country: China  Candidate: Z J Weng  Full Text: PDF
GTID: 2568307052996359  Subject: Electronic information
Abstract/Summary:
With the rapid development of open source software, many code hosting platforms have emerged at home and abroad to help developers manage open source projects. A useful property of these platforms is that a developer's experience in open source projects on the platform reflects their proficiency in the relevant areas of expertise. This property can be applied to many scenarios, such as expert discovery in open source communities and the recruitment of enterprise software developers. However, GitHub, the world's largest code hosting platform, only supports searching for developers by username, and the search results contain little information that reflects a developer's expertise. In response to these problems, this thesis builds a developer expert retrieval model and system for open source collaboration based on GitHub ecosystem data. It mainly addresses two types of retrieval tasks for open source developers: Expert Profiling and Expert Finding. The former labels a developer's areas of expertise, which enriches the professional knowledge information shown in search results; the latter matches the most suitable experts to given keywords, which supports more diverse search needs. The main contributions of this thesis are as follows:
(1) Build a GitHub ecosystem data collection framework and the basic data for retrieval: Although users can obtain open data from GitHub through its API, several problems arise in practice. GitHub places request limits on API users and provides neither the overall structure of its open data nor the complete historical event stream. This thesis builds an efficient and stable GitHub ecosystem data collection framework, Open Crawler. The collected data is then pre-processed according to open source activity to produce basic data suitable for open source developer expert retrieval tasks.
(2) Propose an open source developer expert retrieval model: Based on the idea of the "Document Model", this thesis treats open source projects as documents and builds an open source developer expert-keyword network through network generation and network integration. On this basis, this thesis proposes OSDERM (Open Source Developer Expert Retrieval Model) for open source collaboration, which combines the idea of the "Expert Model" and uses network representation learning to solve the two types of retrieval tasks. To learn the feature representations of network nodes, this thesis proposes the network representation learning algorithm OSC2vec (Open Source Collaboration to vector), which incorporates developer collaboration constraints. Link prediction and case study experiments are then designed to verify the effectiveness and rationality of the proposed methods and models.
(3) Implement and optimize the open source developer expert retrieval system: Based on the collected GitHub ecosystem data and OSDERM, this thesis builds an open source developer expert retrieval system that provides users with efficient and stable expert profiling and expert finding services. To address the high cost of computing vector similarity in the retrieval system, this thesis proposes the LSH-Net method, which combines a vector LSH algorithm with the developer expert-keyword network. LSH-Net improves the search efficiency for open source developer experts and reduces the similarity search error introduced by the vector LSH algorithm. Finally, this thesis designs extensive experiments on both retrieval quality and efficiency, which verify that the proposed method greatly improves retrieval efficiency while preserving retrieval quality.
In summary, this thesis describes in detail the developer expert retrieval model and system built on GitHub ecosystem data and evaluates them through extensive experimental analysis. The experimental results show that the proposed solution can effectively solve the two retrieval tasks of open source developer expert profiling and expert finding.
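To make the vector LSH idea mentioned in contribution (3) more concrete, the following is a minimal sketch of plain random-hyperplane locality-sensitive hashing for cosine similarity. It is not the thesis's LSH-Net method and does not use the expert-keyword network; the class name LshIndex, the helper functions, the parameter values, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals of dimension dim (assumed setup)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane the vector falls on."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LshIndex:
    """Hypothetical single-table LSH index over developer embedding vectors."""

    def __init__(self, dim, num_bits=8, seed=0):
        self.hyperplanes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(list)
        self.vectors = {}

    def add(self, key, vec):
        self.vectors[key] = vec
        self.buckets[signature(vec, self.hyperplanes)].append(key)

    def query(self, vec, top_k=3):
        # Only vectors in the same bucket are compared exactly, which is
        # where the speed-up (and the possible miss of true neighbours) comes from.
        candidates = self.buckets.get(signature(vec, self.hyperplanes), [])
        ranked = sorted(
            candidates,
            key=lambda k: cosine(vec, self.vectors[k]),
            reverse=True,
        )
        return ranked[:top_k]


index = LshIndex(dim=3, num_bits=8, seed=42)
index.add("dev_a", [1.0, 2.0, 3.0])    # toy embedding, not real data
index.add("dev_b", [-1.0, -2.0, -3.0])  # points in the opposite direction
hits = index.query([2.0, 4.0, 6.0])     # same direction as dev_a
```

Because only one bucket is inspected, true neighbours that land in a different bucket are missed; this is the kind of similarity search error that the abstract says LSH-Net is designed to reduce.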
Keywords/Search Tags: open source collaboration, developer expert retrieval, network representation learning, locality-sensitive hashing