| With the rapid development of the software and information technology industries, open source communities play an important role in growing the domestic open source ecosystem and supporting the construction of Digital China. These communities provide software tag management tools to developers as well as open source software retrieval services to community users. As the number of open source projects in a community grows rapidly, so does the number of unlabeled projects, which makes it difficult for users to search for and locate appropriate open source software resources. To address these problems, this thesis proposes an open source software tag recommendation method based on multi-source feature fusion and an open source software retrieval method that integrates multi-source software features with tag features. A GitHub-oriented open source software retrieval system is implemented to help community users retrieve the target software accurately. The main contributions of this thesis are as follows: (1) To address the low accuracy of tag recommendation caused by insufficient mining of software features, this thesis proposes MF-TagRec, a tag recommendation method based on multi-source feature fusion. Using a word embedding model, MF-TagRec learns the topic and global semantic information of open source software description documents, and learns embedding representations of features such as programming languages and dependency package tags. After fusing these multi-source features, a convolutional neural network that can mine the hidden relationships between software tags and the multi-source features is constructed to recommend tags for open source software automatically. Experiments on a real GitHub dataset show that MF-TagRec outperforms the DepTagRec, TagRNN, FastTagRec, GRU, and BiGRU methods in recommendation performance, with an average improvement of 27% on 
Recall@5. (2) To address the poor retrieval performance caused by insufficient use of tag data, this thesis proposes an open source software retrieval method that integrates software features and tag features. The retrieval request is semantically expanded using domain-specific knowledge of open source software and associated community data, which enriches the feature information of the request. A word vector model then maps the multiple features of each software project into the same semantic space, and the correlation between the retrieval request and each project is calculated to obtain the retrieval results. Based on this correlation calculation, the retrieval results are sorted according to two influential features of open source software: attention and collaboration scale. Experiments show that the results returned by the proposed method are more strongly correlated with retrieval requests, which improves the satisfaction of community users during open source software retrieval. (3) To achieve fast and accurate retrieval of software resources, this thesis builds on the proposed retrieval method, which fuses multi-source features and tag features, to design and implement a GitHub-oriented open source software retrieval system. The system mainly includes functional modules for open source software data acquisition, software retrieval, and software tag recommendation, among others. The application results show that the system provides accurate and efficient software retrieval services for community users. |
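
The retrieval step in contribution (2) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that the correlation is cosine similarity between averaged word vectors, that attention and collaboration scale are approximated by star and contributor counts, and that the two signals are blended with a hypothetical weight `alpha`. The toy embeddings, repository names, and field names are invented for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(words, embeddings, dim):
    """Average the vectors of known words; words without a vector are skipped."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def popularity(repo):
    """Assumed proxy for attention (stars) and collaboration scale (contributors)."""
    return math.log1p(repo["stars"]) + math.log1p(repo["contributors"])


def rank(query_words, repos, embeddings, dim, alpha=0.8):
    """Return repository names ordered by blended relevance and popularity.

    alpha is a hypothetical blending weight, not a value from the thesis.
    """
    if not repos:
        return []
    query_vec = mean_vector(query_words, embeddings, dim)
    max_pop = max(popularity(r) for r in repos) or 1.0
    scored = []
    for r in repos:
        # Description words and tags are embedded into the same space as the query.
        repo_vec = mean_vector(r["description"] + r["tags"], embeddings, dim)
        relevance = cosine(query_vec, repo_vec)
        score = alpha * relevance + (1 - alpha) * popularity(r) / max_pop
        scored.append((score, r["name"]))
    return [name for _, name in sorted(scored, reverse=True)]


# Toy two-dimensional embeddings and repositories (all values are made up).
EMBEDDINGS = {
    "web": [1.0, 0.0],
    "framework": [0.9, 0.1],
    "http": [0.8, 0.2],
    "database": [0.0, 1.0],
}
REPOS = [
    {"name": "toy/web-kit", "description": ["web", "framework"],
     "tags": ["http"], "stars": 100, "contributors": 10},
    {"name": "toy/db-engine", "description": ["database"],
     "tags": ["database"], "stars": 5000, "contributors": 200},
]

print(rank(["web", "framework"], REPOS, EMBEDDINGS, dim=2))
```

With these toy values, the semantically closer repository ranks first even though the other one has far more stars, because the assumed `alpha` gives relevance most of the weight. A pure two-stage variant, which filters by correlation and then sorts by popularity, would be an equally plausible reading of the description above.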