| With the rapid development of artificial intelligence (AI) research, it has become common practice for AI researchers and developers to publish papers in academic venues and release code in open source communities. In the course of their research, they often need to retrieve a large number of prior research results. However, the massive volume of unstructured knowledge resources makes information retrieval difficult. Traditional vertical search engines, such as the SCIE academic search engine and the GitHub open source community search engine, support full-text search. However, results based on text matching often fail to match the needs that users describe using their domain knowledge, and any single community offers only a limited set of knowledge resources. In recent years, paper communities dedicated to abstracting and aggregating research results have emerged. For example, the Papers With Code website recognizes the importance of open source resources for AI research and links papers to their open source repositories. Nevertheless, knowledge resource managers still face two challenges. First, knowledge resources are scattered across communities, and their organization within any single community is not flexible enough. The question is how to aggregate these resources into an integrated representation that makes their relations complete and explicit and strengthens machine actionability. Second, the question is how to use the relations between knowledge items to manage and retrieve knowledge resources efficiently and provide users with better services. Since the concept of the knowledge graph was proposed, many fields have adopted knowledge graphs for knowledge management, overcoming many shortcomings of traditional knowledge organization. This paper uses a knowledge graph to aggregate cross-community knowledge resources into an integrated representation and proposes solutions to the corresponding challenges. The main work and contributions are summarized as follows.
First, to provide an integrated representation of related knowledge across communities, we construct AIOSSKG, a knowledge graph of artificial intelligence open source knowledge resources. We combine multiple data sources and develop dedicated extractors for semi-structured and unstructured data to automate entity and relation extraction. The open data extractor relies on expert-defined rules, and the text data extractor uses distantly supervised named entity recognition and unsupervised relation extraction. The extracted knowledge is aggregated into triple knowledge units, which are then incrementally loaded into the knowledge graph. AIOSSKG currently contains about 240,000 entities and 750,000 relations, and a comprehensive evaluation shows that the constructed knowledge graph is of high quality. Second, because open source repositories in the AI domain lack a classification scheme, we propose an automatic repository classification method based on hybrid knowledge graph embedding. The method combines knowledge graph representation learning with representation learning over the repositories' content text. It uses a deep neural network for multi-label classification and performs both feature fusion and model fusion across the knowledge graph and text modalities. Experimental results show that this method outperforms traditional content-based text classification methods. Compared with the baseline method, the best model improves weighted average precision and recall by 38.3% and 32.6%, respectively. Third, to meet users' practical need to retrieve open source repositories in the AI domain, we propose a retrieval system based on the knowledge graph.
The system retrieves related repositories and dependency packages from keywords for related entities entered by the user. Its core retrieval method, MPCF, applies a collaborative filtering algorithm on the knowledge graph and assigns weights at both the relation layer and the semantic layer. Experiments that simulate users issuing keyword queries show that this method achieves better retrieval relevance and retrieval efficiency than a semantic search engine based on content text. In addition, the system supports interactive retrieval, which can return related entities of various types. Simulation experiments show that interactive retrieval performance improves steadily as the number of interactions increases, which helps when a user's initial requirements are unclear. |
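The abstract describes aggregating extracted knowledge into triple knowledge units that are loaded into the graph incrementally. The following is a minimal in-memory sketch of that idea, not the AIOSSKG pipeline. The `TripleStore` class, its lowercase-and-strip normalization (a crude stand-in for real entity alignment), and the sample entity and relation names are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


class TripleStore:
    """Hypothetical toy store standing in for a real graph database."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.out_edges = defaultdict(set)  # head -> {(relation, tail)}

    def add_batch(self, batch: Iterable[Triple]) -> int:
        """Merge a batch of extracted triples; return how many were new."""
        added = 0
        for head, rel, tail in batch:
            # Assumed normalization: the same entity extracted from two
            # communities with different casing/whitespace merges into one node.
            t = (head.strip().lower(), rel, tail.strip().lower())
            if t not in self.triples:
                self.triples.add(t)
                self.out_edges[t[0]].add((t[1], t[2]))
                added += 1
        return added

    def neighbors(self, entity: str) -> List[Tuple[str, str]]:
        """Outgoing (relation, tail) pairs of an entity, sorted for stable output."""
        return sorted(self.out_edges.get(entity, set()))

    def entity_count(self) -> int:
        return len({e for h, _, t in self.triples for e in (h, t)})


# Two incremental batches, e.g. from a rule-based and a text extractor.
store = TripleStore()
n1 = store.add_batch([
    ("ResNet", "implemented_by", "pytorch/vision"),
    ("ResNet", "addresses", "Image Classification"),
])
n2 = store.add_batch([
    ("resnet ", "implemented_by", "PyTorch/vision"),  # duplicate after normalization
    ("pytorch/vision", "depends_on", "torch"),
])
print(n1, n2, len(store.triples), store.entity_count())
```

Deduplicating on the normalized triple keeps repeated loads idempotent, which is what makes incremental insertion safe to rerun.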
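The classification results are reported as weighted average precision and recall. The snippet below assumes the usual definition of that metric, as in scikit-learn's `average="weighted"`: per-label precision and recall are averaged with weights proportional to each label's number of true instances (its support). It computes the metric for multi-label predictions. The labels and predictions are made-up toy data, not results from the paper.

```python
from typing import List, Set, Tuple


def weighted_precision_recall(
    y_true: List[Set[str]], y_pred: List[Set[str]]
) -> Tuple[float, float]:
    """Support-weighted average of per-label precision and recall."""
    labels = set().union(*y_true, *y_pred)
    total_support = sum(len(t) for t in y_true)  # number of true (sample, label) pairs
    if total_support == 0:
        return 0.0, 0.0
    pairs = list(zip(y_true, y_pred))
    w_precision = w_recall = 0.0
    for label in labels:
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        support = tp + fn
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0 when label never predicted
        recall = tp / support if support else 0.0
        weight = support / total_support
        w_precision += weight * precision
        w_recall += weight * recall
    return w_precision, w_recall


# Toy multi-label repository topics (hypothetical).
y_true = [{"cv"}, {"cv", "nlp"}, {"nlp"}, {"rl"}]
y_pred = [{"cv"}, {"cv"}, {"nlp", "rl"}, {"cv"}]
p, r = weighted_precision_recall(y_true, y_pred)
print(f"weighted precision={p:.3f}, recall={r:.3f}")  # weighted precision=0.667, recall=0.600
```

Weighting by support means frequent topics dominate the average, so rare labels with poor scores (here `rl`) pull the totals down only in proportion to how often they occur.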
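The abstract does not spell out how MPCF works. The following is therefore only a hypothetical sketch of the general idea it names: item-based collaborative filtering on a knowledge graph with two weighted similarity layers. The sketch makes several assumptions. Seed repositories are those directly linked to a query entity. The relation layer is measured by Jaccard overlap of each repository's neighboring entities, and the semantic layer by cosine similarity of precomputed embeddings. The two layers are combined with weights `w_rel` and `w_sem`. These choices, the function names, and the toy graph are illustrative assumptions, not the author's method.

```python
import math
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Relation-layer similarity (assumed): overlap of neighboring entities."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u: List[float], v: List[float]) -> float:
    """Semantic-layer similarity (assumed): cosine of embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_repositories(
    query_entities: Set[str],
    neighbors: Dict[str, Set[str]],
    embeddings: Dict[str, List[float]],
    w_rel: float = 0.5,
    w_sem: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical item-based CF: score each repo by weighted similarity to seed repos."""
    seeds = [r for r, ents in neighbors.items() if ents & query_entities]
    if not seeds:
        return []
    scores = {}
    for repo in neighbors:
        sims = [
            w_rel * jaccard(neighbors[repo], neighbors[s])
            + w_sem * cosine(embeddings[repo], embeddings[s])
            for s in seeds
        ]
        scores[repo] = sum(sims) / len(seeds)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy graph: repository -> linked papers, tasks, and packages (hypothetical).
neighbors = {
    "repo_a": {"paper:ResNet", "task:image-classification", "pkg:torch"},
    "repo_b": {"task:image-classification", "pkg:torch"},
    "repo_c": {"task:machine-translation", "pkg:tensorflow"},
}
embeddings = {"repo_a": [1.0, 0.0], "repo_b": [0.9, 0.1], "repo_c": [0.0, 1.0]}
ranked = rank_repositories({"paper:ResNet"}, neighbors, embeddings)
print([name for name, _ in ranked])  # ['repo_a', 'repo_b', 'repo_c']
```

Here `repo_b` is never linked to the query entity, yet it ranks second because it shares graph neighbors and embedding direction with the seed. Shifting weight between `w_rel` and `w_sem` trades graph structure against text semantics.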