| The number of open source projects on the Internet is growing rapidly, which benefits many areas of software engineering. However, the most popular code repositories store code as plain text and ignore its internal structure, and code retrieval is limited to keyword search. This leads to poor understanding of code. Code query is an efficient way to improve software comprehension. Code query technology usually consists of three steps: information extraction, query, and presentation. In this paper, we present a novel approach to storing code information. The approach uses a graph as its basic storage structure, and code information is stored in the graph database Neo4j. The programming language we target is Java. This paper focuses mainly on the extraction step of code query, and it also covers query and presentation. Our approach provides sufficiently fine storage granularity, and with Cypher, the query language provided by Neo4j, users can conveniently design and implement query algorithms over the stored information. The contributions of this paper are as follows. We designed a basic storage schema for Java that uses the AST as its base. To make each entity distinguishable, we introduce a unique key node for each type, method, and variable, and we merge redundant nodes to reduce the size of the stored graph. We also provide an incremental way to extend the basic storage schema, and we implemented some commonly used extensions, including call, generalization, implementation, and association. We provide a series of prototype tools to support the storage and querying of Java code. We implemented a plugin for code storage that supports automatic transformation from Java source code to the basic storage schema, as well as some commonly used extensions to the schema. We also implemented plugins for querying and visualization.
The query plugin provides an editing environment for Cypher, and the visualization plugin shows query results graphically. We experimented with nine large open source Java projects. From the results, we analyzed Neo4j’s ability to store Java code and the space cost of storing a Java project in Neo4j. We also provide some practical query use cases to illustrate the various uses of our approach, including computing software metrics, locating target types, and finding cyclic call relationships in projects. |
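
To make the ideas of unique key nodes, merged redundant nodes, and the call extension more concrete, the following is a minimal sketch in Java. It is not the storage schema or the tooling described in the paper: it swaps Neo4j and Cypher for a plain in-memory map, and the class `CodeGraphSketch`, its method names, and the signature strings used as keys are all hypothetical. It only illustrates how one node per key plus call edges is enough to answer a question such as "which methods take part in a call cycle?".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the stored graph (assumption, not the paper's schema). */
class CodeGraphSketch {
    // Each method is identified by one unique key (here, a signature string).
    // The value is the set of callee keys, i.e. the outgoing "call" edges.
    private final Map<String, Set<String>> calls = new LinkedHashMap<>();

    /** Creates the node for a key on first use only, so repeated mentions are merged. */
    void addMethod(String key) {
        calls.computeIfAbsent(key, k -> new LinkedHashSet<>());
    }

    /** Records a call edge from caller to callee, creating either node if needed. */
    void addCall(String caller, String callee) {
        addMethod(caller);
        addMethod(callee);
        calls.get(caller).add(callee);
    }

    /** Number of distinct method nodes after merging. */
    int nodeCount() {
        return calls.size();
    }

    /** Returns the methods that can reach themselves through one or more calls. */
    Set<String> methodsInCallCycles() {
        Set<String> result = new LinkedHashSet<>();
        for (String start : calls.keySet()) {
            if (reaches(start, start)) {
                result.add(start);
            }
        }
        return result;
    }

    /** Worklist traversal: is target reachable from 'from' via at least one call edge? */
    private boolean reaches(String from, String target) {
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(calls.get(from));
        while (!pending.isEmpty()) {
            String current = pending.poll();
            if (current.equals(target)) {
                return true;
            }
            if (visited.add(current)) {
                pending.addAll(calls.get(current));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CodeGraphSketch graph = new CodeGraphSketch();
        graph.addCall("a.A.f()", "a.B.g()");
        graph.addCall("a.B.g()", "a.A.f()"); // closes a cycle between f and g
        graph.addCall("a.A.f()", "a.C.h()"); // h is called but calls nothing
        System.out.println(graph.nodeCount());
        System.out.println(graph.methodsInCallCycles());
    }
}
```

In a graph database the same question would normally be expressed as a path query instead of a hand-written traversal; the sketch only shows why giving every method a single keyed node makes such queries well defined.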