As software systems grow in scale and their versions iterate, handling the vast amount of code produced during development has become a core issue in software development and maintenance. However, because source code is abstract, complex, and variable, and because developers differ in their habits and abilities, reading and understanding code written by others has become increasingly challenging. Today, the software development field faces two critical new developments: the rise of open-source code and communities, which has given rise to code big data, and intelligent software development driven by artificial intelligence techniques such as deep learning and knowledge graphs. Exploring how artificial intelligence can help developers understand and analyze source code more efficiently during development and maintenance therefore has significant research value.

An in-depth review of the domestic and international literature shows, on the one hand, that source code analysis has achieved certain results, for example with methods based on information retrieval and probabilistic models. However, these methods often treat source code as plain text and neglect the structural characteristics and semantic information inherent in the code itself. On the other hand, with attention to the structural characteristics of source code and the overall design of software systems, software design pattern detection and the automatic construction of code knowledge graphs have gradually become research hotspots, although many challenges and open problems in these areas remain to be addressed. Starting from these observations, this thesis proposes a code structure analysis method based on design pattern detection and implements the automatic construction of code knowledge graphs. The main contributions of this thesis are as follows.

(1) A design pattern recognition method based on code features is proposed. The method generates a syntactic and lexical representation of Java source code by incorporating code features and call graphs, and then applies the Word2Vec algorithm to this representation to build a word-space model of the Java source code. Finally, a supervised machine learning classifier identifies the design patterns. Experiments demonstrate that the code features extracted from source code are effective for the design pattern recognition task.

(2) A design pattern detection method based on code representation, combined with the extracted code features, is proposed. Statement-level code fragments are processed with abstract syntax trees to generate tree-structured representations that carry lexical and statement-level grammatical knowledge. Together with the code features and call graphs, these representations are used by graph convolutional networks (GCNs) to encode individual statements, and a bidirectional long short-term memory network (Bi-LSTM), a type of recurrent neural network, then encodes the sequential dependencies between statements. This approach captures more of the syntactic, semantic, and structural information in the source code, which plays a crucial role in optimizing the code representation. Experimental results show that the method significantly improves detection performance and outperforms several classic design pattern detection methods, providing effective support for the analysis of object-oriented software code.

(3) An automatic code knowledge graph construction method based on design pattern detection results and code structure analysis is proposed. Design pattern detection results are integrated with UML class diagrams, and the resulting graph-structured representation is converted into RDF data. The code knowledge graph is then constructed automatically using the Neo4j graph database. This construction process integrates diverse code knowledge and technical tools, providing an efficient, flexible, and scalable way to analyze the code knowledge of software systems and thus offering more effective support for their development and maintenance.

In summary, this thesis combines design pattern recognition to achieve efficient and accurate analysis of Java code structure, and it automatically constructs Java code knowledge graphs. Experiments on preprocessed public datasets demonstrate the effectiveness and accuracy of the two proposed design pattern detection methods, and the proposed automatic code knowledge graph construction method proves feasible and practical.
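The Word2Vec step in contribution (1) trains on token sequences derived from source code. The sketch below is a minimal illustration under stated assumptions: each Java class is flattened into a sequence of code-feature and call-graph tokens using a hypothetical format (`CLASS:`, `FEAT:`, `CALL:` are invented for this example, not the thesis's encoding), and the skip-gram variant of Word2Vec is chosen for illustration. It shows the (center, context) pairs that a skip-gram model would learn from.

```python
from typing import List, Tuple


def class_token_sequence(
    class_name: str, features: List[str], calls: List[Tuple[str, str]]
) -> List[str]:
    """Flatten one Java class into a token sequence (hypothetical encoding)."""
    seq = [f"CLASS:{class_name}"]
    # One token per extracted code feature, e.g. "private_constructor".
    seq += [f"FEAT:{f}" for f in features]
    # One token per call-graph edge (caller method -> callee).
    seq += [f"CALL:{caller}->{callee}" for caller, callee in calls]
    return seq


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return the (center, context) pairs a skip-gram Word2Vec model trains on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = class_token_sequence(
    "Logger", ["private_constructor", "static_self_field"], [("getInstance", "Logger")]
)
print(tokens)
print(skipgram_pairs(tokens, window=1))
```

In practice a library implementation (for example, gensim's `Word2Vec`, which accepts a list of token lists) would consume such sequences directly; the pair generation is written out only to make the training signal explicit. The resulting vectors would then feed the supervised classifier.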
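The statement-encoding stage in contribution (2) relies on graph convolution. As a rough sketch, assuming the widely used GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) and a small hypothetical graph whose nodes are statements and whose edges stand for AST or call-graph links, one layer can be written in plain Python as follows. The graph, features, and weights are toy values; the thesis's actual layer configuration is not specified here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by GCNs."""
    n = len(adj)
    # Add self-loops so each statement keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(normalized_adjacency(A) @ H @ W)."""
    out = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in out]


# Hypothetical 3-statement graph: s0-s1 and s1-s2 are linked.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d statement features
w = [[0.5, -0.5], [0.5, 0.5]]             # toy weights (learned in practice)
print(gcn_layer(adj, h, w))
```

Each output row mixes a statement's features with those of its neighbours. A Bi-LSTM would then read these statement vectors in program order to model inter-statement dependencies; that stage is omitted to keep the sketch short.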
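Contribution (3) converts UML class structure and detected design pattern roles into RDF before loading them into Neo4j. The following is a minimal sketch, assuming a hypothetical namespace (`http://example.org/code#`) and invented predicate names (`realizes`, `associates`, `playsRole`); it serializes classes, relationships, and pattern roles as N-Triples lines. Only `rdf:type` is a standard RDF term here.

```python
from typing import List, Tuple

# Hypothetical namespace for illustration; the thesis's vocabulary is not given.
BASE = "http://example.org/code#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(name: str) -> str:
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{BASE}{name}>"


def to_ntriples(
    classes: List[str],
    relations: List[Tuple[str, str, str]],
    pattern_roles: List[Tuple[str, str, str]],
) -> List[str]:
    lines = []
    # Each UML class becomes a typed node.
    for c in classes:
        lines.append(f"{iri(c)} {RDF_TYPE} {iri('Class')} .")
    # Each UML relationship (association, realization, ...) becomes an edge.
    for src, rel, dst in relations:
        lines.append(f"{iri(src)} {iri(rel)} {iri(dst)} .")
    # Each detected design pattern role links a class to a pattern-role node.
    for c, pattern, role in pattern_roles:
        lines.append(f"{iri(c)} {iri('playsRole')} {iri(pattern + '_' + role)} .")
    return lines


triples = to_ntriples(
    ["WeatherStation", "DisplayListener", "PhoneDisplay"],
    [("PhoneDisplay", "realizes", "DisplayListener"),
     ("WeatherStation", "associates", "DisplayListener")],
    [("WeatherStation", "Observer", "Subject"),
     ("PhoneDisplay", "Observer", "ConcreteObserver")],
)
print("\n".join(triples))
```

N-Triples is a simple line-based interchange format that RDF-aware tooling can import into a graph database; the exact loading path into Neo4j used in the thesis is not described here.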