| Architecture design and code implementation are two essential and integral parts of the software development life cycle. However, there is a significant gap between software architecture and source code because they are interdependent artifacts at different levels of abstraction. Analyzing the relationship between architecture and code and establishing traceability between them is beneficial for maintaining and evolving software systems, especially for refactoring and maintaining architecture. Such traceability can therefore help prevent architecture erosion and the occurrence of architecture smells. The key problem addressed in this dissertation is to build the relationship between architecture and code, and to (semi-)automatically extract code relevant to architecture smells and their refactoring solutions. Although some work has explored specific relationships between software architecture and code and has proposed methods to refactor architecture smells, urgent problems remain to be addressed: (1) It is not clear how developers analyze and use the relationships between architecture and code in practice, and whether these relationships have an impact on software maintenance and evolution, especially architecture maintenance and refactoring (Problem 1). (2) Architecture smells have a system-wide negative impact on the maintainability of a software system, so detecting and refactoring them costs more effort. Researchers have proposed various definitions and detection methods for architecture smells, but very few effective refactoring methods, and there is a lack of research on how developers understand, detect, and refactor architecture smells in practice (Problem 2). (3) Detecting and refactoring architecture smell-related code requires both relevant knowledge and experience, which makes architecture smells more difficult to fix than code smells. The professional software
development question-and-answer websites (e.g., Stack Overflow) provide rich discussions about architecture smells. However, no literature investigates how to extract, analyze, and establish traceability links between architecture smells, smelly code, and refactoring solutions (Problem 3). To address these three problems, this dissertation focuses on establishing traceability between architecture smells and code from the perspective of developers, and proposes a method to (semi-)automatically extract and analyze architecture smell-related code and refactoring solutions using text classification and traceability approaches. The main contributions of this dissertation are the following. To explore practitioners’ understanding of the relationship between architecture and code and to study the impact of traceability on software maintenance and evolution (Problem 1), this dissertation first used empirical software engineering methods, including an online industrial survey with 87 valid questionnaire responses and interviews with eight practitioners, which were analyzed using qualitative and quantitative data analysis approaches. It then conducted a systematic mapping study to select, analyze, and synthesize 63 relevant studies published between January 2000 and May 2020. The results of the industrial survey and the mapping study show that: (1) practitioners mainly used five types of relationships between architecture and code, of which traceability is the most important for supporting software maintenance and evolution; (2) traceability is used to support 11 maintenance and evolution activities, including change management, architecture maintenance, and architecture refactoring; (3) strong empirical evidence from industry is needed to validate the impact of traceability on maintenance and evolution; (4) easing the process of change management is the main benefit of deploying traceability practices; (5) establishing and maintaining traceability links is
the main cost of deploying traceability practices; and (6) 13 approaches and 32 tools that support traceability in maintenance and evolution were identified. These findings provide a comprehensive understanding of deploying traceability practices in the software maintenance and evolution phase, can suggest future directions for researchers, and can help practitioners make informed decisions when using traceability in maintenance and evolution. To further analyze how developers understand, detect, and refactor architecture smells in practice (Problem 2), this dissertation searched Stack Overflow (SO) and collected 207 relevant posts. The Grounded Theory method was then used to analyze the discussions in the collected SO posts. The findings show that: (1) developers often describe architecture smells with general terms; (2) architecture smells are mainly caused by violating architecture patterns or design principles, or by misusing architecture antipatterns; (3) developers are mainly concerned about the system maintainability and performance affected by architecture smells; (4) five types of difficulties are identified in detecting and refactoring architecture smells; and (5) four types of methods and eight types of tools are identified that help developers detect and refactor architecture smells, but dedicated tools for detecting and refactoring architecture smells are still lacking. To extract, analyze, and establish traceability links between architecture smells, smelly code, and refactoring solutions discussed on professional software development websites (Problem 3), this dissertation employed machine learning and natural language processing techniques to design a method for (semi-)automatically extracting code relevant to architecture smells and refactoring solutions, based on the architecture smell posts collected in the previous chapter. The designed method was then used to extract and analyze architecture smell-related code and its refactoring
solutions from a professional question-and-answer website. Finally, this dissertation validated the usefulness of the extracted architecture smell-related code and its refactoring solutions with an industrial survey. The main results include: (1) a manually labelled dataset consisting of 208 architecture smell-related posts and 187 architecture smell-unrelated posts; (2) the best-performing classification model for automatically identifying architecture smell-related posts, which combines Word2vec and SVM; (3) extracted and classified architecture smell-related code and refactoring solutions, including four newly identified types of architecture smell that are not mentioned in the literature; (4) an analysis of the relationship between the 13 extracted types of architecture smell-related code, their refactoring solutions, and their sentiment polarity values; and (5) validation through an industrial survey that the extracted architecture smell-related code and refactoring solutions are useful for practitioners. |
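
The post classification step mentioned in result (2) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the two-dimensional `TOY_VECTORS` table is a hypothetical stand-in for trained Word2vec embeddings, the example posts are invented, and the classifier is a simplified linear SVM (hinge loss, Pegasos-style subgradient updates, no bias term) written in plain Python so the example stays self-contained. All function and variable names are assumptions made for illustration.

```python
import random

# Hypothetical 2-d vectors standing in for trained Word2vec embeddings.
TOY_VECTORS = {
    "cyclic": (1.0, 0.2), "dependency": (0.9, 0.1), "coupling": (0.8, 0.3),
    "layer": (0.7, 0.0), "module": (0.6, 0.2),
    "regex": (-0.8, 0.1), "css": (-0.9, 0.3), "font": (-0.7, 0.2),
    "string": (-0.6, 0.0), "parse": (-0.5, 0.1),
}

# Invented posts: label 1 = architecture smell-related, -1 = unrelated.
POSTS = [
    ("cyclic dependency between module and layer", 1),
    ("tight coupling across every layer", 1),
    ("module dependency breaks the layer rule", 1),
    ("regex to parse a string", -1),
    ("css font not loading", -1),
    ("parse css string with regex", -1),
]


def embed(post):
    """Represent a post as the average vector of its known tokens."""
    vecs = [TOY_VECTORS[t] for t in post.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Minimise the regularised hinge loss with Pegasos-style steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            violated = y[i] * dot(w, X[i]) < 1
            w = [(1 - eta * lam) * wj for wj in w]   # regularisation shrink
            if violated:                   # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, post):
    """Return 1 for smell-related, -1 for unrelated."""
    return 1 if dot(w, embed(post)) >= 0 else -1


X = [embed(text) for text, _ in POSTS]
y = [label for _, label in POSTS]
w = train_linear_svm(X, y)

for text in ("cyclic coupling in module", "font regex"):
    print(text, "->", predict(w, text))
```

In a realistic setting the toy table would be replaced by embeddings trained on the labelled posts, and the hand-written trainer by an established SVM implementation; the sketch only shows how averaged word vectors feed a margin-based classifier.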