| Program refactoring is a change to the internal structure of software. Its purpose is to make the software easier to understand and cheaper to modify while preserving its observable behavior. Program refactoring is widely used to delay the degradation caused by software aging, and it helps improve program quality. Procedure extraction moves a set of statements out of a procedure into a new procedure and replaces them with an appropriate call. It is one of the commonly used refactoring methods for reducing clones in software products. This dissertation studies the procedure extraction method for clone code and the influence of refactoring on module dependencies in evolving software.

After clone detection, the output of a clone detection tool cannot be refactored directly. To address this problem, this dissertation proposes an approach for preprocessing clones. First, we propose a novel algorithm that combines the Adaptive K-Nearest Neighbors (A-KNN) clustering method with a graph to reduce false positives in the detection of copy-paste related bugs. Second, we develop a cost-benefit evaluation method to identify clone groups for refactoring. This preprocessing not only improves the accuracy of copy-paste related bug detection but also benefits the subsequent study of refactoring.

When clone code is extracted to form procedures, some clone fragments cannot be extracted directly by previous syntax-preserving procedure extraction algorithms. To solve this problem, this dissertation proposes a new semantics-preserving amorphous procedure extraction algorithm for non-continuous clones. The approach analyzes program semantic information using the program dependence graph and the abstract syntax tree.
It is not syntax-preserving but preserves structural semantics, so it can handle non-continuous clone code that cannot be extracted directly by the traditional method. It also relaxes the constraints of earlier algorithms, so that promoting unmarked statements is not required to handle exiting jumps, a case that the traditional procedure extraction method cannot address. The experimental results show that the amorphous procedure extraction method improves the accuracy and adaptability of procedure extraction. It can reduce clones in programs and improve code quality.

Existing procedure extraction techniques handle the automatic extraction of exact clones effectively but fail for near-miss clones. To address this problem, we developed SPAPE, a novel semantics-preserving amorphous procedure extraction method for extracting near-miss clones. SPAPE relaxes the constraint of having the same syntax and uses structural semantic information instead. First, SPAPE analyzes the structural semantics of the original programs with program dependence graphs (PDGs). Then, it performs an amorphous transformation on the PDGs that preserves the structural semantics. The statements that differ between near-miss cloned code fragments are identified by analyzing the PDGs and merged by inserting control variables and conditional statements into the abstract syntax tree. Finally, the near-miss cloned code fragments are replaced with a procedure call. We evaluated the performance, effectiveness, and benefits of SPAPE.

Procedure extraction changes the dependencies between the modules in software. This phenomenon is consistent with the Theory of Relative Dependency. In this study, we test the theory from an evolutionary perspective by examining consecutive releases of a large number of large-scale open-source products. We found that dependencies are unequally distributed among the modules in software: they concentrate on smaller modules rather than larger ones, regardless of product age.
Furthermore, continuous refactoring efforts exacerbate the concentration of dependencies on smaller modules over the product lifetime. Therefore, we suggest that software managers and developers give higher maintenance and QA priority to smaller modules over the lifetime of a product, and that this priority increase as the product ages.

To sum up, the research on the preprocessing approach for clone code and the amorphous procedure extraction method solves two key problems: the first is preprocessing clones for refactoring, and the second is extracting clones that cannot be handled by the traditional method. In addition, the evolutionary analysis of module dependencies reveals how the dependencies between modules change and how refactoring influences that change. |
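
The merging step described above for near-miss clones (a control variable plus a conditional statement, followed by replacing each clone with a call) can be illustrated with a small hand-written Python sketch. This is not SPAPE's implementation or output: the function names, the `mode` control variable, and the example computations are all hypothetical, and the merge is done by hand rather than derived from PDGs.

```python
# Hypothetical near-miss clones: the two functions are identical
# except for one statement (marked below).
def total_price_before(items):
    total = 0
    for unit, qty in items:
        total += unit * qty
    total = total * 1.25  # differing statement: apply a 25% surcharge
    return round(total, 2)


def total_weight_before(items):
    total = 0
    for unit, qty in items:
        total += unit * qty
    total = total + 0.5  # differing statement: add packaging weight
    return round(total, 2)


# Extracted procedure (illustrative): the shared statements appear once,
# and the control variable `mode` selects the differing statement
# through a conditional.
def _total(items, mode):
    total = 0
    for unit, qty in items:
        total += unit * qty
    if mode == "price":
        total = total * 1.25
    else:
        total = total + 0.5
    return round(total, 2)


# Each original clone is replaced with a call to the extracted procedure.
def total_price(items):
    return _total(items, "price")


def total_weight(items):
    return _total(items, "weight")
```

In this sketch the refactored functions return the same values as the originals for the same inputs, which is the sense in which observable behavior is preserved.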