| As an important component of software, configuration plays a vital role in improving functionality, customizability, and adaptability. Today's large-scale fundamental software systems are highly configurable so that they can adapt to complex environments and changing requirements, which greatly improves their reliability and availability. However, as software systems continue to grow and the interactions among them become more complex, misconfigurations have become the major cause of software failures. The complexity of configurations and users’ lack of knowledge about configuration constraints are the main reasons for the large number of misconfigurations. To help users configure software correctly, prior research has focused on comprehending configuration constraints through three families of methods: learning-based, program-analysis-based, and text-semantics-based. However, prior research suffers from the following problems. First, learning-based methods need a large number of sample configuration files to mine constraints, and it is difficult to obtain enough real-world samples. Second, configuration constraints take various and complex forms in source code and are hard to summarize as simple code patterns, so current approaches can infer only constraints that match a limited set of patterns. Last but not least, when a software system runs in a production environment, its configuration constraints can be influenced by workloads, yet current approaches infer constraints only statically and do not consider constraint comprehension in a runtime environment with varying workloads. To address these problems, this dissertation makes full use of the knowledge contained in source code to comprehend configuration constraints. Based on several in-depth empirical studies of the characteristics of configuration-related source code, this dissertation proposes three methods to comprehend configuration
constraints and implements the corresponding tools to automate the procedure. Because configuration constraints take various and complex forms in source code, we first characterize, through an empirical study, how constraints appear in source code, and propose a pattern-based method to infer them. However, not all configuration constraints in source code follow fixed patterns (context-aware configuration constraints are one example), so we further explore other possible sources for constraint comprehension and leverage software logs to infer constraints. Moreover, when the target software system runs in a production environment, the program state is influenced by various workloads. Under these circumstances, workload-related misconfigurations may occur even with configuration values that static methods infer to be valid. To explore the influence of workloads on configuration constraints, we further propose a workload-aware method to infer configuration constraints in a dynamic environment. In summary, this dissertation makes the following contributions: (1) We carried out an empirical study on five widely used open-source software systems and summarized findings from three different perspectives to guide the automated inference of configuration constraints. Based on the findings, we design and implement ConfHE, a heuristic-based tool that infers configuration constraints automatically. ConfHE focuses on three kinds of configuration constraints: numeric value ranges, enumeration values, and semantic types. ConfHE inferred 671 configuration constraints in total from seven open-source software systems, an increase of 458 over prior research. (2) We propose a method that leverages software logs to infer configuration constraints. Through an empirical study of misconfiguration injection, we demonstrate the feasibility of inferring configuration constraints from software logs. Guided by the study, we design
and implement ConfInLog, an automated tool that combines static program analysis and natural language processing to infer configuration constraints from log messages. To evaluate the effectiveness of ConfInLog, we applied it to seven popular open-source software systems. ConfInLog successfully inferred 427 constraints in total, of which 59.5% to 61.6% could not be inferred by state-of-the-art approaches. In addition, we submitted 67 documentation enhancement patches based on the constraints inferred by ConfInLog. The constraints in 29 patches have been confirmed by the developers, and 10 of those patches have been accepted. (3) We propose a workload-aware method to infer configuration constraints. In an empirical study of configuration-related source code in four open-source software systems, we took an in-depth look at the interaction between configuration variables and program variables, as well as their influence on the runtime behavior of software systems, and summarized five types of branch interaction behaviors. Guided by this study, we design and implement WMWatcher, an automated tool that statically identifies branch interactions and instruments the relevant probes, then monitors the state of program variables at runtime. WMWatcher helps users understand the configuration constraints under recent workloads and prevents misconfigurations when a configuration change is needed. WMWatcher instruments 214 probes in total, identifies branch interaction types with an average precision of 93.5% and recall of 84.7%, and adds only 2.33% runtime overhead. Finally, case studies in real-world scenarios demonstrate the effectiveness of WMWatcher in understanding workload-aware configuration constraints. By making full use of the characteristics of configuration-related code, this dissertation designs and implements a set of automated tools that help users comprehend configuration constraints, thus effectively preventing
misconfigurations and improving reliability in large-scale fundamental software systems. |
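
To make the "numeric value range" constraint kind from contribution (1) concrete, the following is a minimal sketch of pattern-based inference. It is not the dissertation's algorithm: the regular expression, the function name `infer_numeric_ranges`, and the C-like sample (including the option name `max_connections` and its bounds) are all assumptions made up for illustration, and a real tool would rely on program analysis rather than text matching.

```python
import re

# Hypothetical guard pattern: `<name> < LOW || <name> > HIGH`, i.e. a check
# that rejects values outside [LOW, HIGH]. This regex is illustrative only.
RANGE_GUARD = re.compile(
    r"(?P<var>\w+)\s*<\s*(?P<low>-?\d+)\s*\|\|\s*(?P=var)\s*>\s*(?P<high>-?\d+)"
)


def infer_numeric_ranges(source):
    """Map each variable guarded by an out-of-range check to (low, high)."""
    ranges = {}
    for match in RANGE_GUARD.finditer(source):
        low = int(match.group("low"))
        high = int(match.group("high"))
        ranges[match.group("var")] = (low, high)
    return ranges


# Made-up C-like snippet; the option name and bounds are invented.
SAMPLE = """
if (max_connections < 1 || max_connections > 100000) {
    log_error("max_connections out of range");
    return -1;
}
"""

if __name__ == "__main__":
    print(infer_numeric_ranges(SAMPLE))
```

A single fixed pattern like this misses any check written in another form, which is the limitation of pattern-based inference that motivates the log-based and workload-aware methods.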