A software crash, which refers to the abnormal termination of a program, is one of the most serious manifestations of software failure. Software crashes cause huge economic losses in daily life and work. Generally, software crashes can be roughly divided into two categories according to their causes. The first category is the program-related crash, which is caused by faulty code written by developers. The second is the configuration-related crash, which is caused by unsuitable or illegal software configurations set by users. In recent years, how to efficiently find and analyze the root causes of software crashes has drawn much attention from both industry and academia. Accordingly, researchers have proposed many methods for analyzing the root causes of crashes, such as program analysis-based and machine learning-based methods. These methods can help people deeply understand the causes of crashes, deal with failures effectively, prevent their occurrence, and reduce losses. This dissertation studies software crashes in three aspects: program-related crash localization, configuration-related crash detection, and configuration optimization.

In a program-related crash, faulty code introduced by developers is one of the main causes. When such a crash occurs, developers have to locate its root cause quickly before they can repair it. However, locating the root cause requires a deep comprehension of the source code and rich debugging experience, which makes crash localization a time-consuming and difficult task. To address this task, this dissertation proposes CraTer, a crash localization approach based on stack trace mining. CraTer transforms the traditional localization problem into a binary classification problem: it assists localization by predicting whether the root cause resides in the stack trace. Specifically, CraTer first mines 89 features from the crash stack trace and the corresponding source code, and then builds a binary classifier to predict the location of the root cause. To obtain crash data, CraTer applies program mutation to seven medium and large real-world projects, such as Apache Commons-Codec and Apache Commons-IO, collecting 3,500 crashes as the experimental dataset. Cross-validation results on this dataset demonstrate that CraTer is effective, with an average accuracy of 92%. Additionally, this dissertation discusses the class imbalance problem and the effects of feature selection in crash localization. A comparative experiment shows that the SMOTE method can effectively improve the prediction results of CraTer.

Besides program-related crashes, unsuitable or illegal software configurations set by users can lead to configuration-related crashes. The size of the configuration space grows exponentially with the number of configuration options provided by modern software, which makes configuration-related crash detection difficult. To solve this problem, this dissertation proposes GCS, a configuration-related crash detection approach based on the genetic algorithm. GCS combines traditional configuration sampling methods, such as One-enabled, One-disabled, and T-wise, and uses the genetic algorithm to generate a suitable sequence of sampling strategies. Specifically, GCS samples and tests different software configurations over a series of steps. In each step, GCS chooses one sampling strategy to sample configurations, so that each candidate is a sequence of sampling strategies with a fixed number of steps. GCS takes the number of crashes detected by each sequence as its fitness value and optimizes the sampling strategy sequences iteratively. Finally, an optimized sampling strategy sequence is selected. Experimental results on Apache, BusyBox, and Linux show that GCS outperforms the traditional detection methods.

To avoid configuration-related crashes, it is also necessary to optimize configurations for performance, since users prefer the high performance obtained by choosing suitable configurations. However, after studying traditional configuration optimization methods, researchers have found two common issues ignored by previous methods: the imbalanced training issue and the tie issue. The imbalanced training issue is that previous methods fail to consider the trade-off between prediction accuracy and training cost. This may lead to a worse situation in which previous methods, such as the rank-based method, focus on only one objective and ignore the other. The tie issue is that many configurations with distinct performance are predicted to have the same performance, which places large numbers of configurations in tied rankings and makes it difficult for users to select the actual optimal configuration. To solve the imbalanced training issue, this dissertation proposes MoConfig, an approach based on multi-objective optimization. MoConfig treats prediction accuracy and training cost as two objectives to be optimized simultaneously and finally finds a suitable training set to build the performance model. To solve the tie issue, this dissertation proposes ReConfig, an approach based on reranking, which builds a ranking refinement model to rerank the tied configurations in the prediction result. Experiments demonstrate that MoConfig and ReConfig outperform previous methods with respect to the imbalanced training issue and the tie issue, respectively.

To sum up, this dissertation studies software crashes in three aspects, i.e., program-related crash localization, configuration-related crash detection, and configuration optimization. The work in this dissertation can advance research on software quality assurance and maintenance.
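CraTer's framing turns localization into a yes/no question per crash: does the root cause lie in the stack trace? The sketch below shows one way such a label could be derived for a mutation-generated crash. It assumes, hypothetically, that "in the stack trace" means "the mutated method appears among the `at ...` frames of a Java stack trace"; the helper names, the frame regex, and the example class names are illustrative only and are not taken from CraTer.

```python
import re
from typing import List

# Matches one Java stack-trace frame such as
# "    at org.example.Codec.decode(Codec.java:42)" and captures the
# fully qualified method name; the "(File.java:NN)" part is only matched.
FRAME = re.compile(r"^\s*at\s+([\w$.<>]+)\([^()]*\)")


def parse_frames(trace: str) -> List[str]:
    """Return the fully qualified method names of all 'at ...' frames, top frame first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if match:
            frames.append(match.group(1))
    return frames


def label_crash(trace: str, mutated_method: str) -> int:
    """Hypothetical labeling scheme: 1 if the method that was mutated to
    cause the crash appears in the stack trace, 0 otherwise."""
    return int(mutated_method in parse_frames(trace))


trace = """java.lang.ArrayIndexOutOfBoundsException: 3
    at org.example.Codec.decode(Codec.java:42)
    at org.example.Client.run(Client.java:10)"""

print(parse_frames(trace))                             # ['org.example.Codec.decode', 'org.example.Client.run']
print(label_crash(trace, "org.example.Codec.decode"))  # 1
print(label_crash(trace, "org.example.Codec.encode"))  # 0
```

Because the mutation site is known for every generated crash, a label of this kind comes for free, and the 89 mined features would then form the classifier's input vector; which features those are is not reproduced here.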
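SMOTE, which the comparative experiment found to improve CraTer's predictions, rebalances a dataset by synthesizing new minority-class samples instead of duplicating existing ones. The following is a minimal stdlib-only sketch of basic SMOTE, assuming numeric feature vectors; which class is the minority in CraTer's data, and the exact SMOTE variant and parameters used, are not stated in the abstract.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 3, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority-class feature vectors (basic SMOTE).

    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority-class neighbours (Euclidean distance).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for i in range(n_new):
        idx = i % len(minority)  # cycle through the minority samples
        base = minority[idx]
        # k nearest neighbours of `base` among the other minority samples
        others = sorted((j for j in range(len(minority)) if j != idx),
                        key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # random position in [0, 1) along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=4, k=2)
print(len(new_points))  # 4
```

Interpolating between neighbours keeps the synthetic samples inside the region the minority class already occupies, which is why SMOTE tends to generalize better than plain oversampling by repetition. In practice the library implementation in imbalanced-learn would be used rather than this sketch.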
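GCS searches over fixed-length sequences of sampling strategies and scores each sequence by the number of crashes it detects. The sketch below shows a generic genetic algorithm over such sequences (binary tournament selection, one-point crossover, per-step mutation, elitism). The operators, parameter values, and the toy fitness table `DETECTS` are assumptions for illustration; the abstract does not specify GCS's actual operators, and a real fitness function would run the sampled configurations and count crashes.

```python
import random
from typing import Callable, List

Plan = List[str]  # one sampling strategy per step


def evolve_plan(strategies: List[str], steps: int, fitness: Callable[[Plan], int],
                pop_size: int = 30, generations: int = 40,
                mutation_rate: float = 0.1, seed: int = 0) -> Plan:
    """Genetic search for a fixed-length sequence of sampling strategies that
    maximizes `fitness` (e.g. the number of crashes the sequence detects)."""
    rng = random.Random(seed)
    population = [[rng.choice(strategies) for _ in range(steps)] for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament() -> Plan:
        # binary tournament selection: the fitter of two random individuals
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_gen = [best[:]]  # elitism: always keep the best plan found so far
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, steps) if steps > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # mutation: replace each step's strategy with probability mutation_rate
            child = [rng.choice(strategies) if rng.random() < mutation_rate else s for s in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best


# Hypothetical toy data: the set of crash IDs each strategy would expose.
DETECTS = {"one-enabled": {1, 2}, "one-disabled": {3}, "t-wise": {2, 4, 5}}


def crashes_found(plan: Plan) -> int:
    """Toy fitness: number of distinct crashes exposed by all steps of the plan."""
    found = set()
    for strategy in plan:
        found |= DETECTS[strategy]
    return len(found)


best = evolve_plan(list(DETECTS), steps=3, fitness=crashes_found)
print(best, crashes_found(best))
```

Counting distinct crashes across all steps rewards sequences whose strategies complement each other, so in this toy the best three-step plan uses each strategy once.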
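MoConfig treats prediction accuracy and training cost as two objectives to optimize at once. The core notion behind any such multi-objective search is Pareto dominance, illustrated below for a handful of candidate training sets. The candidate names and (error, cost) numbers are hypothetical, and the sketch shows only how non-dominated candidates are identified, not MoConfig's actual search procedure or how it picks one training set from the front.

```python
from typing import Dict, List, Tuple

# (prediction error, training cost); both objectives are minimized
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Names of the candidate training sets not dominated by any other candidate."""
    return [name for name, obj in candidates.items()
            if not any(dominates(other, obj) for other in candidates.values())]


candidates = {
    "set_A": (0.30, 10.0),   # cheap but inaccurate
    "set_B": (0.20, 40.0),
    "set_C": (0.25, 45.0),   # worse than set_B on both objectives
    "set_D": (0.10, 100.0),  # accurate but expensive
}
print(pareto_front(candidates))  # ['set_A', 'set_B', 'set_D']
```

A single-objective method that looks only at error would always choose `set_D`, and one that looks only at cost would choose `set_A`; keeping the whole front exposes the trade-off that the imbalanced training issue refers to.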
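ReConfig addresses ties: configurations that the performance model assigns exactly the same predicted value. The sketch below shows the reranking idea in its simplest form, assuming lower predicted values are better (e.g. latency) and that a secondary "refinement" score is available for each configuration. The refinement scores here are hypothetical placeholders; the abstract does not describe how ReConfig's ranking refinement model is built.

```python
from itertools import groupby
from typing import Callable, Dict, List


def rerank_ties(predicted: Dict[str, float], refine: Callable[[str], float]) -> List[str]:
    """Rank configurations by predicted performance (lower is better, e.g. latency),
    then reorder each group of exactly tied predictions by a secondary score."""
    ranked = sorted(predicted, key=lambda c: predicted[c])
    result: List[str] = []
    for _, group in groupby(ranked, key=lambda c: predicted[c]):
        tied = list(group)
        result.extend(sorted(tied, key=refine))  # refinement only matters when len(tied) > 1
    return result


predicted = {"c1": 5.0, "c2": 3.0, "c3": 3.0, "c4": 3.0, "c5": 7.0}
# Hypothetical refinement scores, e.g. from a second model trained to separate tied configurations
refine_scores = {"c2": 2.9, "c3": 3.4, "c4": 2.5}
print(rerank_ties(predicted, lambda c: refine_scores.get(c, 0.0)))  # ['c4', 'c2', 'c3', 'c1', 'c5']
```

Without reranking, `c2`, `c3`, and `c4` would share first place and a user could not tell which one is actually best; the secondary score changes only the order inside the tied group and leaves the rest of the ranking intact. Exact equality is a reasonable tie test here because models such as regression trees emit identical leaf values for many inputs.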