| Database management systems (DBMSs) are cornerstone software in information technology and play a crucial role in industry. However, as complex, large-scale software, a DBMS inevitably accumulates vulnerabilities during development, which attackers can exploit to cause losses for database users. In recent years, fuzzing has become a popular technique for securing DBMSs. Its core idea is to generate random SQL query statements and feed them to the DBMS under test to detect vulnerabilities and errors. This thesis experimentally analyzes current state-of-the-art DBMS fuzzers and identifies three key issues that limit their vulnerability discovery efficiency: (1) static inference of data dependencies leads to low-quality test cases; (2) insensitivity to error paths leaves deep code insufficiently tested; and (3) insufficiently robust sample generation when adapting to a new DBMS leads to low code coverage. To address these issues, this thesis proposes an improved gray-box fuzzing framework for DBMS memory errors, which optimizes SQL semantic instantiation, seed selection, and seed scheduling from the perspectives of semantic awareness, error path recognition, and multi-strategy fusion. The thesis also improves the parallel fuzzing method, enhancing the effectiveness of SQL sample generation as well as the code coverage, robustness, and scalability of the fuzzer. The proposed approach discovers six more bugs than Squirrel when testing MySQL, SQLite, and openGauss. The main work and contributions of this thesis are as follows: (1) A semantic-aware DBMS fuzzing method is proposed. Current DBMS fuzzers based on syntax mutation generate many SQL samples with syntax and semantic errors because they lack semantic awareness, which makes it difficult to discover vulnerabilities in deep code. To address this, this thesis proposes a semantic-aware DBMS 
fuzzing method. First, SQL queries are parsed into an intermediate representation (IR), and mutations are performed on the IR. To reduce semantic errors in test cases, the mutated IR is then semantically repaired based on the SQL context, dynamically updating data dependencies. Experimental results show that this method generates more syntactically and semantically valid samples, thereby enhancing the ability to discover vulnerabilities in deep code. (2) A seed selection and scheduling strategy that is sensitive to error paths is proposed. Current mutation-based DBMS fuzzers are insensitive to program error paths and tend to select seeds with syntax or semantic errors, wasting much of their time on shallow code in the syntax and semantic checking phases and discovering vulnerabilities inefficiently. Therefore, this thesis proposes a three-level seed queue model based on program understanding, which adds error path feedback on top of code coverage feedback to guide seed selection. Seeds that are semantically valid and reach new coverage are prioritized and allocated more mutation energy. Experimental analysis demonstrates the effectiveness of this method in terms of edge coverage and performance overhead. (3) A multi-mechanism fusion method for collaborative DBMS fuzzing is proposed. Applying current DBMS fuzzers to another DBMS requires modeling the grammar of the new DBMS, which takes considerable time and effort and demands a deep understanding of that DBMS's dialect. Even so, because DBMSs are complex, the resulting model remains fragile. This thesis designs a collaborative fuzzing framework based on an SQL reshuffle mutation strategy, combined with two lightweight methods: customizing the DBMS keyword dictionary, and removing internal syntax analysis and checking. This improves robustness when testing new DBMSs. The framework also optimizes the traditional parallel fuzzing method: it improves the efficiency of sample synchronization and realizes 
the deduplication of redundant crash files, which improves the automation of crash verification in the later stage of fuzzing. Finally, experiments show that this method effectively improves code coverage and bug discovery capabilities. |
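
The three-level seed queue of contribution (2) can be illustrated with a minimal sketch. The abstract does not define the three levels or the energy values, so everything named here is an assumption made for illustration: the `Seed` fields, the level assignment in `classify`, and the numbers in `LEVEL_ENERGY`. The sketch only shows the stated principle that seeds which are semantically valid and reach new coverage are selected first and receive the most mutation energy.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical energy per level (number of mutations granted to a seed).
# The abstract gives no concrete values; these are placeholders.
LEVEL_ENERGY = {0: 64, 1: 16, 2: 4}


@dataclass
class Seed:
    sql: str
    new_coverage: bool        # coverage feedback: did the run reach new edges?
    semantically_valid: bool  # error path feedback: False if the run hit an error path


def classify(seed: Seed) -> int:
    """Assumed level assignment: 0 is the highest priority."""
    if seed.semantically_valid and seed.new_coverage:
        return 0  # valid and interesting: most likely to reach deep code
    if seed.new_coverage:
        return 1  # new coverage, but execution ended on an error path
    return 2      # everything else


class ThreeLevelQueue:
    def __init__(self) -> None:
        self.levels = [deque(), deque(), deque()]

    def add(self, seed: Seed) -> None:
        self.levels[classify(seed)].append(seed)

    def next_seed(self):
        """Return (seed, energy) from the highest non-empty level, or None."""
        for level, queue in enumerate(self.levels):
            if queue:
                return queue.popleft(), LEVEL_ENERGY[level]
        return None
```

In this sketch a seed is removed once it has been scheduled. A real fuzzer would typically keep seeds in the corpus and revisit them, which is omitted here for brevity.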