Font Size: a A A

Symbolic Execution For Parsers

Posted on:2023-07-18Degree:MasterType:Thesis
Country:ChinaCandidate:W Y PanFull Text:PDF
GTID:2558307169483594Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Many software have parsers.The parsers convert the software input into the internal data structure of the software,and then the functional code of the software is executed based on the converted internal data structure.The parsers require the software input to satisfy some specific grammatical requirement.If the input does not satisfy the grammatical requirements,the parsers throw an exception and exit.Traditional symbolic execution tools treat each byte of input as a symbolic variable,generating corresponding test inputs for the program path at the byte level.These inputs are difficult to pass the grammar checks of complex parsers.Therefore,it is a challenge for traditional symbolic execution to generate valid inputs that can satisfy the input grammar requirements of complex parsers so that the functional code of the software can be sufficiently tested.We observe that complex parsers first perform lexical analysis to convert the input string into a token sequence,and then perform grammar checking on the token sequence.Each token sequence corresponds to a pattern in the input grammar.Based on this observation,we proposes a grammar agnostic symbolic execution framework that automatically generates token sequences for testing complex parsers.Compared to traditional symbolic execution tools with byte-level symbolisation,the key idea of our approach is token symbolisation,which can more effectively generate valid inputs that can satisfy the input grammar requirements of complex parsers and improve the efficiency of symbolic execution for testing functional software code.We implements a prototype framework on the Java Pathfinder-based dynamic symbolic execution engine and conducts large-scale experiments on a real Java parser.The experimental results show that this approach improves both statement coverage and branch coverage compared to traditional byte-level symbolic execution and the state-of-the-art fuzzing methods.Six unknown program errors were detected during the experiments,and the method achieved orders of magnitude speedups compared to the two baseline methods when the same program errors were found.
Keywords/Search Tags:Symbolic execution, Grammar, Parsing Program, Tokenization
PDF Full Text Request
Related items