| As one of the most effective and practical techniques for vulnerability discovery,greybox fuzzing feeds malformed input data to programs for discovering potential bugs.Considering the syntax structure in program inputs,the fuzzer expects to produce semivalid test cases for testing deep code while triggering exceptions.However,greybox fuzzing depends on blind mutation strategies to produce new test cases since it lacks the information of program input formats.This challenge is concluded as “where and how to mutate” and seriously limits the effectiveness of greybox fuzzing.A novel solution is using byte-level input mutations for probing crucial input structures that affect program behaviors,and thus applies targeted mutation strategies.Compared with traditional taint analysis,the solution has proven to be more efficient and practial.In this thesis,we conduct further research on the technique of input structure probing for greybox fuzzing.We first observed existing input structure probing introduces exces-sive memory overhead.Moreover,the memory overhead also affects the hit ratio of CPU cache thus limits the efficiency and scalaility.To resolve these challenges,we proposed a runtime hash record scheme that reduces the space complexity of existing schemes to O(1),named Hash MTI.Experiments show that Hash MTI achieves a speedup of 8X to22 X and saves 2 to 594 times memory of existing schemes.Further,we reflected the limitation of existing techniques(including Hash MTI)and proposed a greybox fuzzing solution named AFLProber.It extends existing inefficient byte-level probing to on-the-fly random probing with the shadow-memory-based mutation tracking.In addition,we leverage the classic model “explore v.s.exploit” in reinforcement learning to handle the trade-off between probing and protection of crucial input structures.Experiments show that AFLProber effectively improves the performance of existing greybox fuzzers in terms of code coverage and vulnerability discovering. |