The NLP tool processes this information and outputs a call sequence.
The fuzzer then uses these sequences to guide the fuzzing.
For this part I want to know how the fuzzer uses these sequences and how it prunes unreachable paths.
I noticed that the examples have both a CVE report and a git log.
Generating parse tree: the tool is pyStatParser; the authors use it to generate the syntax tree.
Retrieving affected version: basically matched with regular expressions.
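To make the regular-expression step concrete, here is a minimal sketch. The pattern, the qualifier words and the function name `affected_versions` are my own assumptions, not the expressions SemFuzz actually uses.

```python
import re

# Hypothetical pattern: matches phrases such as "before 4.5.2" or "through 3.18".
VERSION_RE = re.compile(
    r"\b(before|through|prior to|up to)\s+(\d+(?:\.\d+)+(?:-rc\d+)?)",
    re.IGNORECASE,
)


def affected_versions(description):
    """Return (qualifier, version) pairs found in a description string."""
    return [(q.lower(), v) for q, v in VERSION_RE.findall(description)]


# Made-up description text, for illustration only.
text = "The foo_bar function in the Linux kernel before 4.5.2 allows local users to crash the system."
print(affected_versions(text))
```

A real description can phrase versions in other ways, so the qualifier list would need to grow with the corpus.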
Retrieving vulnerability type: from the 70 types of Linux kernel related CWEs, the authors chose 16 as the default types of SemFuzz, and SemFuzz identifies the type through the parse tree. If there is no vulnerability type in the parse tree, SemFuzz looks it up in the NVD.
Why were these 16 types chosen?
How does SemFuzz look up the vulnerability type in the NVD, manually or automatically?
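A rough sketch of the "match first, fall back to NVD" flow described above. It uses plain substring matching instead of a parse tree. The type table is an illustrative subset that I made up, and `nvd_types` is a hypothetical pre-fetched mapping that stands in for an NVD lookup.

```python
# Illustrative subset only; these notes do not list the 16 default types.
DEFAULT_TYPES = {
    "use-after-free": ["use-after-free", "use after free"],
    "buffer overflow": ["buffer overflow", "out-of-bounds write"],
    "race condition": ["race condition"],
}


def vulnerability_type(description, cve_id, nvd_types):
    """Match known type phrases in the description, else fall back to a lookup.

    nvd_types is a hypothetical {cve_id: type} mapping standing in for NVD.
    Returns None when neither source yields a type.
    """
    text = description.lower()
    for vuln_type, phrases in DEFAULT_TYPES.items():
        if any(phrase in text for phrase in phrases):
            return vuln_type
    return nvd_types.get(cve_id)
```

Whether the fallback is automatic in SemFuzz is exactly the open question above. The sketch only shows where such a lookup would sit.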
Retrieving vulnerable functions: by comparing the patched program with the unpatched program.
First, SemFuzz searches for the patched function's name in the parse tree.
If it cannot find the name, it then searches for the patched function's variables in the parse tree.
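The two-step search above can be sketched as follows. The input shape (`patched_functions` mapping a function name to its variable names) and the token-based matching are assumptions of mine. SemFuzz works on the parse tree, which this sketch replaces with a simple identifier tokenizer.

```python
import re


def tokens(text):
    """Split a description into identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))


def vulnerable_functions(description, patched_functions):
    """patched_functions: {function_name: [variable names]} (hypothetical input)."""
    words = tokens(description)
    # Step 1: look for the patched function names themselves.
    by_name = [f for f in patched_functions if f in words]
    if by_name:
        return by_name
    # Step 2: fall back to functions whose variables are mentioned.
    return [f for f, variables in patched_functions.items()
            if words & set(variables)]
```

For example, with the made-up input `{"foo_read": ["len"], "foo_write": ["buf"]}`, a description that names `foo_read` is resolved in step 1, while one that mentions only `buf` is resolved in step 2.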
Retrieving critical variables: follows two rules: a. the variable appears in an unpatched vulnerable function; b. the variable is also mentioned in the vulnerability description.
First, SemFuzz retrieves all variables in the unpatched functions and builds a symbol table.
Then it searches the parse tree for them.
Note that a variable must be a noun or an adjective in a phrase.
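The two rules plus the noun/adjective constraint can be sketched as a filter. I assume the parse tree has already been flattened into `(word, tag)` pairs with Penn Treebank style tags (`NN*` for nouns, `JJ*` for adjectives). The function name and input shapes are hypothetical.

```python
def critical_variables(symbol_table, tagged_words):
    """Keep words that are in the symbol table and tagged as noun or adjective.

    symbol_table: set of variable names from the unpatched functions.
    tagged_words: [(word, tag)] pairs, assumed to use Penn Treebank tags.
    """
    allowed = ("NN", "JJ")  # noun and adjective tag prefixes
    return [word for word, tag in tagged_words
            if word in symbol_table and tag.startswith(allowed)]
```

The tag check matters for short names: a variable called `i` also appears in text as a pronoun, and the filter drops that occurrence.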
Retrieving system calls: the authors build a system call database that records the relationship between variables and system calls, and then look up system calls in the database using the variables.
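A minimal sketch of such a lookup, assuming the database is just a mapping from variable name to system calls. The entries and the name `syscalls_for` are made up for illustration. How SemFuzz actually builds and stores the database is not covered in these notes.

```python
# Hypothetical entries: variable name -> system calls associated with it.
SYSCALL_DB = {
    "optlen": ["setsockopt", "getsockopt"],
    "msg_flags": ["sendmsg", "recvmsg"],
}


def syscalls_for(variables, db=SYSCALL_DB):
    """Collect system calls linked to the variables, in first-seen order."""
    found = []
    for var in variables:
        for call in db.get(var, []):
            if call not in found:
                found.append(call)
    return found
```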
SemFuzz is based on the kernel fuzzer Syzkaller.
Generating Seed Input: use the retrieved system calls as an incomplete seed, then correlate other related system calls to form the complete seed.
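One way to picture "completing" a seed is to add the calls that each retrieved call depends on. This is a sketch under my own assumption that the correlation can be written as a dependency table. The `REQUIRES` entries are hypothetical and assumed to be acyclic.

```python
# Hypothetical table: call -> calls that must appear earlier in the seed.
REQUIRES = {
    "setsockopt": ["socket"],
    "connect": ["socket"],
    "sendmsg": ["socket", "connect"],
}


def complete_seed(partial):
    """Expand an incomplete call list so every call follows its prerequisites."""
    seed = []

    def add(call):
        # Add prerequisites first (assumes REQUIRES has no cycles).
        for dep in REQUIRES.get(call, []):
            if dep not in seed:
                add(dep)
        if call not in seed:
            seed.append(call)

    for call in partial:
        add(call)
    return seed
```

With this table, `complete_seed(["setsockopt", "sendmsg"])` yields `["socket", "setsockopt", "connect", "sendmsg"]`.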
Coarse-level Mutation: change the system call sequence, compare the execution trace of the fuzzing instance with the target, and calculate the distance.
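These notes do not say how the distance is defined, so the following is only one plausible reading: the fewest call-graph edges from any function in the execution trace to the target function. The call graph input and the function name `distance` are assumptions.

```python
from collections import deque


def distance(trace, target, call_graph):
    """Fewest call-graph edges from any traced function to the target.

    trace: functions hit by one fuzzing run.
    call_graph: {caller: [callees]} (hypothetical input).
    Returns None if the target is unreachable from the trace.
    """
    queue = deque((func, 0) for func in dict.fromkeys(trace))
    seen = set(trace)
    while queue:
        func, hops = queue.popleft()
        if func == target:
            return hops
        for callee in call_graph.get(func, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, hops + 1))
    return None
```

Under this reading, a mutated system call sequence would be kept when its trace gives a smaller distance than the sequence it was derived from.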
Fine-grained Mutation: change the parameters and compare the basic block number.
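As a generic illustration of parameter-level mutation, here is a simple hill-climbing loop. It is not SemFuzz's algorithm. The `score` callback is a hypothetical stand-in for whatever basic-block measurement the fuzzer compares, and the step sizes are arbitrary.

```python
import random


def fine_mutate(args, score, rounds=200, rng=None):
    """Hill-climb over a non-empty list of integer arguments.

    score: hypothetical callback, higher means closer to the target.
    Returns the best argument list found and its score.
    """
    rng = rng or random.Random(0)
    best, best_score = list(args), score(args)
    for _ in range(rounds):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.choice([-8, -1, 1, 8])  # arbitrary step sizes
        s = score(candidate)
        if s > best_score:  # keep only strict improvements
            best, best_score = candidate, s
    return best, best_score
```

In a real setting `score` would run the system call sequence with the candidate arguments and inspect the resulting coverage. Here it can be any function of the argument list.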