• information source
  • SEMANTIC INFORMATION RETRIEVING
    • An NLP tool processes this information and outputs a call sequence.
    • The fuzzer then uses these sequences to guide fuzzing.
      • Open question: how does the fuzzer use these sequences, and how does it prune unreachable paths?
    • I noticed that both examples have a CVE report and a git log.
    • Generating parse tree: the authors use pyStatParser to generate the syntax tree.
    • Retrieving affected version: basically matched with regular expressions.
    • Retrieving vulnerability type: out of the 70 CWE types related to the Linux kernel, the authors chose 16 as SemFuzz's default types, and SemFuzz identifies the type through the parse tree. If no vulnerability type is found in the parse tree, SemFuzz looks it up in the NVD.
      • Why were these 16 types chosen?
      • How does SemFuzz look up the vulnerability type in the NVD: manually or automatically?
    • Retrieving vulnerable functions: by comparing the patched program with the unpatched program.
      • First, SemFuzz searches for the patched function's name in the parse tree.
      • If that fails, it searches for the patched function's variables in the parse tree.
    • Retrieving critical variables: a variable must satisfy two rules: (a) it appears in an unpatched vulnerable function; (b) it is also mentioned in the description.
      • First, SemFuzz retrieves all variables in the unpatched functions and builds a symbol table.
      • Then it searches the parse tree for them.
      • Note that a variable must be a noun or an adjective in a phrase.
    • Retrieving system calls: the authors build a system call database that records the relationship between variables and system calls, then look up system calls in it using the variables.
    • SemFuzz is based on the kernel fuzzer Syzkaller.
    • Generating seed input: use the retrieved system calls as an incomplete seed, then correlate other related system calls to complete it.
    • Coarse-level mutation: change the system call sequence, compare the execution trace of the fuzzing instance with the target, and calculate the distance.
    • Fine-grained mutation: change the parameters and compare the basic block number.
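
To make the two critical-variable rules concrete for myself, here is a minimal sketch. It assumes the description has already been POS-tagged with Penn Treebank style tags (NN* for nouns, JJ* for adjectives). The function name, the regex-based symbol table, and the sample data are all hypothetical and not taken from SemFuzz.

```python
import re

# Hypothetical helper, not SemFuzz's code: it only illustrates the two rules.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def critical_variables(unpatched_source, tagged_description):
    """Return names that (a) occur in the unpatched function and
    (b) are mentioned in the description as a noun or an adjective."""
    # Rule (a): a crude symbol table holding every identifier in the
    # unpatched function (this also picks up keywords such as "int").
    symbols = set(IDENTIFIER.findall(unpatched_source))
    result = []
    for word, tag in tagged_description:
        # Rule (b): the word must be tagged as a noun or an adjective.
        if word in symbols and tag.startswith(("NN", "JJ")) and word not in result:
            result.append(word)
    return result

# Made-up input: a toy unpatched function and a pre-tagged description.
source = "static int set_len(struct msg *m, int optlen) { m->len = optlen; return 0; }"
description = [("The", "DT"), ("function", "NN"), ("does", "VBZ"), ("not", "RB"),
               ("validate", "VB"), ("optlen", "NN"), ("before", "IN"), ("return", "VB")]
found = critical_variables(source, description)
```

Here "return" is in the crude symbol table but is tagged as a verb, so the noun/adjective constraint drops it and only `optlen` survives.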

What can I improve

  • The method of generating the syntax tree, or not using a syntax tree at all
    • Before generating the syntax tree, we could first delete the irrelevant information.
  • SemFuzz needs both git logs and a CVE report
    • Maybe it could work with less information.
  • The method of retrieving vulnerable functions could probably be improved.
  • If I extend this approach to other software, retrieving system calls may not be needed.
  • The information retrieval method is too simple and rough, and it lacks semantics even though the tool is called SemFuzz.
  • There is almost no improvement to the fuzzing method itself; it just uses the retrieved information as the fuzz seed.
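
The idea of deleting irrelevant information before parsing could look roughly like this. It is only a sketch of my own idea and not part of SemFuzz: the keyword list, the function name, and the naive sentence splitter are assumptions.

```python
import re

# Hypothetical keyword list; a real one would need tuning on actual reports.
KEYWORDS = ("function", "overflow", "crash", "allows", "system call")

def keep_relevant(text, keywords=KEYWORDS):
    """Split text into sentences and keep those containing any keyword.
    Matching is a plain case-insensitive substring test."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if any(k in s.lower() for k in keywords)]

# Made-up report text.
report = ("Reported by the maintainer last week. "
          "The foo_bar function allows a local user to cause a crash. "
          "Thanks to everyone for testing.")
kept = keep_relevant(report)
```

Only the middle sentence is kept, so the parser would see one sentence instead of three.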