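Title (translated): ProFuzzer: on-the-fly input type probing for better zero-day vulnerability discovery Author: Wei You Affiliation: Purdue University Country: #USA Year: #2019 Venue: #SP Keywords: #fuzzing Code: Notes created: 2023-04-26 09:31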

abstract

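  • an on-the-fly probing technique: it automatically discovers and understands the critical fields and automatically adjusts the mutation strategy
  • it mutates one byte at a time, automatically analyzes the fuzzing results, links related bytes together, identifies the type of each linked field, and then mutates further using the mutation strategy for that type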

introduction

challenges

  • most grammar-based input mutations lead to the same execution path, so only one of them needs to be tested
  • furthermore, inputs that exploit vulnerabilities may not follow the grammar
  • random mutation faces a very large input space

fuzzing through type probing

the paper proposes a novel technique called ProFuzzer, which has two stages:

  • first stage: sequential probing. It enumerates every value of every byte, changing one byte at a time, and collects information about the target's execution paths under the different byte values. This information is then analyzed to recover the data fields and their types.
  • second stage: it uses this information to mutate each field, either exploiting values that could lead to an attack or exploring legitimate values for better coverage.
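
A minimal sketch of the first stage (sequential probing), written only to make the idea concrete. The `Profile` type, `probe`, and `toy_target` are hypothetical names for illustration and are not from the paper; a real setup would run an instrumented binary instead of `toy_target`.

```python
from typing import Callable, Dict, List

# Hypothetical profile type: maps an edge id to its hit count.
Profile = Dict[int, int]


def probe(seed: bytes, run_target: Callable[[bytes], Profile]) -> List[List[Profile]]:
    """For every byte position, try all 256 values and record the profile."""
    profiles: List[List[Profile]] = []
    for pos in range(len(seed)):
        per_value: List[Profile] = []
        for value in range(256):
            mutated = bytearray(seed)
            mutated[pos] = value  # change exactly one byte, keep the rest of the seed
            per_value.append(run_target(bytes(mutated)))
        profiles.append(per_value)
    return profiles


def toy_target(data: bytes) -> Profile:
    """Stand-in for an instrumented target (an assumption, not a real program)."""
    profile = {0: 1}  # entry edge is always taken
    if data[:1] == b"P":  # magic-byte check on byte 0
        profile[1] = 1
        if len(data) > 1 and data[1] > 0:
            profile[2] = data[1]  # a loop whose iteration count is driven by byte 1
    return profile


profiles = probe(b"P\x03", toy_target)
```

Here `profiles[pos][value]` is the profile observed when byte `pos` is set to `value`; the second stage would consume these profiles to recover fields and types.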

Overview


  • probing engine
    • generate type templates
      • Iterates over all possible values of each byte of the input
      • Collects the corresponding execution profiles.
      • Extracts metrics as the features of bytes
      • Groups the consecutive bytes with similar features into a field
      • Determines the type of each field based on the relations between the changes of its possible values and the corresponding variation of observed features
  • Mutation engine
    • guides mutations based on the probed templates
    • a. increases code coverage
    • b. the field information can be used to guide exploit generation
  • Execution Engine and Report Engine
    • based on AFL

DESIGN AND IMPLEMENTATION

Assertion

  • has exactly one correct value

Raw Data

  • does not affect the execution of the program

Enumeration

  • has a set of valid values

Loop Count

  • determines how many times a code snippet is executed
  • it has a substantial impact on path counts and a negligible impact on the paths themselves

Offset

  • determines the data location
  • it may affect the execution paths

Size

  • determines the amount of data the program should read from the input file
  • it may affect the execution paths

if a byte of the input does not fall into any of the above categories, the random mutation strategy is used for it

Type Probing

  • the probing method has been described several times above and is not repeated here
  • it is worth noting that each profile (probing result) is stored as an edge vector, a compact hash array
    • an edge vector keeps the execution frequencies of individual control-flow edges
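
To make the edge vector concrete, here is a small sketch that follows the edge-hashing scheme documented for AFL (index = current block id XOR previous block id, with the previous id shifted right by one). Since the notes say the execution engine is based on AFL, this is a plausible model, but whether ProFuzzer uses exactly this layout is an assumption. `edge_vector` and `block_trace` are hypothetical names, and AFL's 8-bit counter behaviour is left out.

```python
from typing import List

MAP_SIZE = 1 << 16  # AFL's default map size (64 KiB)


def edge_vector(block_trace: List[int], map_size: int = MAP_SIZE) -> List[int]:
    """Build a hashed edge-frequency vector from a trace of basic-block ids."""
    vec = [0] * map_size
    prev = 0
    for cur in block_trace:
        vec[(cur ^ prev) % map_size] += 1  # one counter per (prev -> cur) edge
        prev = cur >> 1  # shift so that A->B and B->A hash to different slots
    return vec


vec = edge_vector([1, 2, 1, 2], map_size=16)
```

The vector is compact because edges are hashed into a fixed-size array rather than stored by name; the cost is that distinct edges can occasionally collide in the same slot.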

definitions

the authors define two metrics over the edge vectors recorded before and after a byte is changed

  • coverage similarity: the size of the edge coverage intersection of the two vectors, divided by the size of their union
  • frequency difference: the ratio between the number of edges with different frequencies and the number of edges that differ in coverage
  • note that the two edge vectors in these definitions are the edge vectors before and after the byte at position i is modified
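
A rough sketch of the two metrics over plain Python lists. Coverage similarity follows the definition above directly (intersection over union of covered edges). For frequency difference the wording in these notes is ambiguous about the denominator, so this sketch assumes it is the number of edges covered by both runs; that choice, the function names, and the handling of empty vectors are all assumptions and should be checked against the paper.

```python
from typing import Sequence


def coverage_similarity(before: Sequence[int], after: Sequence[int]) -> float:
    """Jaccard similarity of the sets of covered edges."""
    covered_before = {i for i, n in enumerate(before) if n > 0}
    covered_after = {i for i, n in enumerate(after) if n > 0}
    union = covered_before | covered_after
    if not union:
        return 1.0  # assumption: two empty profiles count as identical
    return len(covered_before & covered_after) / len(union)


def frequency_difference(before: Sequence[int], after: Sequence[int]) -> float:
    """Fraction of commonly covered edges whose hit counts differ (assumed denominator)."""
    common = [i for i, (a, b) in enumerate(zip(before, after)) if a > 0 and b > 0]
    if not common:
        return 0.0  # assumption: nothing to compare
    differing = sum(1 for i in common if before[i] != after[i])
    return differing / len(common)


before = [1, 2, 0, 3]  # edge vector of the original seed
after = [1, 5, 1, 0]   # edge vector after changing the byte at position i
cs = coverage_similarity(before, after)
fd = frequency_difference(before, after)
```

In this toy pair, two of the four edges seen in either run are shared, and one of those two shared edges changes its hit count.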

STEP I: Feature Extraction


STEP II: Field Identification

the authors assume that invalid values always lead to the same termination path (an exception exit), which is the shortest path. As a result, these invalid values must produce the same coverage and frequency metrics. (Personally, I think that besides the values being the same, the indices of the values should also be the same.)
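
One simple way to picture field identification is to merge consecutive bytes whose feature tuples are close to each other. The sketch below does exactly that and nothing more; the per-byte feature tuples, the tolerance `tol`, and the names `similar` and `group_fields` are hypothetical, and the paper's actual grouping rule may differ.

```python
from typing import List, Sequence, Tuple


def similar(x: Sequence[float], y: Sequence[float], tol: float) -> bool:
    """Two feature tuples are similar if every component differs by at most tol."""
    return len(x) == len(y) and all(abs(a - b) <= tol for a, b in zip(x, y))


def group_fields(features: Sequence[Sequence[float]], tol: float = 0.05) -> List[Tuple[int, int]]:
    """Group consecutive byte positions into fields, returned as [start, end) ranges."""
    fields: List[Tuple[int, int]] = []
    start = 0
    for pos in range(1, len(features) + 1):
        # Close the current field at the end of input or when the features change.
        if pos == len(features) or not similar(features[pos - 1], features[pos], tol):
            fields.append((start, pos))
            start = pos
    return fields


# One made-up feature tuple per input byte.
features = [(1.0, 0.2), (1.0, 0.21), (0.5, 0.9), (0.5, 0.9), (0.5, 0.88), (1.0, 1.0)]
fields = group_fields(features)
```

Each returned pair is a half-open byte range, so `(2, 5)` means bytes 2, 3 and 4 form one field.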

STEP III: Field Type Identification

Assertion Fields

  • there exists one and only one value v for the byte that induces a similarity score of 1
  • the similarity score of any other value is less than the midrange value
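
The two conditions above can be checked mechanically. This sketch assumes `similarities[v]` is the coverage similarity between the seed's profile and the profile obtained with the byte set to `v`, and that "midrange" means the mean of the minimum and maximum score; both are my reading of the notes, and `assertion_value` is a hypothetical name.

```python
from typing import Optional, Sequence


def assertion_value(similarities: Sequence[float]) -> Optional[int]:
    """Return the single accepted byte value if the byte looks like an assertion field."""
    midrange = (min(similarities) + max(similarities)) / 2
    exact = [v for v, s in enumerate(similarities) if s == 1.0]
    if len(exact) != 1:
        return None  # zero or several values keep the original path
    # Every other value must fall clearly below the midrange score.
    others_low = all(s < midrange for v, s in enumerate(similarities) if v != exact[0])
    return exact[0] if others_low else None


sims = [0.3] * 256
sims[0x50] = 1.0  # only the value 0x50 preserves the seed's coverage
magic = assertion_value(sims)
```

A byte where every value gives similarity 1 fails the first condition, which matches the raw data case rather than an assertion.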

Raw Data Fields

  • the values of raw data fields do not affect the control flow of the program

Enumeration Fields

Loop Count Fields

Offset Fields

Reprobing

probing is not fuzzing: it does not mutate the input freely. It only changes the bytes of the input sequentially, one byte at a time. After probing, we have the fields and field types of the input to the program under test. We then mutate the seed based on this information, which may produce crashes or better coverage. If a mutated input achieves better coverage than before, it becomes a new seed and is reprobed
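
The probe, mutate, reprobe cycle described above could be organized roughly as follows. This is only a sketch of the control flow under simplifying assumptions: `probe`, `mutate`, and `run` are hypothetical callbacks, coverage is modelled as a set of edge ids, and crash handling is omitted.

```python
from typing import Callable, Iterable, List, Set


def fuzz_loop(seed: bytes,
              probe: Callable[[bytes], object],
              mutate: Callable[[bytes, object], Iterable[bytes]],
              run: Callable[[bytes], Set[int]],
              max_rounds: int = 10) -> List[bytes]:
    """Probe a seed, mutate it, and reprobe any input that reaches new coverage."""
    seeds = [seed]
    queue = [seed]
    seen = set(run(seed))
    for _ in range(max_rounds):
        if not queue:
            break
        current = queue.pop(0)
        template = probe(current)  # fields and field types of the current seed
        for candidate in mutate(current, template):
            edges = run(candidate)
            if not edges <= seen:  # the candidate covers at least one new edge
                seen |= edges
                seeds.append(candidate)
                queue.append(candidate)  # it becomes a new seed to be reprobed
    return seeds


def toy_run(data: bytes) -> Set[int]:
    """Toy coverage model: deeper edges need the prefix b'AB'."""
    edges = {0}
    if data[:1] == b"A":
        edges.add(1)
    if data[:2] == b"AB":
        edges.add(2)
    return edges


def toy_mutate(data: bytes, template: object) -> Iterable[bytes]:
    """Toy mutator: set each position to 'A' or 'B' in turn (ignores the template)."""
    for pos in range(len(data)):
        for value in b"AB":
            mutated = bytearray(data)
            mutated[pos] = value
            yield bytes(mutated)


found = fuzz_loop(b"xx", lambda s: None, toy_mutate, toy_run)
```

In the toy run the second coverage gain is only reachable after the first new seed has been queued and processed again, which is the point of reprobing.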

Type Guided Mutation

two steps:

  • exploration mutation:

    • limit the mutation to the legitimate values of the field type to achieve better coverage.
    • do not allow any mutation on raw data fields
  • exploitation mutation

    • exploit a set of special values (for the specific field type) that could lead to potential attacks.
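  • in the introduction, the authors make a good point: a comprehensive, application-specific, semantically rich input specification is not necessary for fuzzing. What is more useful is identifying the special types of fields that are critical to fuzzing.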
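
The two mutation steps above can be sketched as a pair of functions that pick candidate values per field type. The `FieldType` enum mirrors the six types listed in these notes; the concrete candidate sets (all 256 values for numeric fields, and the boundary bytes `0x00`, `0x7F`, `0x80`, `0xFF` as "special values") are placeholders I chose for illustration, not the values used by the paper.

```python
from enum import Enum, auto
from typing import List, Sequence


class FieldType(Enum):
    ASSERTION = auto()
    RAW_DATA = auto()
    ENUMERATION = auto()
    LOOP_COUNT = auto()
    OFFSET = auto()
    SIZE = auto()


NUMERIC_TYPES = (FieldType.LOOP_COUNT, FieldType.OFFSET, FieldType.SIZE)


def exploration_values(ftype: FieldType, valid_values: Sequence[int]) -> List[int]:
    """Candidate values for exploration: stay within legitimate values of the type."""
    if ftype is FieldType.RAW_DATA:
        return []  # raw data fields are not mutated during exploration
    if ftype in (FieldType.ASSERTION, FieldType.ENUMERATION):
        return list(valid_values)  # only the values observed to be valid
    return list(range(256))  # assumption: any byte value is legitimate for numeric fields


def exploitation_values(ftype: FieldType) -> List[int]:
    """Candidate values for exploitation: special values per type (placeholders)."""
    if ftype in NUMERIC_TYPES:
        return [0x00, 0x7F, 0x80, 0xFF]  # hypothetical boundary values
    return []


enum_candidates = exploration_values(FieldType.ENUMERATION, [1, 2, 3])
size_attacks = exploitation_values(FieldType.SIZE)
```

Keeping exploration and exploitation as separate functions reflects the split in the notes: one aims at coverage with legitimate values, the other at triggering faults with a small set of risky values.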

Purpose: Method: Significance: Results: