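Title: ProFuzzer: On-the-fly Input Type Probing for Better Zero-day Vulnerability Discovery
Author: Wei You
Affiliation: Purdue University
Country: #USA
Year: #2019
Venue: #SP
Keywords: #fuzzing
Code:
Note created: 2023-04-26 09:31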

abstract

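An on-the-fly probing technique that automatically discovers and understands the key fields of an input and adjusts the mutation strategy accordingly.
It mutates one byte at a time and automatically analyzes the fuzzing results. It then links related bytes together into fields, identifies the type of each resulting field, and mutates further using the strategy for that type.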

introduction

challenges

most grammar-based input mutations lead to the same execution path, so only one of them needs to be tested
furthermore, inputs that exploit vulnerabilities may not follow the grammar
random mutation faces a huge input space

fuzzing with type probing

the authors propose a novel technique called ProFuzzer, which has two stages:

first stage: it performs sequential probing, enumerating every value of every byte while changing only one byte at a time. It also collects information about the target's execution paths under the different byte values. This information is then analyzed to recover the data fields and their types.
second stage: it uses this information to mutate each field, either trying values that could lead to an attack or exploring legitimate values for better coverage.
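A minimal Python sketch of how the first stage could generate its probes. It assumes the seed is an in-memory byte string. The name `sequential_probes` is hypothetical. Running each probe and recording its execution profile is left to the execution engine, which is not shown.

```python
from typing import Iterator, Tuple


def sequential_probes(seed: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (position, value, probe_input) for every byte position and byte value.

    Each probe differs from the seed in at most one byte.
    """
    buf = bytearray(seed)
    for pos in range(len(buf)):
        original = buf[pos]
        for value in range(256):
            buf[pos] = value
            yield pos, value, bytes(buf)  # copy, so later changes don't leak
        buf[pos] = original  # restore before probing the next position
```

An n-byte seed yields 256 × n probes, and each one costs a program execution. This is why probing is expensive on large inputs.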

Overview

Probing engine
- generates type templates:
  - Iterates over all possible values of each input byte
  - Collects the corresponding execution profiles
  - Extracts metrics as the features of each byte
  - Groups consecutive bytes with similar features into a field
  - Determines the type of each field from the relation between changes to its values and the corresponding variation in the observed features
Mutation engine
- guides mutations based on the probed templates
- a. increases code coverage
- b. uses the field information to guide exploit generation
Execution Engine and Report Engine
- based on AFL

DESIGN AND IMPLEMENTATION

fuzzing-related input field types

Assertion

has only one correct value

Raw Data

does not affect the execution of the program

Enumeration

has a set of valid values

Loop Count

determines how many times a code snippet is executed
it has a substantial impact on path counts but a negligible impact on the paths themselves

Offset

determines the location of the data
it may affect the execution paths

Size

determines the amount of data the program should read from the input file
it may affect the execution paths

if a byte of the input does not fall into any of the above categories, the random mutation strategy is used

Type Probing

the probing method has been described several times above and is not repeated here
it is worth noting that each execution profile (probing result) is stored as an edge vector, a compact hash array
- the edge vector keeps the execution frequencies of individual control-flow edges
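ProFuzzer is built on AFL, so a plausible model of the edge vector is AFL's shared hit-count map. The sketch below assumes AFL's classic scheme. Each basic block gets a random ID. An edge from block `prev` into block `cur` increments slot `cur ^ (prev >> 1)`. `MAP_SIZE` and `edge_vector` are illustrative names. Unlike AFL, the counters here are unbounded Python ints rather than wrapping 8-bit bytes.

```python
from typing import Iterable, List

MAP_SIZE = 1 << 16  # AFL's default map size (64 KiB)


def edge_vector(block_ids: Iterable[int], map_size: int = MAP_SIZE) -> List[int]:
    """Build an AFL-style edge vector from a trace of basic-block IDs.

    Each transition into block `cur` increments slot cur ^ prev, where prev holds
    the previous block's ID shifted right by one. Colliding edges share a slot.
    """
    vec = [0] * map_size
    prev = 0
    for cur in block_ids:
        vec[(cur ^ prev) % map_size] += 1
        prev = cur >> 1  # the shift keeps A->B and B->A distinct
    return vec
```

Hash collisions make unrelated edges share a slot. This is the trade-off for keeping the vector a fixed, compact size.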

definitions

the authors define two metrics that compare the edge vectors obtained before and after a byte is modified

coverage similarity: the size of the intersection of the two vectors' edge coverage, divided by the size of their union
frequency difference: the ratio between the number of edges with different frequencies and the number of edges with different coverage
Note that the two edge vectors in the paper's formulas are the ones recorded before and after the byte at position i is modified.
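The two metrics can be sketched in Python as follows. An edge vector is treated as a list of hit counts indexed by edge slot. Coverage similarity is the Jaccard index of the two coverage sets. The wording of the frequency-difference denominator above is ambiguous. This sketch assumes the denominator is the set of edges covered by both runs, which keeps the ratio in [0, 1]. Swap in another denominator if the paper's formula differs. `byte_feature` is a hypothetical helper that collects both metrics for every probed value of one byte.

```python
from typing import List, Sequence, Set, Tuple


def coverage(vec: Sequence[int]) -> Set[int]:
    """Edge slots that were hit at least once."""
    return {i for i, hits in enumerate(vec) if hits}


def coverage_similarity(before: Sequence[int], after: Sequence[int]) -> float:
    """|cov(before) ∩ cov(after)| / |cov(before) ∪ cov(after)| (Jaccard index)."""
    a, b = coverage(before), coverage(after)
    union = a | b
    return len(a & b) / len(union) if union else 1.0  # two empty runs count as identical


def frequency_difference(before: Sequence[int], after: Sequence[int]) -> float:
    """Fraction of commonly covered edges whose hit counts changed.

    ASSUMPTION: the denominator is the set of edges covered by both runs.
    """
    common = coverage(before) & coverage(after)
    if not common:
        return 0.0  # no shared edges, so nothing to compare
    changed = sum(1 for i in common if before[i] != after[i])
    return changed / len(common)


def byte_feature(
    base: Sequence[int], probes: Sequence[Sequence[int]]
) -> List[Tuple[float, float]]:
    """Feature of one byte: (similarity, frequency difference) per probed value.

    `base` is the edge vector of the unmodified input; probes[v] is the edge
    vector recorded after setting the byte at position i to v.
    """
    return [(coverage_similarity(base, p), frequency_difference(base, p)) for p in probes]
```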

STEP I: Feature Extraction

STEP II: Field Identification

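the authors argue that invalid values always lead to the same termination path. That path ends in an exception and is the shortest path. As a result, all invalid values must produce the same coverage and frequency metrics, so consecutive bytes that show the same pattern can be grouped into one field. (Personal note: besides the metric values being the same, the indices of those values, i.e. which byte values produce them, should presumably also match.)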
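A sketch of the grouping step. It assumes each byte's feature is its ordered list of per-value metrics. The features are compared position by position over the byte values 0 to 255. This requires both the metric values and the byte values that produce them to match, in line with the personal note above. Exact equality is a simplification, and a real implementation may tolerate small differences. `group_fields` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def group_fields(features: Sequence[object]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive bytes whose features are equal into fields.

    features[i] is the feature of byte i (e.g. its list of per-value metrics).
    Returns half-open (start, end) byte ranges, one per field.
    """
    fields: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(features) + 1):
        # Close the current field at the end of input or when the feature changes.
        if i == len(features) or features[i] != features[start]:
            fields.append((start, i))
            start = i
    return fields
```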

STEP III: Field Type Identification

Assertion Fields

there exists one and only one value v for the byte that yields a similarity score of 1
the similarity score of every other value is less than the midrange value
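The assertion test can be written directly from the two conditions above. This sketch assumes "midrange" means the midpoint between the largest and smallest similarity scores observed for that byte. `is_assertion_field` is a hypothetical name.

```python
from typing import Sequence


def is_assertion_field(similarities: Sequence[float]) -> bool:
    """similarities[v] is the coverage similarity observed when the byte is set to v.

    Assertion: exactly one value scores 1, and every other value scores below
    the midrange, taken here as (max + min) / 2 of all scores (an assumption).
    """
    if not similarities:
        return False
    if sum(1 for s in similarities if s == 1.0) != 1:
        return False
    midrange = (max(similarities) + min(similarities)) / 2
    return all(s < midrange for s in similarities if s != 1.0)
```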

Raw Data Fields

raw data field values do not affect the program's control flow
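Likewise, a strict reading of the raw-data criterion is that no probed value changes coverage or edge frequencies relative to the original input. The (similarity, frequency difference) pairs are assumed to come from a feature-extraction step like STEP I. Real programs may need a noise tolerance, which this sketch omits. `is_raw_data_field` is a hypothetical name.

```python
from typing import Sequence, Tuple


def is_raw_data_field(features: Sequence[Tuple[float, float]]) -> bool:
    """features[v] = (coverage similarity, frequency difference) for byte value v.

    Raw data: no probed value changes which edges run or how often they run.
    """
    return bool(features) and all(sim == 1.0 and fdiff == 0.0 for sim, fdiff in features)
```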

Enumeration Fields

Loop Count Fields

Offset Fields

Reprobing

probing is not fuzzing: it does not mutate the input in the fuzzing sense. It only changes the input bytes sequentially, one byte at a time. After probing, we know the fields of the input and their types for the program under test. We then mutate the input based on this template, and a mutated input may cause a crash or reach better coverage. If it reaches better coverage than before, we add it as a new seed and probe it again (reprobing).
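The probe-mutate-reprobe cycle can be sketched as below. Several parts are assumptions made for illustration. `fuzz_with_reprobing`, `probe`, `mutate` and `run` are hypothetical names. "Better coverage" is taken to mean reaching at least one edge not seen before, as in AFL. Crash handling is omitted.

```python
from typing import Any, Callable, Dict, List, Set


def fuzz_with_reprobing(
    seed: bytes,
    probe: Callable[[bytes], Any],
    mutate: Callable[[bytes, Any], bytes],
    run: Callable[[bytes], Set[int]],
    rounds: int,
) -> List[bytes]:
    """Mutate seeds with their templates; reprobe any input that reaches new edges.

    probe(input) -> template, mutate(seed, template) -> candidate,
    run(input) -> set of covered edge IDs. Returns the final seed queue.
    """
    queue: List[bytes] = [seed]
    templates: Dict[bytes, Any] = {seed: probe(seed)}
    seen_edges: Set[int] = set(run(seed))
    for _ in range(rounds):
        for current in list(queue):  # snapshot: new seeds wait for the next round
            candidate = mutate(current, templates[current])
            edges = run(candidate)
            if not edges <= seen_edges:  # reached at least one new edge
                seen_edges |= edges
                queue.append(candidate)
                templates[candidate] = probe(candidate)  # reprobe the new seed
    return queue
```

Only inputs that expand coverage are reprobed. This bounds the expensive probing step to inputs that are likely to have a new structure.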

Type Guided Mutation

two step:

exploration mutation:
- limits mutations to the legitimate values of each field type to achieve better coverage
- does not allow any mutation on raw data fields
exploitation mutation:
- tries a set of special values (specific to each field type) that could lead to potential attacks
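A hedged sketch of how type templates could drive the two mutation modes. The type list follows the categories above. The fallback to random mutation follows the rule for bytes that fit no category. The concrete value sets are illustrative assumptions, not the paper's tables. Exploration reuses the values observed as legitimate during probing. Exploitation tries boundary values (0, 1, the signed midpoint, and the maximum) for loop-count, offset, and size fields. `FieldType`, `boundary_values` and `candidate_values` are hypothetical names.

```python
import random
from enum import Enum, auto
from typing import Iterable, List, Optional


class FieldType(Enum):
    ASSERTION = auto()
    RAW_DATA = auto()
    ENUMERATION = auto()
    LOOP_COUNT = auto()
    OFFSET = auto()
    SIZE = auto()
    UNKNOWN = auto()  # byte fits no category: fall back to random mutation


def boundary_values(width: int) -> List[int]:
    """Illustrative special values for an unsigned field of `width` bytes."""
    top = (1 << (8 * width)) - 1
    half = 1 << (8 * width - 1)
    return sorted({0, 1, half - 1, half, top - 1, top})


def candidate_values(
    ftype: FieldType,
    width: int,
    valid_values: Iterable[int],
    mode: str,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Values to try for one field in 'exploration' or 'exploitation' mode."""
    if mode not in ("exploration", "exploitation"):
        raise ValueError(f"unknown mode: {mode}")
    if ftype is FieldType.UNKNOWN:
        if rng is None:
            rng = random.Random()
        return [rng.randrange(1 << (8 * width))]  # random mutation fallback
    if mode == "exploration":
        if ftype is FieldType.RAW_DATA:
            return []  # exploration never mutates raw data fields
        return sorted(set(valid_values))  # stay within observed legitimate values
    if ftype in (FieldType.LOOP_COUNT, FieldType.OFFSET, FieldType.SIZE):
        return boundary_values(width)  # values that often trigger overflows or OOB access
    return []  # this sketch defines no special values for the other types
```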
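In the introduction, the authors make a good point: a comprehensive, application-specific, semantically rich input specification is not necessary for fuzzing. What is more useful is identifying the special types of fields that are critical to fuzzing.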
