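Title: ProFuzzer: On-the-fly Input Type Probing for Better Zero-day Vulnerability Discovery Author: Wei You Affiliation: Purdue University Country: #USA Year: #2019 Source: #SP Keywords: #fuzzing Code: Note created: 2023-04-26 09:31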
abstract
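- An on-the-fly probing technique. It automatically discovers and understands critical fields, and automatically adjusts the mutation strategy.
- It mutates one byte at a time, automatically analyzes the fuzzing results, groups related bytes together, identifies the type of the resulting field, and then mutates further using the mutation strategy for that type.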
introduction
challenges
- most grammar-based input mutations lead to the same execution path, so only one of them needs to be tested
- furthermore, inputs that exploit vulnerabilities may not follow the grammar
- random mutation has to search a large input space
fuzzing with type probing
The authors propose a novel technique called ProFuzzer, which has two stages:
- first stage: it conducts sequential probing, enumerating every value of every byte and changing one byte at a time. It also collects information about the target's execution paths under the different byte values. This information is then analyzed to recover the data fields and their types.
- second stage: it uses this information to mutate each field, either to exploit values that could lead to an attack or to explore legitimate values for better coverage.
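The first stage can be sketched as a double loop over byte positions and byte values. This is a minimal illustration, not the paper's implementation: `toy_target` is a hypothetical stand-in for running the instrumented program, and the dictionary-based `EdgeVector` is an assumed representation of an execution profile.

```python
from typing import Callable, Dict, List, Tuple

# Assumed profile representation: control-flow edge -> execution count.
EdgeVector = Dict[Tuple[str, str], int]


def toy_target(data: bytes) -> EdgeVector:
    """Hypothetical target: a magic byte 'P' followed by a loop-count byte."""
    edges: EdgeVector = {("entry", "check_magic"): 1}
    if data[:1] != b"P":
        edges[("check_magic", "error_exit")] = 1
        return edges
    edges[("check_magic", "parse_body")] = 1
    count = data[1] if len(data) > 1 else 0
    if count:
        edges[("parse_body", "loop")] = count
    return edges


def probe(seed: bytes, run: Callable[[bytes], EdgeVector]) -> List[List[EdgeVector]]:
    """Return profiles[i][v]: the profile observed when byte i is set to v."""
    profiles = []
    for i in range(len(seed)):
        per_value = []
        for v in range(256):
            # Change exactly one byte; every other byte keeps its seed value.
            mutated = seed[:i] + bytes([v]) + seed[i + 1:]
            per_value.append(run(mutated))
        profiles.append(per_value)
    return profiles


profiles = probe(b"P\x03AB", toy_target)
```

Each probing pass costs `256 * len(seed)` executions, which is why the collected profiles are reused for both field grouping and type identification.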
Overview
- probing engine (generates type templates)
- Iterates over all possible values of each byte of the input
- Collects the corresponding execution profiles
- Extracts metrics as the features of each byte
- Groups consecutive bytes with similar features into a field
- Determines the type of each field from the relation between changes in its value and the corresponding variation in the observed features
- Generates the type templates
- Mutation engine
- guides mutations based on the probed templates
- a. increases code coverage
- b. the field information can be used to guide exploit generation
- Execution Engine and Report Engine
- based on AFL
DESIGN AND IMPLEMENTATION
fuzzing related input field types
Assertion
- has only one correct value
Raw Data
- does not affect the execution of the program
Enumeration
- has a set of valid values
Loop Count
- determines how many times a code snippet is executed
- it has a substantial impact on path counts and a negligible impact on the paths themselves
Offset
- determines the location of the data
- it may affect the execution paths
Size
- determines the amount of data the program should read from the input file
- it may affect the execution paths
if a byte of the input does not fall into any of the above categories, the random mutation strategy is used
Type Probing
- the probing method has been described several times above and is not repeated here
- it is worth noting that each profile (probing result) is stored as an edge vector, a compact hash array
- the edge vector keeps the execution frequencies of individual control-flow edges
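Since ProFuzzer is built on AFL, one plausible reading of "compact hash array" is AFL's coverage map, where each edge is hashed from the IDs of its two basic blocks. The sketch below assumes that scheme; the block IDs in the example are made up, and plain integers replace AFL's 8-bit counters.

```python
from typing import Iterable, List

MAP_SIZE = 1 << 16  # AFL's default map size (64 KiB)


def record_trace(block_ids: Iterable[int], map_size: int = MAP_SIZE) -> List[int]:
    """Build an edge vector from a sequence of basic-block IDs, AFL style."""
    edge_vector = [0] * map_size
    prev = 0
    for cur in block_ids:
        # The edge prev -> cur is hashed to one slot; its counter is bumped.
        edge_vector[(cur ^ prev) % map_size] += 1
        # Shifting keeps A -> B distinct from B -> A and from A -> A.
        prev = cur >> 1
    return edge_vector


# Hypothetical block IDs: block A once, then block B three times in a loop.
A, B = 0x1234, 0x5678
vec = record_trace([A, B, B, B])
```

Here the self-edge `B -> B` ends up with a count of 2, which is the kind of frequency information a loop-count field would change.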
definitions
the authors define two metrics over the edge vectors obtained before and after a byte is changed
- coverage similarity: the size of the intersection of the edge coverage of the two vectors, divided by the size of their union
- frequency difference: the ratio between the number of edges with different frequencies and the number of edges that differ in coverage
- note that the two edge vectors in the formulas are the edge vectors before and after the byte at position i is modified
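The two metrics can be sketched as follows. Coverage similarity follows the definition above directly (a Jaccard index over covered edges). For the frequency difference, the sketch assumes the ratio is taken over the edges covered by both runs, so that it stays defined when the two runs have identical coverage; the note's own wording may intend a different denominator, so treat this part as an assumption.

```python
from typing import Sequence


def coverage_similarity(before: Sequence[int], after: Sequence[int]) -> float:
    """Size of the coverage intersection divided by the size of the union."""
    cov_before = {e for e, n in enumerate(before) if n > 0}
    cov_after = {e for e, n in enumerate(after) if n > 0}
    union = cov_before | cov_after
    if not union:
        return 1.0  # two empty profiles are treated as identical
    return len(cov_before & cov_after) / len(union)


def frequency_difference(before: Sequence[int], after: Sequence[int]) -> float:
    """Fraction of commonly covered edges whose counts differ (assumed reading)."""
    common = [e for e in range(len(before)) if before[e] > 0 and after[e] > 0]
    if not common:
        return 0.0
    differing = sum(1 for e in common if before[e] != after[e])
    return differing / len(common)


# Edge vectors before and after changing the byte at position i.
before = [1, 2, 0, 1]
after = [1, 5, 1, 0]
cs = coverage_similarity(before, after)   # edges {0, 1} shared out of {0, 1, 2, 3}
fd = frequency_difference(before, after)  # edge 1 differs among shared edges {0, 1}
```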
STEP I: Feature Extraction
STEP II: Field Identification
the authors assume that invalid values always lead to the same termination path via an exception, which is the shortest path; as a result, these invalid values must yield the same coverage and frequency metrics
(Personally, I think that besides the values being the same, the indices of the values should also be the same.)
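A simplified sketch of the grouping idea: if invalid values of one field all hit the same error exit, adjacent bytes of that field should share the same "invalid" metric pair. The criterion used here (take the pair with the lowest coverage similarity as the byte's signature and merge neighbors with equal signatures) is an assumption made for illustration, not the paper's exact rule, and the feature lists are shortened from 256 entries to a few.

```python
import math
from typing import List, Tuple

Feature = Tuple[float, float]  # (coverage similarity, frequency difference)


def invalid_signature(features: List[Feature]) -> Feature:
    """Assumed signature: the pair with the lowest coverage similarity."""
    return min(features)


def same_signature(a: Feature, b: Feature, tol: float = 1e-9) -> bool:
    return (math.isclose(a[0], b[0], abs_tol=tol)
            and math.isclose(a[1], b[1], abs_tol=tol))


def group_fields(per_byte: List[List[Feature]]) -> List[Tuple[int, int]]:
    """Merge consecutive bytes with equal signatures into half-open ranges."""
    signatures = [invalid_signature(f) for f in per_byte]
    fields = []
    start = 0
    for i in range(1, len(signatures) + 1):
        # Close the current field at the end or when the signature changes.
        if i == len(signatures) or not same_signature(signatures[i], signatures[i - 1]):
            fields.append((start, i))
            start = i
    return fields


per_byte = [
    [(1.0, 0.0), (0.2, 0.0)],              # byte 0
    [(0.2, 0.0), (1.0, 0.0), (0.6, 0.5)],  # byte 1: same error exit as byte 0
    [(1.0, 0.0), (0.5, 0.1)],              # byte 2: a different exit
    [(1.0, 0.0)],                          # byte 3: no value changes anything
]
fields = group_fields(per_byte)
```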
STEP III: Field Type Identification
Assertion Fields
- there exists one and only one value v for the byte that induces a similarity score of 1
- the similarity score of any other value is less than the midrange value
Raw Data Fields
- raw data field values do not affect the control flow of the program
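The assertion and raw data rules above can be written as a small classifier over the 256 coverage-similarity scores of one byte. The sketch assumes "midrange" means `(min + max) / 2` of the observed scores and reads "does not affect the control flow" as every score being exactly 1; the remaining field types are not covered and fall through to `"other"`.

```python
from typing import Sequence


def classify_byte(similarity: Sequence[float]) -> str:
    """Classify one byte from its per-value coverage similarity scores."""
    exact = [v for v, s in enumerate(similarity) if s == 1.0]
    # Raw data: no value changes the covered edges.
    if len(exact) == len(similarity):
        return "raw_data"
    midrange = (min(similarity) + max(similarity)) / 2  # assumed definition
    # Assertion: exactly one value keeps the path, all others fall far away.
    if len(exact) == 1 and all(
        s < midrange for v, s in enumerate(similarity) if v != exact[0]
    ):
        return "assertion"
    return "other"


raw_scores = [1.0] * 256
magic_scores = [0.2] * 256
magic_scores[ord("P")] = 1.0
mixed_scores = [0.2] * 256
mixed_scores[1] = 1.0
mixed_scores[2] = 0.9
```

With these inputs, `raw_scores` is classified as raw data, `magic_scores` as an assertion, and `mixed_scores` as neither, because one non-matching value scores above the midrange.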
Enumeration Fields
Loop Count Fields
Offset Fields
Reprobing
Probing is not fuzzing: it does not mutate the input freely. It only changes the input bytes sequentially, one byte at a time. After probing, we have the fields and field types of the input to the program under test. We then mutate the seed based on them, which may produce crashes or better coverage. If a mutated input achieves better coverage than before, it becomes a new seed and is reprobed.
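The probe, mutate, reprobe cycle can be sketched as a work queue. All three callbacks are hypothetical placeholders: `probe` returns some template, `mutate` yields candidates from a seed and its template, and `run` returns a crash flag plus the set of covered edges. The toy versions below exist only to make the loop executable.

```python
from collections import deque
from typing import FrozenSet, Tuple


def fuzz_with_reprobing(seed, probe, mutate, run, max_probes=16):
    """Probe a seed, mutate it, and reprobe any input that adds coverage."""
    _, covered = run(seed)
    seen = set(covered)
    queue = deque([seed])
    probed, crashes = [], []
    while queue and len(probed) < max_probes:
        current = queue.popleft()
        template = probe(current)  # (re)probing happens once per seed
        probed.append(current)
        for candidate in mutate(current, template):
            crashed, edges = run(candidate)
            if crashed:
                crashes.append(candidate)
            elif not edges <= seen:  # new coverage: promote to seed
                seen |= edges
                queue.append(candidate)
    return probed, crashes


def toy_run(data: bytes) -> Tuple[bool, FrozenSet[str]]:
    """Hypothetical target that crashes on the input prefix b'AB!'."""
    edges = {"start"}
    if data[:1] == b"A":
        edges.add("a")
        if data[1:2] == b"B":
            edges.add("ab")
            if data[2:3] == b"!":
                return True, frozenset(edges)
    return False, frozenset(edges)


def toy_probe(seed: bytes):
    return list(range(len(seed)))  # placeholder template: every byte position


def toy_mutate(seed: bytes, template):
    for i in template:
        for v in b"AB!":
            yield seed[:i] + bytes([v]) + seed[i + 1:]


probed, crashes = fuzz_with_reprobing(b"xxx", toy_probe, toy_mutate, toy_run)
```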
Type Guided Mutation
two steps:
exploration mutation:
- limit the mutation to the legitimate values of the field type to achieve better coverage
- do not allow any mutation on raw data fields
exploitation mutation:
- use a set of special values (specific to the field type) that could lead to potential attacks
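A minimal sketch of the two mutation steps, assuming a field is described by its byte range, its probed type, and the legitimate values found during probing. The `Field` class and the `SPECIAL_VALUES` table are illustrative assumptions: the note does not list the special values, so common boundary bytes are used as placeholders, and types without an entry (raw data included) simply get no exploitation candidates.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Field:
    start: int   # inclusive byte offset
    end: int     # exclusive byte offset
    kind: str    # e.g. "assertion", "raw_data", "enumeration", "loop_count"
    valid_values: List[bytes] = field(default_factory=list)


# Placeholder boundary values per field type (assumed, not from the paper).
SPECIAL_VALUES: Dict[str, List[int]] = {
    "loop_count": [0x00, 0x7F, 0x80, 0xFF],
    "offset": [0x00, 0x7F, 0x80, 0xFF],
    "size": [0x00, 0x7F, 0x80, 0xFF],
}


def explore(seed: bytes, f: Field) -> Iterator[bytes]:
    """Exploration: only substitute values known to be legitimate."""
    if f.kind == "raw_data":
        return  # raw data fields are never mutated during exploration
    for value in f.valid_values:
        yield seed[:f.start] + value + seed[f.end:]


def exploit(seed: bytes, f: Field) -> Iterator[bytes]:
    """Exploitation: substitute type-specific special values."""
    width = f.end - f.start
    for b in SPECIAL_VALUES.get(f.kind, []):
        yield seed[:f.start] + bytes([b]) * width + seed[f.end:]


seed = b"P\x03DATA"
loop = Field(1, 2, "loop_count", [b"\x01", b"\x02"])
raw = Field(2, 6, "raw_data")
explored = list(explore(seed, loop))
exploited = list(exploit(seed, loop))
```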
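In the introduction, the authors make a good point: a comprehensive, application-specific, semantically rich input specification is not necessary for fuzzing. What is more useful is identifying the special types of fields that are critical to fuzzing.
Purpose: Method: Significance: Results: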