中文译名:通过操纵距离引导的模糊测试自动生成堆溢出的可利用堆布局 作者:Bin Zhang 单位:国防科大 国家: #中国 年份: #2023年 来源: #USENIX会议 关键字: #fuzzing #AGE 代码地址:Epeius/Scatter: Automatically Generate Exploitable Memory Layout for HeapOOB-Write Vulnerabilities in Non-Interpreter Software (github.com) (代码没有上传,只有readme) 笔记建立时间: 2023-05-31 09:35
Abstract
- heap primitives are leveraged to construnct exploitable states
- prior efforts only focus on particular program types or programs with dispatcher-loop structures, these are hard for general-purpose programs
- this paper present scatter, enabling the generation of exploitable heap layouts for heap overflows in generalpurpose programs in a primitive-free manner.
Background
- A heap primitive is a code snippet that invokes one or more heap operations (such as malloc and free), and it is often invoked multiple times with a certain program input.
- using heap overflows vulnerabilities to achieve control flow hijacking need strict control of heap allocations, which is challenging
- Existing approaches to constructing exploitable heap layouts can be categorized into two types: modeling-based approaches and fuzzing-based approaches.
- existing approaches rely on heap primitives that are easy to be utilized in specific programs
关于 A Running Example 的解释
图中的f和m是指free和malloc,代码(listing 1)中有注释。
listing1的漏洞在于可以输入超过申请chunk大小的输入而造成堆溢出。
可利用的堆溢出是指可以从当前chunk溢出到下一个chunk的头,从而修改chunk的属性。
所以简单的poc(论文中给出的0x100个字节的poc)只能造成程序崩溃或者甚至都不会造成程序崩溃。
想要堆溢出就需要构造堆的布局,就是图中右边,也就是输入epoc。epoc精心构造的目的就是让最后代码的45行输入的puredata可以放入堆块A,这样溢出后就可以影响到堆块B的头,甚至C的头也可以。
如果不精细构造的话输入的puredata纵然产生了堆溢出,但是没有影响到其他堆块。
Methodology
input is the source code of target program and the PoC that triggers a heap overflow vulnerability
output are ePoCs that construct exploitable heap layouts
Identifying Victim Objects
- indentify the sensitive structures by static analysis
- sensitive structures:
- a structure that contains pointers
- a structure that has no pointers but contains a member that can affect a buffer’s access
- a union structure that contains previous two types of structures and is accessed as its structure type
- 这步骤是用来识别代码中对sensitive structures的定义(猜测)
- sensitive structures:
- identify potential victim objects at runtime by dynamic instrumentation (为啥需要动态分析)
- after recognizing, scatter locates all victim object allocation points by indentifying heap operations that allocate or free memory for the sensitive structures
- 这部分是识别代码中对sensitive structures的free和maclloc操作
- analyze the type for the return of allocation functions
- 通过分析LLVM IR
Pinpointing Critical Input Bytes
这部分用于识别输入中可能影响堆操作的字节
Identifying Mutable Operations
- 这个可变操作应该指的是受输入影响的对操作,这影响可以是执行次数,也可以是执行参数
- build LDG(Layout Dependence Graph):SCATTER firstly utilizes a heap-operation-guided fuzzer to explore different execution paths, and then extracts heap operations and control flow information on each path.(这里咋实现的)
- SCATTER identifies mutable heap operations by identifying whether the parameters or execution times of the operations change when giving different program inputs.
- leverages dynamic taint analysis technique to check whether an operation’s parameters can be affected by the input bytes.
Mapping to Input Bytes
- leverages a lightweight “mutate-check” strategy to locate input bytes that can affect mutable heap operations.
- 先以输入a执行一次程序,收集参数和可变堆操作出现的次数。然后对a进行多次变异,执行程序,检查参数和可变堆操作的次数是否变化。
Modeling Fuzzing-Based Manipulation
a. collect a set of victim objects, determine their locations in the heap
- through dynamic instrumentation, obtain all victim objects that are alive in heap
b. adjust vulnerable object’s location according to the victim objects’ locations through fuzzing, based on the manipulation distance metric
- for each victim object, we traverse the trace of heap operations to locate all suitable free chunks for placing the vulnerable object and calculate the manipulation distances.
- the suitable free chunks following requirements: (1) it is freed before allocating ov and has the same size with ov; (2) ov can overflow into os when ov is placed into this free chunk.
Defining Basic Manipulation Distance
- distance is based on the heap operation
a. the position of free chunk c in the free list
- nA and nF denote the number of allocation and the number of free operations with the same size of c in ⃗ Ro, respectively. (heap operations ⃗ Ro between c’s free and ov’s allocation)
- ζ{0,1} = 0 if c’s free list is FIFO, otherwise, ζ{0,1} = 1 for FILO.
- δ represents the position index (start from 1) of c in the free list according to the allocation order.
b. basic manipulation distance (to corrupt a victim object os by placing the vulnerable object ov into a free chunk c)
(这把上面的式子代入不是0吗?啥意思)
Handling Early Occupation Problem
occupation problem means our target chunk c will be not free when we allocate the vulnerable object
the smaller overload factor is, the less likely occupation problem is to occur
combine the overload factor and Basic Manipulation Distance, the extended manipulation distance d is :
Handling Split-Merge Mechanism
this part don’t change the equation of distance, author build a model for the $n_A$ and $n_F$ based on the different situation of chunk split and merge.
Manipulation Distance-Guided Fuzzing
- extract the heap operation trace, the vulnerable object, and all alive victim objects when the overflow occurs
- For each SCATTER calculates the distance to corrupt it.
- By iteratively reducing the distance to 0, SCATTER generates the final ePoCs.
two challenge
-
how to determine a mutated PoC triggers the same vulnerability as the initial PoC does
- For two PoCs, if their vulnerable objects’ allocation points and overwriting points from the ASAN’s reports are same, we regard them as triggering the same heap overflow.
-
which PoCs deserve higher priorities to fuzz and how much mutation energy should be assigned?
- SCATTER is based on genetic algorithms and selects the seeds according to the following criteria:
- Shorter distance
- More victim objects
- More free chunks
- Diverse heap operation sequences
- The priorities for the criteria are (from high to low): shorter distance, more victim objects, more free chunks and diverse heap operation sequences.
- base on these criteria, author set a expansion factor to adjust the mutation energy obtained by coverage-based fuzzing
- SCATTER is based on genetic algorithms and selects the seeds according to the following criteria:
Evaluation
ePOC Generation
- 总体上看起来这个sensitive struct的识别率不是很高
- 疑问:sensitive struct总数是哪里来的,和identify的有啥区别
the author believe that the reasons of failed cases are
- Limited number of victim objects
- Limited heap operations
- Limited explored paths
- the heap operations on the explored pathes are unable to adjust the location of the vulnerable object
- complex path constraints.
- Running Failure
Comparison with State-of-the-Art
- AFLcrash意识是AFL的崩溃测试模式
- AFLcrit是给AFL喂SCATTER识别的关键字节
- SCAg是用Gollum’s distance
- SCA*是用AFL的能量分配策略
Time Consumption
总之就是它快
Limitation
- Other General Heap Managers
- Customized Heap Managers
- Multi-threads.
Summary by myself
explored path是啥 结合Case Study回顾一下
目的: 方法: 意义: 效果: