Automated Exploitable Heap Layout Generation for Heap Overflows Through Manipulation Distance-Guided Fuzzing

中文译名：通过操纵距离引导的模糊测试自动生成堆溢出的可利用堆布局作者：Bin Zhang 单位：国防科大国家： #中国年份： #2023年来源： #USENIX会议关键字： #fuzzing #AGE 代码地址：Epeius/Scatter: Automatically Generate Exploitable Memory Layout for HeapOOB-Write Vulnerabilities in Non-Interpreter Software (github.com) （代码没有上传，只有readme）笔记建立时间： 2023-05-31 09:35

Abstract

heap primitives are leveraged to construnct exploitable states
prior efforts only focus on particular program types or programs with dispatcher-loop structures, these are hard for general-purpose programs
this paper present scatter, enabling the generation of exploitable heap layouts for heap overflows in generalpurpose programs in a primitive-free manner.

Background

A heap primitive is a code snippet that invokes one or more heap operations (such as malloc and free), and it is often invoked multiple times with a certain program input.
using heap overflows vulnerabilities to achieve control flow hijacking need strict control of heap allocations, which is challenging
Existing approaches to constructing exploitable heap layouts can be categorized into two types: modeling-based approaches and fuzzing-based approaches.
- existing approaches rely on heap primitives that are easy to be utilized in specific programs

关于 A Running Example 的解释

图中的f和m是指free和malloc，代码（listing 1）中有注释。 listing1的漏洞在于可以输入超过申请chunk大小的输入而造成堆溢出。可利用的堆溢出是指可以从当前chunk溢出到下一个chunk的头，从而修改chunk的属性。所以简单的poc（论文中给出的0x100个字节的poc）只能造成程序崩溃或者甚至都不会造成程序崩溃。想要堆溢出就需要构造堆的布局，就是图中右边，也就是输入epoc。epoc精心构造的目的就是让最后代码的45行输入的puredata可以放入堆块A，这样溢出后就可以影响到堆块B的头，甚至C的头也可以。如果不精细构造的话输入的puredata纵然产生了堆溢出，但是没有影响到其他堆块。

Methodology

input is the source code of target program and the PoC that triggers a heap overflow vulnerability output are ePoCs that construct exploitable heap layouts

Identifying Victim Objects

indentify the sensitive structures by static analysis
- sensitive structures:
  - a structure that contains pointers
  - a structure that has no pointers but contains a member that can affect a buffer’s access
  - a union structure that contains previous two types of structures and is accessed as its structure type
- 这步骤是用来识别代码中对sensitive structures的定义（猜测）
identify potential victim objects at runtime by dynamic instrumentation (为啥需要动态分析)
after recognizing, scatter locates all victim object allocation points by indentifying heap operations that allocate or free memory for the sensitive structures
- 这部分是识别代码中对sensitive structures的free和maclloc操作
analyze the type for the return of allocation functions
- 通过分析LLVM IR

Pinpointing Critical Input Bytes

这部分用于识别输入中可能影响堆操作的字节

Identifying Mutable Operations

这个可变操作应该指的是受输入影响的对操作，这影响可以是执行次数，也可以是执行参数
build LDG（Layout Dependence Graph）：SCATTER firstly utilizes a heap-operation-guided fuzzer to explore different execution paths, and then extracts heap operations and control flow information on each path.（这里咋实现的）
SCATTER identifies mutable heap operations by identifying whether the parameters or execution times of the operations change when giving different program inputs.
- leverages dynamic taint analysis technique to check whether an operation’s parameters can be affected by the input bytes.

Mapping to Input Bytes

leverages a lightweight “mutate-check” strategy to locate input bytes that can affect mutable heap operations.
- 先以输入a执行一次程序，收集参数和可变堆操作出现的次数。然后对a进行多次变异，执行程序，检查参数和可变堆操作的次数是否变化。

Modeling Fuzzing-Based Manipulation

a. collect a set of victim objects, determine their locations in the heap

through dynamic instrumentation, obtain all victim objects that are alive in heap

b. adjust vulnerable object’s location according to the victim objects’ locations through fuzzing, based on the manipulation distance metric

for each victim object, we traverse the trace of heap operations to locate all suitable free chunks for placing the vulnerable object and calculate the manipulation distances.
the suitable free chunks following requirements: (1) it is freed before allocating ov and has the same size with ov; (2) ov can overflow into os when ov is placed into this free chunk.

Defining Basic Manipulation Distance

distance is based on the heap operation

a. the position of free chunk c in the free list

nA and nF denote the number of allocation and the number of free operations with the same size of c in ⃗ Ro, respectively. (heap operations ⃗ Ro between c’s free and ov’s allocation)
ζ{0,1} = 0 if c’s free list is FIFO, otherwise, ζ{0,1} = 1 for FILO.
δ represents the position index (start from 1) of c in the free list according to the allocation order.

b. basic manipulation distance (to corrupt a victim object os by placing the vulnerable object ov into a free chunk c) (这把上面的式子代入不是0吗？啥意思)

Handling Early Occupation Problem

occupation problem means our target chunk c will be not free when we allocate the vulnerable object the smaller overload factor is, the less likely occupation problem is to occur combine the overload factor and Basic Manipulation Distance, the extended manipulation distance d is :

Handling Split-Merge Mechanism

this part don’t change the equation of distance, author build a model for the $n_A$ and $n_F$ based on the different situation of chunk split and merge.

Manipulation Distance-Guided Fuzzing

extract the heap operation trace, the vulnerable object, and all alive victim objects when the overflow occurs
For each SCATTER calculates the distance to corrupt it.
By iteratively reducing the distance to 0, SCATTER generates the final ePoCs.

two challenge

how to determine a mutated PoC triggers the same vulnerability as the initial PoC does
- For two PoCs, if their vulnerable objects’ allocation points and overwriting points from the ASAN’s reports are same, we regard them as triggering the same heap overflow.
which PoCs deserve higher priorities to fuzz and how much mutation energy should be assigned?
- SCATTER is based on genetic algorithms and selects the seeds according to the following criteria:
  - Shorter distance
  - More victim objects
  - More free chunks
  - Diverse heap operation sequences
  - The priorities for the criteria are (from high to low): shorter distance, more victim objects, more free chunks and diverse heap operation sequences.
- base on these criteria, author set a expansion factor to adjust the mutation energy obtained by coverage-based fuzzing

Evaluation

ePOC Generation

总体上看起来这个sensitive struct的识别率不是很高
疑问：sensitive struct总数是哪里来的，和identify的有啥区别

the author believe that the reasons of failed cases are

Limited number of victim objects
Limited heap operations
Limited explored paths
- the heap operations on the explored pathes are unable to adjust the location of the vulnerable object
- complex path constraints.
Running Failure

Comparison with State-of-the-Art

AFLcrash意识是AFL的崩溃测试模式
AFLcrit是给AFL喂SCATTER识别的关键字节
SCAg是用Gollum’s distance
SCA*是用AFL的能量分配策略

Time Consumption

总之就是它快

Limitation

Other General Heap Managers
Customized Heap Managers
Multi-threads.

Summary by myself

explored path是啥结合Case Study回顾一下

目的：方法：意义：效果：

Abstract#

Background#

关于 A Running Example 的解释#

Methodology#

Identifying Victim Objects#

Pinpointing Critical Input Bytes#

Identifying Mutable Operations#

Mapping to Input Bytes#

Modeling Fuzzing-Based Manipulation#

Defining Basic Manipulation Distance#

Handling Early Occupation Problem#

Handling Split-Merge Mechanism#

Manipulation Distance-Guided Fuzzing#

two challenge#

Evaluation#

ePOC Generation#

Comparison with State-of-the-Art#

Time Consumption#

Limitation#

Summary by myself#