EVIL: Exploiting Software via Natural Language

ABSTRACT

EVIL automatically generates exploits in assembly or Python from natural-language descriptions.
It relies on Neural Machine Translation (NMT) techniques and a dataset developed by the authors.

Pre-processing
- tokenization
- standardization: prevents non-English tokens (e.g. values and label names) from being transformed during the learning process
  - intent parser: takes the natural-language description (the intent) as input and outputs a dictionary of standardizable tokens, such as specific values, label names, and parameters
  - standardizer: takes the output of the intent parser and replaces the selected tokens in both the intent and the snippet (steps 3 and 4 in Figure 1)
- embedding
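
The standardization step above can be sketched roughly as follows. This is only an illustration, not the paper's implementation: the regex-based `parse_intent`, the `var0`, `var1`, ... placeholder naming, and the sample intent/snippet pair are all assumptions made for this example, and the real intent parser is more elaborate than a single pattern.

```python
import re

# Assumed for illustration: hex literals and underscore-separated label names
# count as "standardizable" tokens.
TOKEN_PATTERN = re.compile(r"0x[0-9a-fA-F]+|\b[A-Za-z]+_[A-Za-z0-9_]+\b")


def parse_intent(intent):
    """Intent parser: map placeholder -> original token, in order of appearance."""
    slots = {}
    for match in TOKEN_PATTERN.finditer(intent):
        token = match.group(0)
        if token not in slots.values():
            slots[f"var{len(slots)}"] = token
    return slots


def standardize(intent, snippet, slots):
    """Standardizer: replace each selected token in both the intent and the snippet."""
    for placeholder, token in slots.items():
        # Match the token only as a whole word, so 0x4 does not hit 0x41.
        pattern = r"(?<!\w)" + re.escape(token) + r"(?!\w)"
        intent = re.sub(pattern, placeholder, intent)
        snippet = re.sub(pattern, placeholder, snippet)
    return intent, snippet


# Hypothetical intent/snippet pair, not taken from the dataset.
intent = "move 0x4 into eax and jump to exit_label"
snippet = "mov eax, 0x4\njmp exit_label"

slots = parse_intent(intent)
std_intent, std_snippet = standardize(intent, snippet, slots)
print(slots)
print(std_intent)
print(std_snippet)
```

The dictionary returned by `parse_intent` has to be kept alongside the standardized pair, since post-processing needs it to restore the real values.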
NMT models
- Seq2Seq
  - bi-directional LSTM as the encoder
- CodeBERT
Post-processing
- the inverse of standardization: replaces each symbolic value with the real value
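
A minimal sketch of this inverse step, assuming the placeholders have the form `var0`, `var1`, ... and that the placeholder-to-value dictionary from pre-processing is still available. The function name `destandardize` and the sample values are hypothetical.

```python
import re


def destandardize(generated, slots):
    """Replace each placeholder in the generated snippet with its real value."""
    def restore(match):
        # Unknown placeholders are left untouched so that errors stay visible.
        return slots.get(match.group(0), match.group(0))

    # \b...\b makes var10 match as a whole, rather than as var1 followed by 0.
    return re.sub(r"\bvar\d+\b", restore, generated)


# Hypothetical model output and dictionary, for illustration only.
slots = {"var0": "0x4", "var1": "exit_label"}
print(destandardize("mov eax, var0\njmp var1", slots))
```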

Although EVIL can convert natural language into code, the natural-language descriptions it requires are extremely detailed.