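Title (translated from the Chinese): Automatic extraction of software names from vulnerability reports using LSTM and expert systems Author: Igor Khokhlov Affiliation: Sacred Heart University Country: #Italy Year: #2022 Source: #IEEE_STC Keywords: #information_extraction Code: Note created: 2023-05-15 10:20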
Abstract
- proposes a machine learning method that automatically extracts software product names and versions from unstructured CVE descriptions
- creates context-aware features using Word2Vec and Char2Vec
- uses these features to train an LSTM-based NER model
- based on previously published CVE descriptions, the author creates a set of Expert System (ES) rules that refine the predictions of the NER model and improve the performance of the method
METHODOLOGY
The method uses two major models: an NER model and an ES model.
- The NER model classifies each word in the description as software name (SN), software version (SV), or other (O).
- The ES model verifies the output of the NER model.
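The two-stage flow above can be sketched as follows. This is only an illustration of how the pieces might connect, not the paper's implementation: `extract_spans`, `run_pipeline`, and the `ner_predict` / `es_refine` callables are hypothetical names, and the notes do not say how the ES step actually changes labels, so it is modeled here as a function that may return a corrected label list.

```python
from typing import Callable, List, Tuple

Labels = List[str]  # each entry is "SN", "SV", or "O"


def extract_spans(tokens: List[str], labels: Labels) -> List[Tuple[str, str]]:
    """Group consecutive tokens that share an SN or SV label into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_label, current_tokens = None, []
    for tok, lab in zip(tokens, labels):
        if lab == current_label and lab != "O":
            # Same entity continues, keep collecting its tokens.
            current_tokens.append(tok)
            continue
        if current_label in ("SN", "SV"):
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = lab, [tok]
    if current_label in ("SN", "SV"):
        spans.append((current_label, " ".join(current_tokens)))
    return spans


def run_pipeline(
    tokens: List[str],
    ner_predict: Callable[[List[str]], Labels],        # hypothetical stand-in for the LSTM tagger
    es_refine: Callable[[List[str], Labels], Labels],  # hypothetical stand-in for the ES rules
) -> List[Tuple[str, str]]:
    """Run the NER stage, let the ES stage correct its labels, then collect spans."""
    labels = ner_predict(tokens)
    labels = es_refine(tokens, labels)
    return extract_spans(tokens, labels)
```

Passing both stages in as callables keeps the sketch independent of any particular model library; a trained tagger and a rule set could be plugged in without changing `run_pipeline`.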
NER
ES and Rules
- The author examined the dataset and found that:
  - 61.5% of all SNs in the dataset occur within the first ten words of the sentence, and almost 91% lie within the first 30 words.
  - almost 73% of all SVs lie no further than five words from the related SN, and almost 90% lie within ten words.
- Software Name Extraction Rules
  - The word is tagged NNP (proper noun), lies within the first 40 words of the sentence, is not an article, and appears in the CPE dictionary. This rule is based on the training dataset analysis (see Table I).
  - The word is between two SNs.
- Software Version Extraction Rules
  - The word contains digits and is no further than 30 words from the last SN. This rule is based on the training dataset analysis.
  - The word is in the list of trigger words and is no further than 30 words from the last SN.
  - The word contains digits, and the previous word is classified as an SV.
  - The word is “and” or “or” and is between two SVs.
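The SN and SV rules above can be sketched in plain Python. This is a minimal illustration under several assumptions that the notes do not confirm: distances are counted in tokens, "between two SNs/SVs" is read as "both immediate neighbors carry that label", the rules only promote a label and never demote one, and `TRIGGER_WORDS` and `CPE_NAMES` are small hypothetical stand-ins for the real trigger-word list and CPE dictionary.

```python
ARTICLES = {"a", "an", "the"}
# Hypothetical placeholders: the notes do not list the actual trigger words or CPE entries.
TRIGGER_WORDS = {"before", "through", "prior", "earlier"}
CPE_NAMES = {"openssl", "wordpress"}


def apply_sn_rules(tokens, pos_tags, labels, max_pos=40, cpe_names=CPE_NAMES):
    """Return a copy of labels with the two software-name rules applied."""
    out = list(labels)
    for i, (tok, pos) in enumerate(zip(tokens, pos_tags)):
        word = tok.lower()
        # SN rule 1: proper noun, near the sentence start, not an article, known to CPE.
        if pos == "NNP" and i < max_pos and word not in ARTICLES and word in cpe_names:
            out[i] = "SN"
    # SN rule 2: a word whose two neighbors are both SN (checked against a snapshot
    # so that one promotion does not cascade into the next position).
    snap = list(out)
    for i in range(1, len(out) - 1):
        if snap[i - 1] == "SN" and snap[i + 1] == "SN":
            out[i] = "SN"
    return out


def apply_sv_rules(tokens, labels, max_dist=30, trigger_words=TRIGGER_WORDS):
    """Return a copy of labels with the four software-version rules applied."""
    out = list(labels)
    last_sn = None  # index of the most recent SN token seen so far
    for i, tok in enumerate(tokens):
        if out[i] == "SN":
            last_sn = i
            continue
        near_sn = last_sn is not None and i - last_sn <= max_dist
        has_digit = any(ch.isdigit() for ch in tok)
        if near_sn and (has_digit or tok.lower() in trigger_words):
            out[i] = "SV"  # SV rules 1 and 2
        elif has_digit and i > 0 and out[i - 1] == "SV":
            out[i] = "SV"  # SV rule 3
    # SV rule 4: "and" / "or" sitting between two SVs.
    snap = list(out)
    for i in range(1, len(out) - 1):
        if tokens[i].lower() in {"and", "or"} and snap[i - 1] == "SV" and snap[i + 1] == "SV":
            out[i] = "SV"
    return out
```

For example, with the tokens of "OpenSSL before 1.0.2 and 1.1.0 allows remote attackers", POS tags `["NNP", "IN", "CD", "CC", "CD", "VBZ", "JJ", "NNS"]`, and all labels starting as `"O"`, applying `apply_sn_rules` and then `apply_sv_rules` yields `["SN", "SV", "SV", "SV", "SV", "O", "O", "O"]` under the placeholder word lists above.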
Purpose: Method: Significance: Results: