Paper Title

Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection

Authors

Luca Demetrio, Scott E. Coull, Battista Biggio, Giovanni Lagorio, Alessandro Armando, Fabio Roli

Abstract

Recent work has shown that adversarial Windows malware samples - referred to as adversarial EXEmples in this paper - can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To preserve malicious functionality, previous attacks either add bytes to existing non-functional areas of the file, potentially limiting their effectiveness, or require running computationally demanding validation steps to discard malware variants that do not correctly execute in sandbox environments. In this work, we overcome these limitations by developing a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks based on practical, functionality-preserving manipulations to the Windows Portable Executable (PE) file format. These attacks, named Full DOS, Extend, and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section. Our experimental results show that these attacks outperform existing ones in both white-box and black-box scenarios, achieving a better trade-off in terms of evasion rate and size of the injected payload, while also enabling evasion of models that have been shown to be robust to previous attacks. To facilitate reproducibility of our findings, we open-source our framework and all the corresponding attack implementations as part of the secml-malware Python library. We conclude this work by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge coming from subject-matter experts directly into the learning process.
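As a rough illustration of the byte regions the Full DOS manipulation targets, the sketch below locates the unused DOS-header and DOS-stub bytes of a PE file: the Windows loader only reads the `MZ` magic (offsets 0x00-0x01) and the `e_lfanew` field (offsets 0x3C-0x3F), so the bytes in between, and the stub up to the PE header, can host an adversarial payload. The function name `dos_editable_offsets` and the toy header are our own constructions for illustration, not part of the secml-malware API.

```python
import struct

def dos_editable_offsets(pe_bytes: bytes):
    """Return the PE-header offset and the DOS-region byte offsets that
    can be perturbed without breaking loading (illustrative sketch)."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ magic")
    # e_lfanew: little-endian dword at 0x3C pointing to the PE header.
    e_lfanew = struct.unpack_from("<I", pe_bytes, 0x3C)[0]
    # Free bytes: the rest of the DOS header (0x02-0x3B) plus the DOS
    # stub (0x40 up to the PE header); magic and e_lfanew must survive.
    editable = list(range(0x02, 0x3C)) + list(range(0x40, e_lfanew))
    return e_lfanew, editable

# Toy 0x80-byte header: MZ magic, zeroed DOS header, e_lfanew = 0x80,
# zeroed 64-byte DOS stub (a real file would continue with "PE\0\0").
toy = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x80) + bytes(0x40)
pe_off, free = dos_editable_offsets(toy)
print(pe_off, len(free))  # → 128 122
```

The Extend and Shift attacks enlarge this budget further by growing the header or displacing the first section, at the cost of patching the affected offsets so the binary still runs.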
