Predicting Vulnerability In Large Codebases With Deep Code Representation

Paper Title

Predicting Vulnerability In Large Codebases With Deep Code Representation

Authors

Tanwar, Anshul; Sundaresan, Krishna; Ashwath, Parmesh; Ganesan, Prasanna; Chandrasekaran, Sathish Kumar; Ravi, Sriram

Abstract

Software engineers writing code for various modules often introduce errors of many kinds (coding, logic, semantic, and others), most of which are not caught by compilers or other tools. Some of these bugs are found only in later stages of testing, and many are reported by customers against production code. Companies spend considerable money and time finding and fixing bugs that could have been avoided had the code been written correctly in the first place. Concealed flaws in software can also lead to security vulnerabilities that potentially allow attackers to compromise systems and applications. Interestingly, the same or similar bugs that were fixed in the past (although in different modules) tend to be reintroduced into production code. We developed a novel AI-based system that uses a deep representation of the Abstract Syntax Tree (AST) created from the source code, together with an active feedback loop, to identify and flag potential bugs at development time, that is, as the developer is writing new code (logic and/or functions). Integrated into the IDE as a plugin, the tool would work in the background and point out existing similar functions or code segments, along with any bugs associated with them. This would enable the developer to incorporate suggestions during development rather than waiting for unit testing (UT), QA, or a customer to raise a defect. We assessed our tool on both open-source code and the Cisco codebase for the C and C++ programming languages. Our results confirm that a deep representation of source code combined with an active feedback loop is a promising approach for predicting security and other vulnerabilities present in code.
