Paper Title

Learned-Database Systems Security

Paper Authors

Roei Schuster, Jin Peng Zhou, Thorsten Eisenhofer, Paul Grubbs, Nicolas Papernot

Paper Abstract

A learned database system uses machine learning (ML) internally to improve performance. We can expect such systems to be vulnerable to some adversarial-ML attacks. Often, the learned component is shared between mutually distrusting users or processes, much like microarchitectural resources such as caches, potentially giving rise to highly realistic attacker models. However, compared to attacks on other ML-based systems, attackers face a level of indirection because they cannot interact directly with the learned model. Additionally, the difference between the attack surfaces of learned and non-learned versions of the same system is often subtle. These factors obfuscate the de-facto risks that the incorporation of ML carries. We analyze the root causes of potentially increased attack surface in learned database systems and develop a framework for identifying vulnerabilities that stem from the use of ML. We apply our framework to a broad set of learned components currently being explored in the database community. To empirically validate the vulnerabilities surfaced by our framework, we choose three of them and implement and evaluate exploits against them. We show that the use of ML causes leakage of past queries in a database, enables a poisoning attack that causes exponential memory blowup in an index structure and crashes it in seconds, and enables index users to snoop on each other's key distributions by timing queries over their own keys. We find that adversarial ML is a universal threat against learned components in database systems, point to open research gaps in our understanding of learned-systems security, and conclude by discussing mitigations, while noting that data leakage is inherent in systems whose learned component is shared between multiple parties.
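To make the third finding concrete, below is a minimal, self-contained Python sketch of the timing side channel the abstract describes: two tenants share one learned index, and the attacker infers the victim's key distribution purely from the lookup effort on its own keys. The toy linear-interpolation "model", the probe-distance cost proxy, and the specific key distributions are illustrative assumptions for this sketch, not the paper's actual artifact.

```python
# Sketch (assumed setup, not the paper's code): a shared "learned index"
# whose model predicts a key's slot in a sorted array, with a local search
# covering the prediction error. That error is the timing signal.
import bisect
import random

class LearnedIndex:
    """Toy learned index: linear interpolation between min and max key
    stands in for the per-segment regression models real learned indexes fit."""
    def __init__(self):
        self.keys = []

    def insert(self, key):
        bisect.insort(self.keys, key)

    def _predict(self, key):
        lo, hi = self.keys[0], self.keys[-1]
        frac = (key - lo) / (hi - lo) if hi > lo else 0.0
        return int(frac * (len(self.keys) - 1))

    def lookup_cost(self, key):
        # Cost proxy: distance between the model's guess and the true slot,
        # i.e. how far the "last-mile" search must scan. In a real system
        # this surfaces as measurable wall-clock latency.
        guess = self._predict(key)
        true_pos = bisect.bisect_left(self.keys, key)
        return abs(guess - true_pos)

random.seed(0)
index = LearnedIndex()

attacker_keys = [random.uniform(0, 100) for _ in range(2000)]  # uniform probe set
victim_keys = [random.gauss(70, 2) for _ in range(2000)]       # secret: clustered near 70
for k in attacker_keys + victim_keys:
    index.insert(k)

# The attacker times lookups over its OWN keys only, bucketed by key range.
buckets = {b: [] for b in range(10)}
for k in attacker_keys:
    buckets[min(int(k // 10), 9)].append(index.lookup_cost(k))

for b in sorted(buckets):
    avg = sum(buckets[b]) / len(buckets[b])
    print(f"keys in [{b*10:3d},{b*10+10:3d}): avg probe distance {avg:7.1f}")
```

Running this, the average probe distance grows toward the victim's cluster and drops sharply past it, so the kink in the attacker's own latency profile leaks where the victim's keys are concentrated, without the attacker ever querying a key it does not own.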
