Hyperloglog（HLL）安全性：夸大基数估算值

论文标题

Hyperloglog（HLL）安全性：夸大基数估算值

HyperLogLog (HLL) Security: Inflating Cardinality Estimates

论文作者

Reviriego, Pedro, Adell, Pablo, Ting, Daniel

论文摘要

在许多应用程序中需要计算集合上不同元素的数量，例如，以跟踪Internet服务中唯一用户的数量或网络上不同流的数量。在许多情况下，估计值而非确切的值足够，因此已经提出了许多可显着降低记忆和计算要求的基数估计算法。其中，HyperLogLog在软件和硬件实现中都被广泛采用。最近已经研究了超置logog的安全性，表明攻击者可以创建一组元素，产生的基数估计值比集合的真正基数小得多。该集合可用于例如逃避使用HyperLogLog的检测系统。在本文中，从相反的角度考虑了超闸纸的安全性：攻击者希望创建一个小型集合，当插入超置槽上时，该集合会产生较大的基数估算。该集合可用于在使用HyperLogLog的检测系统中触发错误警报，但更有趣的是，它可以用来将访问访问到网站或在线广告的命中次数上。我们的分析表明，攻击者可以创建一个相当于超置logog实现中使用的寄存器数量的元素的集合，该寄存器数量产生任何任意的基数估算。这在Hyperloglog的两个商业实现中已得到验证：Presto和Redis。基于这些结果，我们还考虑了对超置log的保护免受这种攻击的影响。

Counting the number of distinct elements on a set is needed in many applications, for example to track the number of unique users in Internet services or the number of distinct flows on a network. In many cases, an estimate rather than the exact value is sufficient and thus many algorithms for cardinality estimation that significantly reduce the memory and computation requirements have been proposed. Among them, Hyperloglog has been widely adopted in both software and hardware implementations. The security of Hyperloglog has been recently studied showing that an attacker can create a set of elements that produces a cardinality estimate that is much smaller than the real cardinality of the set. This set can be used for example to evade detection systems that use Hyperloglog. In this paper, the security of Hyperloglog is considered from the opposite angle: the attacker wants to create a small set that when inserted on the Hyperloglog produces a large cardinality estimate. This set can be used to trigger false alarms in detection systems that use Hyperloglog but more interestingly, it can be potentially used to inflate the visits to websites or the number of hits of online advertisements. Our analysis shows that an attacker can create a set with a number of elements equal to the number of registers used in the Hyperloglog implementation that produces any arbitrary cardinality estimate. This has been validated in two commercial implementations of Hyperloglog: Presto and Redis. Based on those results, we also consider the protection of Hyperloglog against such an attack.

下载PDF全文

下载文献需遵守相关版权规定

论文标题