基于平均算法在较小距离处进行痕量重建的局限性

论文标题

基于平均算法在较小距离处进行痕量重建的局限性

Limitations of Mean-Based Algorithms for Trace Reconstruction at Small Distance

论文作者

Grigorescu, Elena, Sudan, Madhu, Zhu, Minshen

论文摘要

跟踪重建考虑了恢复未知字符串$ x \ in \ {0,1 \}^n $给定的任务，给定许多独立的“ traces”，即，通过随机和独立地删除$ x $的每个符号所获得的$ x $的子序列，并使用一些概率$ p $。恢复长度$ n $所需的痕迹数量的信息理论限制仍然未知。此限制本质上与确定$ x $和$ y $以及其中一个的痕迹所需的痕迹数量相同，其中一个字符串是源。最坏情况下，该问题最严重的算法类别是“基于平均”算法。这些是仅使用每个坐标的平均值的限制类别类别。在这项工作中，我们研究基于平均值的算法在小锤子或编辑距离处的字符串的局限性。我们表明，一方面，对于此类区分人来说，区分附近的锤子距离很“容易”。另一方面，我们表明，对于基于平均值的算法而言，在编辑距离附近的区分编辑距离是“难”的。在此过程中，我们还描述了与著名的Prouhet-Tarry-Escott（PTE）问题的联系，该问题显示出寻找明确难以固定的字符串的障碍：即，这样的字符串将暗示PTE问题的明确简短解决方案，这是一个众所周知的困难问题。此外，我们表明相反的情况也是正确的，因此，找到PTE问题的明确解决方案等同于寻找明确字符串的问题，而这些算法很难通过基于平均值的算法来呈现。我们的技术依赖于涉及仔细的三角估计的复杂分析论点，以及包括笛卡尔对真实性多项式符号规则的应用的代数技术。

Trace reconstruction considers the task of recovering an unknown string $x \in \{0,1\}^n$ given a number of independent "traces", i.e., subsequences of $x$ obtained by randomly and independently deleting every symbol of $x$ with some probability $p$. The information-theoretic limit of the number of traces needed to recover a string of length $n$ is still unknown. This limit is essentially the same as the number of traces needed to determine, given strings $x$ and $y$ and traces of one of them, which string is the source. The most-studied class of algorithms for the worst-case version of the problem are "mean-based" algorithms. These are a restricted class of distinguishers that only use the mean value of each coordinate on the given samples. In this work we study limitations of mean-based algorithms on strings at small Hamming or edit distance. We show that, on the one hand, distinguishing strings that are nearby in Hamming distance is "easy" for such distinguishers. On the other hand, we show that distinguishing strings that are nearby in edit distance is "hard" for mean-based algorithms. Along the way, we also describe a connection to the famous Prouhet-Tarry-Escott (PTE) problem, which shows a barrier to finding explicit hard-to-distinguish strings: namely such strings would imply explicit short solutions to the PTE problem, a well-known difficult problem in number theory. Furthermore, we show that the converse is also true, thus, finding explicit solutions to the PTE problem is equivalent to the problem of finding explicit strings that are hard-to-distinguish by mean-based algorithms. Our techniques rely on complex analysis arguments that involve careful trigonometric estimates, and algebraic techniques that include applications of Descartes' rule of signs for polynomials over the reals.

下载PDF全文

下载文献需遵守相关版权规定

论文标题