从子字符串的广义独特重建

论文标题

从子字符串的广义独特重建

Generalized Unique Reconstruction from Substrings

论文作者

Yehezkeally, Yonatan, Bar-Lev, Daniella, Marcovich, Sagi, Yaakobi, Eitan

论文摘要

本文介绍了一个新的重建代码系列，该家族是由DNA数据存储和测序中的应用中的。在这种应用中，通过阅读其子字节的某些子集对DNA链进行测序。虽然先前的作品考虑了两个极端情况，其中读取所有预定义长度的子字符串或对单弦案例没有重叠的读数，但该工作研究了该范式的两个扩展。第一个扩展名考虑了以某些给定的最小重叠读取连续子字符串的设置。首先，在可实现的代码速率上提供了上限，以保证独特的重建。然后，提出了渐近地符合该上限的有效构造。在第二次扩展中，我们研究了将多个字符串一起重建的设置。考虑到字符串的数量及其长度，我们首先在读取子字符串的长度$ \ ell $上得出了一个下限，这对于存在非变化率的多链重建代码所必需的。然后，我们提出了此类代码的两个结构，并表明他们的费率接近$ \ ell $的值，渐近地表现得像下限。

This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of pre-defined lengths are read or substrings are read with no overlap for the single string case, this work studies two extensions of this paradigm. The first extension considers the setup in which consecutive substrings are read with some given minimum overlap. First, an upper bound is provided on the attainable rates of codes that guarantee unique reconstruction. Then, efficient constructions of codes that asymptotically meet that upper bound are presented. In the second extension, we study the setup where multiple strings are reconstructed together. Given the number of strings and their length, we first derive a lower bound on the read substrings' length $\ell$ that is necessary for the existence of multi-strand reconstruction codes with non-vanishing rates. We then present two constructions of such codes and show that their rates approach 1 for values of $\ell$ that asymptotically behave like the lower bound.

下载PDF全文

下载文献需遵守相关版权规定

论文标题