Paper title
A constrained Shannon-Fano entropy coder for image storage in synthetic DNA
Authors
Abstract
Demand for data storage has grown exponentially over the past years and faces more and more challenges. The energy cost of storage is also increasing, and the availability of storage hardware cannot keep up with demand. The short lifespan of conventional storage media (10 to 20 years) forces the hardware to be duplicated and worsens the situation. Most of this demand concerns "cold" data: data that is very rarely accessed but has to be kept for long periods of time. The coding capacity of synthetic DNA and its long durability (several hundred years) make it a serious candidate as an alternative storage medium for "cold" data. In this paper, we propose a variable-length coding algorithm adapted to DNA data storage with improved performance. The proposed algorithm is based on a modified Shannon-Fano code that respects some biochemical constraints imposed by the synthesis chemistry. We have inserted this code into a JPEG compression algorithm adapted to DNA image storage, and we highlight an improvement of the compression ratio ranging from 0.5 up to 2 bits per nucleotide compared to the state-of-the-art solution, without affecting the reconstruction quality.
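The abstract does not say which biochemical constraints are enforced or how the Shannon-Fano code is modified, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes a single constraint (no two identical consecutive nucleotides, i.e. no homopolymers) and swaps in a simple construction: a ternary Shannon-Fano code whose digits are mapped to nucleotides by always choosing among the three bases that differ from the previous one. All function names, the starting base `"A"`, and the example probabilities are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

NUCLEOTIDES = "ACGT"
Code = Tuple[int, ...]


def _split(items: List[Tuple[Hashable, float]], k: int) -> List[list]:
    """Cut items into k contiguous, non-empty groups of roughly equal probability."""
    total = sum(p for _, p in items)
    groups, start, acc = [], 0, 0.0
    for g in range(1, k):
        target = total * g / k           # cumulative mass wanted after group g
        acc += items[start][1]           # every group takes at least one item
        end = start + 1
        max_end = len(items) - (k - g)   # leave one item per remaining group
        while end < max_end and abs(acc + items[end][1] - target) < abs(acc - target):
            acc += items[end][1]
            end += 1
        groups.append(items[start:end])
        start = end
    groups.append(items[start:])
    return groups


def shannon_fano(probs: Dict[Hashable, float], arity: int = 3) -> Dict[Hashable, Code]:
    """Build a prefix code over digits 0..arity-1 by recursive splitting."""
    items = sorted(probs.items(), key=lambda kv: -kv[1])
    codes: Dict[Hashable, Code] = {}

    def recurse(group, prefix: Code) -> None:
        if len(group) == 1:
            codes[group[0][0]] = prefix or (0,)
            return
        for digit, sub in enumerate(_split(group, min(arity, len(group)))):
            recurse(sub, prefix + (digit,))

    recurse(items, ())
    return codes


def encode(message, codes: Dict[Hashable, Code], prev: str = "A") -> str:
    """Map code digits to nucleotides, never repeating the previous base (assumed constraint)."""
    out = []
    for symbol in message:
        for trit in codes[symbol]:
            prev = [n for n in NUCLEOTIDES if n != prev][trit]
            out.append(prev)
    return "".join(out)


def decode(dna: str, codes: Dict[Hashable, Code], prev: str = "A") -> list:
    """Invert the nucleotide mapping, then read the prefix code symbol by symbol."""
    inverse = {code: symbol for symbol, code in codes.items()}
    out, buf = [], ()
    for n in dna:
        buf += ([c for c in NUCLEOTIDES if c != prev].index(n),)
        prev = n
        if buf in inverse:
            out.append(inverse[buf])
            buf = ()
    return out


if __name__ == "__main__":
    codes = shannon_fano({"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1})
    strand = encode("abacade", codes)
    print(codes, strand, decode(strand, codes))
```

Because each nucleotide is chosen from only three candidates, a scheme of this kind carries at most log2(3) ≈ 1.58 bits per nucleotide; this is the price paid for guaranteeing the assumed homopolymer constraint.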