论文标题

一种用于查找最大共同子序列的快速随机算法

A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

论文作者

Cao, Jin, Zhong, Dewei

论文摘要

在生物信息学,计算语言学和信息检索方面,找到$ l $多元字符串的常见子序列有许多应用。一个众所周知的结果指出,为$ l $ strings找到最长的常见子序列(LCS)是NP-HARD,例如,计算复杂性在$ L $中为指数。在本文中,我们开发了一种随机算法,称为{\ em tandom-mcs},用于查找多个字符串的最大常见子序列($ MCS $)的随机实例。如果将任何字符插入子序列不再产生共同的子序列,则常见的子序列是{\ em maxal}。 MCS的特殊情况是长度最长的LCS。我们显示我们的算法的复杂性是$ L $的线性,因此适用于大型$ L $。此外,我们研究了MC的单个实例的发生概率,并通过理论和实验研究证明了{\ em Random-MCS}的多个运行中最长的子序列通常会产生$ LCS $的解决方案。

Finding the common subsequences of $L$ multiple strings has many applications in the area of bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) for $L$ strings is NP-hard, e.g., the computational complexity is exponential in $L$. In this paper, we develop a randomized algorithm, referred to as {\em Random-MCS}, for finding a random instance of Maximal Common Subsequence ($MCS$) of multiple strings. A common subsequence is {\em maximal} if inserting any character into the subsequence no longer yields a common subsequence. A special case of MCS is LCS where the length is the longest. We show the complexity of our algorithm is linear in $L$, and therefore is suitable for large $L$. Furthermore, we study the occurrence probability for a single instance of MCS and demonstrate via both theoretical and experimental studies that the longest subsequence from multiple runs of {\em Random-MCS} often yields a solution to $LCS$.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源