涵盖关系加入

论文标题

涵盖关系加入

Covering the Relational Join

论文作者

Li, Shi, Jayaraman, Sai Vikneshwar Mani, Rudra, Atri

论文摘要

在本文中，我们启动了我们所谓的加入覆盖问题的理论研究。我们获得了$ n $属性上的自然加入查询实例$ q $，$ m $ resptions $（r_i）_ {i \ in [m]} $。令$ j_ {q} = \ \ join_ {i = 1}^m r_i $表示$ q $的加入输出。除$ q $外，还为我们提供了一个参数$δ：1 \leδ\ le n $，我们的目标是计算最小的子集$ \ Mathcal {t} _ {q，Δ} \ subseteq j_ {q} $ in $ j_ {q} $ in hamming use hamming usex $δ- $ \ MATHCAL {T} _ {Q，δ} $。连接覆盖问题既捕获了数据库理论的自然连接，又捕获了用覆盖半径$δ-1 $从编码理论（特殊情况）构建覆盖代码。我们考虑加入覆盖问题的组合版本，我们的目标是确定最差的$ | \ nathcal {t} _ {q，δ} | $，以$ q $的结构和$Δ$的value。一种显而易见的方法，一种对上限$ | \ Mathcal {t} _ {q，δ} | $的方法是从编码理论中利用远距离的属性（锤距），并将其与自然加入的输出大小（AGM绑定的AGM绑定）（Atserias andserias andserias，Grohe和Marx [serhe and Marx [smarx and Marx [siam bundon''的最差案例相结合。令人惊讶的是，即使在输入关系具有两种情况的情况下，这种方法也不紧张。取而代之的是，我们表明，使用基于多肌的基于Abo Khamis，NGO和Suciu [Pods'17]代替AGM结合，可以在$ | \ Mathcal {t} _ {t} _ {q，δ} | $上对两种情况下的$ | \ Mathcal {t} _ {t} _ {q，t} _ {我们使用众所周知的错误校正代码（例如Reed-Solomon codes codes copers of $ | \ Mathcal {t} _ {q，δ} | $证明了下限。我们可以将两种情况的结果扩展到一般性差异，并在上限和下限之间具有多项式差距。

In this paper, we initiate a theoretical study of what we call the join covering problem. We are given a natural join query instance $Q$ on $n$ attributes and $m$ relations $(R_i)_{i \in [m]}$. Let $J_{Q} = \ \Join_{i=1}^m R_i$ denote the join output of $Q$. In addition to $Q$, we are given a parameter $Δ: 1\le Δ\le n$ and our goal is to compute the smallest subset $\mathcal{T}_{Q, Δ} \subseteq J_{Q}$ such that every tuple in $J_{Q}$ is within Hamming distance $Δ- 1$ from some tuple in $\mathcal{T}_{Q, Δ}$. The join covering problem captures both computing the natural join from database theory and constructing a covering code with covering radius $Δ- 1$ from coding theory, as special cases. We consider the combinatorial version of the join covering problem, where our goal is to determine the worst-case $|\mathcal{T}_{Q, Δ}|$ in terms of the structure of $Q$ and value of $Δ$. One obvious approach to upper bound $|\mathcal{T}_{Q, Δ}|$ is to exploit a distance property (of Hamming distance) from coding theory and combine it with the worst-case bounds on output size of natural joins (AGM bound hereon) due to Atserias, Grohe and Marx [SIAM J. of Computing'13]. Somewhat surprisingly, this approach is not tight even for the case when the input relations have arity at most two. Instead, we show that using the polymatroid degree-based bound of Abo Khamis, Ngo and Suciu [PODS'17] in place of the AGM bound gives us a tight bound (up to constant factors) on the $|\mathcal{T}_{Q, Δ}|$ for the arity two case. We prove lower bounds for $|\mathcal{T}_{Q, Δ}|$ using well-known classes of error-correcting codes e.g, Reed-Solomon codes. We can extend our results for the arity two case to general arity with a polynomial gap between our upper and lower bounds.

下载PDF全文

下载文献需遵守相关版权规定

论文标题