论文标题
量化无环依赖的损失
Quantifying the Loss of Acyclic Join Dependencies
论文作者
论文摘要
无环方案具有数据库设计,加速查询并减少空间要求的已知好处。如果连接与模式相关的投影会导致原始通用关系,则无环依赖依赖关系(AJD)对于普遍关系是无损的。 AJD所带来的直观和标准的损失度量是无环连接产生的冗余元组的数量。最近的工作表明,AJD的丢失也可以以信息理论措施为特征。由于自动将无环模式与普遍关系拟合的问题所激发,我们研究了这两种损失表征之间的联系。我们首先表明使用KL-Divergence的概念捕获了AJD的损失。然后,我们证明KL-Divergence可用于绑定冗余元素的数量。我们证明了对冗余元素百分比的确定性下限。对于上限,我们提出了一个随机数据库模型,并在冗余元素的百分比上建立了一个高概率,该模型与大型数据库的下限一致。
Acyclic schemes posses known benefits for database design, speeding up queries, and reducing space requirements. An acyclic join dependency (AJD) is lossless with respect to a universal relation if joining the projections associated with the schema results in the original universal relation. An intuitive and standard measure of loss entailed by an AJD is the number of redundant tuples generated by the acyclic join. Recent work has shown that the loss of an AJD can also be characterized by an information-theoretic measure. Motivated by the problem of automatically fitting an acyclic schema to a universal relation, we investigate the connection between these two characterizations of loss. We first show that the loss of an AJD is captured using the notion of KL-Divergence. We then show that the KL-divergence can be used to bound the number of redundant tuples. We prove a deterministic lower bound on the percentage of redundant tuples. For an upper bound, we propose a random database model, and establish a high probability bound on the percentage of redundant tuples, which coincides with the lower bound for large databases.