论文标题
超平面更新查询的等效不变的代数出处
Equivalence-Invariant Algebraic Provenance for Hyperplane Update Queries
论文作者
论文摘要
来源跟踪的代数方法,起源于绿色ET的半级模型。 Al已被证明是处理元数据的抽象方法。换向半度性被证明是连接查询结合的“正确”代数结构,因为它的使用允许在某些预期的查询等效公理中出处是不变的。 在本文中,我们介绍了第一个(据我们所知)代数的出处模型,对于更新查询的片段,这是集合等效的不变性。我们关注的片段是超平面查询,以前在多个工作中研究。我们的代数出处结构和相应的出处感知语义基于卡拉贝格和Vianu的声音和完整的公理化。我们证明,我们的构建可以指导针对不同应用的具体出处模型实例的设计。我们进一步研究了超平面更新查询的有效生成和存储。我们表明,幼稚的算法可以导致指数较大的出处表达,但通过提出正常形式来解决这一问题,我们显示的形式可以在查询评估的同时有效地计算出来。我们通过实验研究解决方案的性能,并证明其可扩展性和实用性,尤其是我们正常形式表示的有效性。
The algebraic approach for provenance tracking, originating in the semiring model of Green et. al, has proven useful as an abstract way of handling metadata. Commutative Semirings were shown to be the "correct" algebraic structure for Union of Conjunctive Queries, in the sense that its use allows provenance to be invariant under certain expected query equivalence axioms. In this paper we present the first (to our knowledge) algebraic provenance model, for a fragment of update queries, that is invariant under set equivalence. The fragment that we focus on is that of hyperplane queries, previously studied in multiple lines of work. Our algebraic provenance structure and corresponding provenance-aware semantics are based on the sound and complete axiomatization of Karabeg and Vianu. We demonstrate that our construction can guide the design of concrete provenance model instances for different applications. We further study the efficient generation and storage of provenance for hyperplane update queries. We show that a naive algorithm can lead to an exponentially large provenance expression, but remedy this by presenting a normal form which we show may be efficiently computed alongside query evaluation. We experimentally study the performance of our solution and demonstrate its scalability and usefulness, and in particular the effectiveness of our normal form representation.