Paper title
CPTAM: Constituency Parse Tree Aggregation Method
Paper authors
Paper abstract
Diverse Natural Language Processing tasks employ constituency parsing to understand the syntactic structure of a sentence according to a phrase structure grammar. Many state-of-the-art constituency parsers have been proposed, but they may produce different results for the same sentence, especially on corpora outside their training domains. This paper adopts the truth discovery idea to aggregate constituency parse trees from different parsers by estimating the parsers' reliability in the absence of ground truth. Our goal is to consistently obtain high-quality aggregated constituency parse trees. We formulate the constituency parse tree aggregation problem in two steps: structure aggregation and constituent label aggregation. Specifically, we propose the first truth discovery solution for tree structures by minimizing the weighted sum of Robinson-Foulds (RF) distances, a classic symmetric distance metric between two trees. Extensive experiments are conducted on benchmark datasets in different languages and domains. The experimental results show that our method, CPTAM, outperforms the state-of-the-art aggregation baselines. We also demonstrate that the weights estimated by CPTAM can adequately evaluate constituency parsers in the absence of ground truth.
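To make the objective in the abstract concrete, the following is a minimal sketch of the RF distance between two unlabeled constituency trees and of a weighted sum of RF distances. It is not the CPTAM algorithm: the tuple-based tree encoding, the names spans, rf_distance and weighted_rf, the example trees, and the parser weights are all assumptions made for illustration. The sketch assumes that the RF distance over constituency trees can be computed as the size of the symmetric difference between the two trees' sets of constituent spans.

```python
# Illustrative sketch only; names, encoding, and weights are hypothetical.
# A leaf is a token string; an internal node is a tuple of child subtrees.

def spans(tree, start=0):
    """Return (set of (start, end) spans of internal nodes, end position)."""
    if isinstance(tree, str):
        return set(), start + 1  # a leaf covers exactly one token
    found = set()
    pos = start
    for child in tree:
        child_spans, pos = spans(child, pos)
        found |= child_spans
    found.add((start, pos))  # span covered by this constituent
    return found, pos


def rf_distance(tree_a, tree_b):
    """RF distance as the size of the symmetric difference of span sets."""
    return len(spans(tree_a)[0] ^ spans(tree_b)[0])


def weighted_rf(candidate, trees, weights):
    """Weighted sum of RF distances from a candidate to each parser's tree."""
    return sum(w * rf_distance(candidate, t) for w, t in zip(weights, trees))


# Three hypothetical parses of "the cat sat down".
t1 = (("the", "cat"), ("sat", "down"))
t2 = ("the", ("cat", ("sat", "down")))
t3 = ((("the", "cat"), "sat"), "down")
trees = [t1, t2, t3]
weights = [0.5, 0.3, 0.2]  # assumed parser reliabilities, not estimated

# Restricting candidates to the input trees is a simplification of this sketch.
best = min(trees, key=lambda c: weighted_rf(c, trees, weights))
```

Here the candidate set is limited to the parsers' own outputs, and the weights are fixed by hand. According to the abstract, CPTAM estimates parser reliability without ground truth and additionally aggregates constituent labels in a second step, neither of which this sketch attempts.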