Paper Title
Please Mind the Root: Decoding Arborescences for Dependency Parsing
Paper Authors
Paper Abstract
The connection between dependency trees and spanning trees is exploited by the NLP community to train and to decode graph-based dependency parsers. However, the NLP literature has missed an important difference between the two structures: only one edge may emanate from the root in a dependency tree. We analyzed the output of state-of-the-art parsers on many languages from the Universal Dependency Treebank: although these parsers are often able to learn that trees which violate the constraint should be assigned lower probabilities, their ability to do so unsurprisingly degrades as the size of the training set decreases. In fact, the worst constraint-violation rate we observe is 24%. Prior work has proposed an inefficient algorithm to enforce the constraint, which adds a factor of n to the decoding runtime. We adapt an algorithm due to Gabow and Tarjan (1984) to dependency parsing, which satisfies the constraint without compromising the original runtime.
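The single-root constraint the abstract refers to can be stated concretely on a head-array encoding of a parse: exactly one token may take the artificial root as its head. Below is a minimal Python sketch (not from the paper) that checks a predicted tree for violations, plus a wrapper illustrating the inefficient prior approach the abstract mentions, which decodes once per candidate root edge and so adds a factor of n. The function `decode_mst` is a hypothetical unconstrained maximum-spanning-arborescence decoder (e.g., Chu-Liu/Edmonds) passed in by the caller, not an API from the paper.

```python
import math
from typing import Callable, List, Sequence


def violates_root_constraint(heads: Sequence[int], root: int = 0) -> bool:
    """Return True if more than one token attaches to the artificial root.

    `heads[i]` is the head of token i; tokens are indexed 1..n and index 0
    is the root. A well-formed dependency tree has exactly one i with
    heads[i] == root.
    """
    root_children = sum(1 for i in range(1, len(heads)) if heads[i] == root)
    return root_children != 1


def single_root_decode_naive(
    scores: List[List[float]],
    decode_mst: Callable[[List[List[float]]], List[int]],
) -> List[int]:
    """Inefficient single-root decoding via n calls to an MST decoder.

    `scores[h][m]` is the score of the edge h -> m, with h = 0 the root.
    For each candidate root child r, every other root edge is masked out,
    the unconstrained decoder is run, and the best-scoring tree is kept.
    """
    n = len(scores)  # number of nodes, including the root at index 0
    best_tree: List[int] = []
    best_score = -math.inf
    for r in range(1, n):
        masked = [row[:] for row in scores]
        for m in range(1, n):
            if m != r:
                masked[0][m] = -math.inf  # forbid every root edge except 0 -> r
        tree = decode_mst(masked)
        score = sum(scores[tree[m]][m] for m in range(1, n))
        if score > best_score:
            best_tree, best_score = tree, score
    return best_tree
```

This wrapper is only meant to make the runtime cost visible: it multiplies the decoder's cost by n, which is exactly the overhead the paper's adaptation of Gabow and Tarjan (1984) avoids while still guaranteeing a single root edge.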