MultiHiertt：多分层表格和文本数据的数值推理

论文标题

MultiHiertt：多分层表格和文本数据的数值推理

MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data

论文作者

Zhao, Yilun, Li, Yunxiang, Li, Chenying, Zhang, Rui

论文摘要

关于包含文本和表格内容（例如财务报告）的混合数据的数值推理最近在NLP社区引起了很多关注。但是，混合数据上的现有问题回答（QA）基准仅包含每个文档中的一个平台，因此缺少多个层次表中多步数数值推理的示例。为了促进数据分析进度，我们构建了一个新的大规模基准Multihiertt，并具有QA对，而不是多层表格和文本数据。 Multihiertt是由大量财务报告构建的，具有以下独特的特征：1）每个文档包含多个表和更长的非结构化文本； 2）包含的大多数表是分层的； 3）每个问题所需的推理过程比现有基准比现有基准更为复杂和具有挑战性； 4）提供推理过程和支持事实的细粒注释以揭示复杂的数值推理。我们进一步介绍了一种称为MT2NET的新型质量检查模型，该模型首先采用事实来检索从表和文本中提取相关的支持事实，然后使用推理模块对检索到的事实进行符号推理。我们对各种基线进行全面的实验。实验结果表明，Multihiertt对现有基线的挑战提出了巨大的挑战，后者远远落后于人类专家的表现。该数据集和代码可在https://github.com/psunlpgroup/multihiertt上公开获取。

Numerical reasoning over hybrid data containing both textual and tabular content (e.g., financial reports) has recently attracted much attention in the NLP community. However, existing question answering (QA) benchmarks over hybrid data only include a single flat table in each document and thus lack examples of multi-step numerical reasoning across multiple hierarchical tables. To facilitate data analytical progress, we construct a new large-scale benchmark, MultiHiertt, with QA pairs over Multi Hierarchical Tabular and Textual data. MultiHiertt is built from a wealth of financial reports and has the following unique characteristics: 1) each document contain multiple tables and longer unstructured texts; 2) most of tables contained are hierarchical; 3) the reasoning process required for each question is more complex and challenging than existing benchmarks; and 4) fine-grained annotations of reasoning processes and supporting facts are provided to reveal complex numerical reasoning. We further introduce a novel QA model termed MT2Net, which first applies facts retrieving to extract relevant supporting facts from both tables and text and then uses a reasoning module to perform symbolic reasoning over retrieved facts. We conduct comprehensive experiments on various baselines. The experimental results show that MultiHiertt presents a strong challenge for existing baselines whose results lag far behind the performance of human experts. The dataset and code are publicly available at https://github.com/psunlpgroup/MultiHiertt.

下载PDF全文

下载文献需遵守相关版权规定

论文标题