Distant finetuning with discourse relations for stance classification

Paper title

Distant finetuning with discourse relations for stance classification

Authors

Lifeng Jin, Kun Xu, Linfeng Song, Dong Yu

Abstract

Approaches to stance classification, an important task for understanding argumentation in debates and detecting fake news, have relied on models that handle individual debate topics. In this paper, in order to train a topic-independent system, we propose a new method that extracts data with silver labels from raw text to finetune a model for stance classification. The extraction relies on specific discourse relation information, which is shown to be a reliable and accurate source of stance information. We also propose a 3-stage training framework in which the noise level of the finetuning data decreases from stage to stage, going from the noisiest to the least noisy. Detailed experiments show that both the automatically annotated dataset and the 3-stage training help improve model performance in stance classification. Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater, which confirms the effectiveness of our approach.
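
The abstract names two ideas: silver labels derived from discourse relations, and training stages ordered from noisiest to cleanest data. The sketch below illustrates both in miniature and is not the authors' implementation. The connective list, the label names "support" and "against", the English example sentences, and the helper names extract_silver_pairs and training_schedule are all assumptions made for illustration; the abstract does not specify which discourse relations or markers were used.

```python
import re

# Hypothetical mapping from discourse connectives to silver stance labels.
# The abstract does not list the relations actually used; these are assumptions.
CONNECTIVES = {
    "however": "against",
    "but": "against",
    "therefore": "support",
    "so": "support",
}

# Matches "<left clause><punctuation> <connective>[,] <right clause>".
PATTERN = re.compile(
    r"^(?P<left>.+?)[,;.]\s*(?P<conn>" + "|".join(CONNECTIVES) + r")\b,?\s*(?P<right>.+)$",
    re.IGNORECASE,
)


def extract_silver_pairs(sentences):
    """Return (left_clause, right_clause, silver_label) for sentences
    that contain one of the assumed connectives; skip the rest."""
    pairs = []
    for sentence in sentences:
        match = PATTERN.match(sentence.strip())
        if match is None:
            continue
        label = CONNECTIVES[match.group("conn").lower()]
        left = match.group("left").strip()
        right = match.group("right").strip().rstrip(".")
        pairs.append((left, right, label))
    return pairs


def training_schedule(datasets):
    """Order (name, estimated_noise) pairs from most noisy to least noisy,
    mirroring the staged finetuning order described in the abstract."""
    ordered = sorted(datasets, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered]


if __name__ == "__main__":
    raw = [
        "Nuclear power is clean, but it produces dangerous waste.",
        "Exercise improves mood; therefore, schools should require it.",
        "The sky is blue.",
    ]
    for pair in extract_silver_pairs(raw):
        print(pair)
    # Dataset names and noise estimates below are made up for the example.
    print(training_schedule([("gold", 0.0), ("silver_raw", 0.4), ("silver_filtered", 0.2)]))
```

A real pipeline would presumably rely on a discourse parser or a curated marker inventory for the target language rather than a four-word regular expression; the point here is only the shape of the data flow: raw text in, labeled clause pairs out, then datasets consumed in decreasing order of noise.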
