Paper title
A Step Towards Interpretable Authorship Verification
Paper authors
Paper abstract
A central problem in digital text forensics, studied for many years, is the question of whether two documents were written by the same author. Authorship verification (AV) is the research branch of this field that addresses this question. Research activity on AV has increased steadily over the years and has produced a variety of approaches to the problem. Many of these approaches, however, rely on features that are related to or influenced by the topic of the documents. As a result, their verification results may inadvertently be based not on writing style (the actual focus of AV) but on the topic of the documents. To address this problem, we propose an alternative AV approach that considers only topic-agnostic features in its classification decision. In addition, we present a post-hoc interpretation method that makes it possible to understand which particular features contributed to the prediction of the proposed AV method. To evaluate the performance of our AV method, we compared it with ten competing baselines (including the current state of the art) on four challenging data sets. The results show that our approach outperforms all baselines in two cases (with a maximum accuracy of 84%), while in the other two cases it performs as well as the strongest baseline.