Paper Title

ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis

Paper Authors

Qingyun Wang, Qi Zeng, Lifu Huang, Kevin Knight, Heng Ji, Nazneen Fatema Rajani

Paper Abstract

To assist the human review process, we build a novel ReviewRobot to automatically assign a review score and write comments for multiple categories such as novelty and meaningful comparison. A good review needs to be knowledgeable, namely that the comments should be constructive and informative to help improve the paper; and explainable by providing detailed evidence. ReviewRobot achieves these goals via three steps: (1) We perform domain-specific Information Extraction to construct a knowledge graph (KG) from the target paper under review, a related work KG from the papers cited by the target paper, and a background KG from a large collection of previous papers in the domain. (2) By comparing these three KGs, we predict a review score and detailed structured knowledge as evidence for each review category. (3) We carefully select and generalize human review sentences into templates, and apply these templates to transform the review scores and evidence into natural language comments. Experimental results show that our review score predictor reaches 71.4%-100% accuracy. Human assessment by domain experts shows that 41.7%-70.5% of the comments generated by ReviewRobot are valid and constructive, and better than human-written ones 20% of the time. Thus, ReviewRobot can serve as an assistant for paper reviewers, program chairs and authors.
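The pipeline described above, comparing the paper's KG against a background KG to extract structured evidence (step 2) and then filling review templates with that evidence (step 3), can be sketched minimally as follows. All function names, the triple representation, and the set-difference heuristic are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of ReviewRobot's steps (2) and (3). The simple
# set-difference heuristic and template below are assumptions for
# illustration; the paper's real system uses learned score predictors
# and templates generalized from human reviews.

def novelty_evidence(paper_kg, background_kg):
    """Return KG triples present in the paper but absent from the
    background KG; these act as structured evidence of novelty."""
    return sorted(set(paper_kg) - set(background_kg))

def fill_template(evidence):
    """Turn structured evidence into a natural-language comment."""
    if not evidence:
        return "The proposed ideas largely overlap with prior work."
    facts = "; ".join(f"{h} {r} {t}" for h, r, t in evidence)
    return f"The paper introduces new knowledge: {facts}."

# Toy KGs as (head, relation, tail) triples.
paper_kg = [("BERT", "used-for", "review generation"),
            ("KG comparison", "used-for", "score prediction")]
background_kg = [("BERT", "used-for", "review generation")]

evidence = novelty_evidence(paper_kg, background_kg)
print(fill_template(evidence))
```

On the toy input, only the triple not already in the background KG survives as evidence, so the generated comment cites it explicitly.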
