CARETS: A Consistency and Robustness Evaluation Suite for VQA

Paper Title

CARETS: A Consistency And Robustness Evaluative Test Suite for VQA

Authors

Jimenez, Carlos E.; Russakovsky, Olga; Narasimhan, Karthik

Abstract

We introduce CARETS, a systematic test suite to measure consistency and robustness of modern VQA models through a series of six fine-grained capability tests. In contrast to existing VQA test sets, CARETS features balanced question generation to create pairs of instances to test models, with each pair focusing on a specific capability such as rephrasing, logical symmetry or image obfuscation. We evaluate six modern VQA systems on CARETS and identify several actionable weaknesses in model comprehension, especially with concepts such as negation, disjunction, or hypernym invariance. Interestingly, even the most sophisticated models are sensitive to aspects such as swapping the order of terms in a conjunction or varying the number of answer choices mentioned in the question. We release CARETS to be used as an extensible tool for evaluating multi-modal model robustness.
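The abstract describes evaluating models on pairs of instances, where each pair targets one capability. As a rough illustration of how pair-based consistency can be scored, the sketch below counts, per capability, the fraction of pairs whose two model answers satisfy the relation the test expects. This is a minimal sketch under stated assumptions, not the metric defined in the paper: the `Pair` record, the capability labels, and the two relation checks (`same` for rephrasing, `flipped` for yes/no negation) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Pair:
    capability: str  # hypothetical label for the capability test this pair belongs to
    pred_a: str      # model answer to the original question
    pred_b: str      # model answer to the perturbed question


def same(a: str, b: str) -> bool:
    """Invariance check (assumed): a rephrased question should get the same answer."""
    return a == b


def flipped(a: str, b: str) -> bool:
    """Directional check (assumed): negating a yes/no question should flip the answer."""
    return {a, b} == {"yes", "no"}


# Hypothetical mapping from capability label to the relation its pairs must satisfy.
EXPECTED: Dict[str, Callable[[str, str], bool]] = {
    "rephrasing": same,
    "negation": flipped,
}


def consistency_by_capability(pairs: List[Pair]) -> Dict[str, float]:
    """Fraction of pairs, per capability, whose two answers satisfy the expected relation."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for p in pairs:
        totals[p.capability] += 1
        if EXPECTED[p.capability](p.pred_a, p.pred_b):
            hits[p.capability] += 1
    return {cap: hits[cap] / totals[cap] for cap in totals}


if __name__ == "__main__":
    example = [
        Pair("rephrasing", "red", "red"),
        Pair("rephrasing", "red", "blue"),
        Pair("negation", "yes", "no"),
        Pair("negation", "yes", "yes"),
    ]
    print(consistency_by_capability(example))  # {'rephrasing': 0.5, 'negation': 0.5}
```

Note that this score measures agreement between the two answers only, not correctness: a model that answers "red" to both phrasings counts as consistent even if the true answer is "blue", so in practice such a score would likely be reported alongside ordinary accuracy.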
