Paper Title
Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings
Paper Authors
Abstract
Fact-based Visual Question Answering (FVQA), a challenging variant of VQA, requires a QA system to include facts from a diverse knowledge graph (KG) in its reasoning process to produce an answer. Large KGs, especially common-sense KGs, are known to be incomplete, i.e., a fact that is absent from the KG is not necessarily false. Therefore, being able to reason over incomplete KGs for QA is a critical requirement in real-world applications that has not been addressed extensively in the literature. We develop a novel QA architecture that allows us to reason over incomplete KGs, something current FVQA state-of-the-art (SOTA) approaches lack due to their critical reliance on fact retrieval. We use KG embeddings, a technique widely used for KG completion, for the downstream task of FVQA. We also employ a new image representation technique we call 'Image-as-Knowledge' to enable this capability, alongside a simple one-step CoAttention mechanism to attend to text and image during QA. Our FVQA architecture is faster at inference time, being O(m), as opposed to existing FVQA SOTA methods, which are O(N log N), where m = number of vertices and N = number of edges = O(m^2). KG embeddings are shown to hold information complementary to word embeddings: a combination of both permits performance comparable to SOTA methods in the standard answer retrieval task, and significantly better performance (26% absolute) in the proposed missing-edge reasoning task.
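The O(m) inference cost mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the model has already produced a query vector in the same space as the entity embeddings, and that the answer is chosen by one cosine-similarity pass over all m entities. The names `cosine` and `retrieve_answer`, the three-dimensional vectors, and the toy entities are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_answer(query_vec, entity_embeddings):
    """Return the (entity, score) pair with the highest cosine similarity.

    The loop visits each of the m entities exactly once, so the cost is
    linear in the number of vertices; no edges (facts) are retrieved or sorted.
    """
    best_name, best_score = None, -math.inf
    for name, vec in entity_embeddings.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical toy embeddings; real KG embeddings would be learned, not hand-written.
entity_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "frisbee": [0.1, 0.9, 0.2],
    "park": [0.0, 0.2, 0.9],
}

# Hypothetical query vector, standing in for the model's predicted answer embedding.
query_vec = [0.2, 0.8, 0.1]

answer, score = retrieve_answer(query_vec, entity_embeddings)
print(answer)
```

Because the lookup scores entities rather than stored facts, an answer can in principle still be found when the relevant edge is missing from the KG, which is the setting the abstract's missing-edge task targets.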