Learning to Recognise Words using Visually Grounded Speech

Paper title

Learning to Recognise Words using Visually Grounded Speech

Authors

Sebastiaan Scholten, Danny Merkx, Odette Scharenborg

Abstract

We investigate word recognition in a Visually Grounded Speech model. The model has been trained on pairs of images and spoken captions to create visually grounded embeddings which can be used for speech-to-image retrieval and vice versa. We investigate whether such a model can be used to recognise words by embedding isolated words and using them to retrieve images of their visual referents. We investigate the time-course of word recognition using a gating paradigm and perform a statistical analysis to see whether well-known word competition effects in human speech processing influence word recognition. Our experiments show that the model is able to recognise words, and the gating paradigm reveals that words can also be recognised from partial input and that recognition is negatively influenced by word competition from the word-initial cohort.
