Cosine Similarity of Multimodal Content Vectors for TV Programmes

Paper Title

Cosine Similarity of Multimodal Content Vectors for TV Programmes

Authors

Saba Nazir, Taner Cagali, Chris Newell, Mehrnoosh Sadrzadeh

Abstract

Multimodal information originates from a variety of sources: audiovisual files, textual descriptions, and metadata. We show how to represent the content encoded by each individual source as vectors, how to combine these vectors via middle and late fusion techniques, and how to compute semantic similarities between the contents. Our vector representations are built from spectral features and Bags of Audio Words for audio, LSI topics and Doc2vec embeddings for subtitles, and categorical features for metadata. We implement our model on a dataset of BBC TV programmes and evaluate the fused representations by using them to provide recommendations. The late-fused similarity matrices significantly improve the precision and diversity of recommendations.
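The abstract does not spell out the fusion formulas, so the following is only a minimal sketch under stated assumptions: late fusion is taken to be a weighted average of per-modality cosine similarity matrices, and middle fusion is taken to be concatenation of each programme's vectors before computing similarity. The toy vectors, the function names, and the `recommend` helper are hypothetical illustrations, not the authors' implementation or data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities for a list of programme vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


def late_fusion(matrices, weights=None):
    """Assumed late fusion: weighted average of per-modality similarity matrices."""
    if weights is None:
        weights = [1.0 / len(matrices)] * len(matrices)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]


def middle_fusion(modalities):
    """Assumed middle fusion: concatenate each programme's vectors, then compare."""
    fused = [
        [x for vec in per_programme for x in vec]
        for per_programme in zip(*modalities)
    ]
    return similarity_matrix(fused)


def recommend(sim, index, k):
    """Indices of the k programmes most similar to programme `index`, excluding itself."""
    others = [j for j in range(len(sim)) if j != index]
    return sorted(others, key=lambda j: sim[index][j], reverse=True)[:k]


# Hypothetical toy vectors for three programmes, one list per modality.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
subtitles = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
metadata = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

modalities = [audio, subtitles, metadata]
late = late_fusion([similarity_matrix(m) for m in modalities])
middle = middle_fusion(modalities)
top = recommend(late, 0, 2)
```

In this sketch, programmes 0 and 1 share subtitle and metadata vectors but differ in audio, so their late-fused similarity is the average of 0, 1, and 1. Non-uniform `weights` would let one modality dominate the ranking; how the paper weights the modalities is not stated in the abstract.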
