A Review on Methods and Applications in Multimodal Deep Learning

Paper title

A Review on Methods and Applications in Multimodal Deep Learning

Authors

Summaira, Jabeen; Li, Xi; Shoib, Amin Muhammad; Abdul, Jabbar

Abstract

Deep learning has been applied across a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities. Despite extensive progress in unimodal learning, it still cannot cover all aspects of human learning. Multimodal learning supports better understanding and analysis when multiple senses are engaged in processing information. This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. It provides a detailed analysis of baseline approaches and an in-depth study of recent advances in multimodal deep learning applications over the last five years (2017 to 2021). A fine-grained taxonomy of multimodal deep learning methods is proposed, elaborating on different applications in more depth. Lastly, the main issues are highlighted separately for each domain, along with possible future research directions.
