Survey on Self-Supervised Multimodal Representation Learning and Foundation Models

Paper Title

Survey on Self-Supervised Multimodal Representation Learning and Foundation Models

Author

Thapa, Sushil

Abstract

Deep learning has attracted growing interest in recent years. In particular, multimodal learning has shown great promise for solving a wide range of problems in domains such as language, vision, and audio. One promising research direction for improving it further is learning rich and robust low-dimensional representations of the high-dimensional world with the help of large-scale datasets available on the internet. Because it can avoid the cost of annotating large-scale datasets, self-supervised learning has become the de facto standard for this task in recent years. This paper summarizes some of the landmark research papers that are directly or indirectly responsible for building the foundation of today's multimodal self-supervised representation learning. It traces the development of representation learning over the last few years for each modality and how these approaches were later combined to obtain a multimodal agent.
