Training Multimodal Systems for Classification with Multiple Objectives

Paper title

Training Multimodal Systems for Classification with Multiple Objectives

Paper authors

Armitage, Jason; Thakur, Shramana; Tripathi, Rishi; Lehmann, Jens; Maleshkova, Maria

Paper abstract

We learn about the world from a diverse range of sensory information. Automated systems lack this ability as investigation has centred on processing information presented in a single form. Adapting architectures to learn from multiple modalities creates the potential to learn rich representations of the world - but current multimodal systems only deliver marginal improvements on unimodal approaches. Neural networks learn sampling noise during training with the result that performance on unseen data is degraded. This research introduces a second objective over the multimodal fusion process learned with variational inference. Regularisation methods are implemented in the inner training loop to control variance and the modular structure stabilises performance as additional neurons are added to layers. This framework is evaluated on a multilabel classification task with textual and visual inputs to demonstrate the potential for multiple objectives and probabilistic methods to lower variance and improve generalisation.
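
The abstract does not spell out the form of the two objectives. As a rough illustration only, one common way to pair a multilabel classification objective with a variational objective is to add a KL-divergence term on a Gaussian latent fusion representation to a binary cross-entropy loss. The sketch below assumes that setup; the function names, the standard-normal prior, and the weight `beta` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def multilabel_bce(logits, targets):
    """Numerically stable binary cross-entropy on logits, averaged over samples and labels."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    per_label = np.maximum(logits, 0.0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(per_label))


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims, averaged over the batch.

    Assumption: the fused text+image representation is modelled as a diagonal Gaussian
    with a standard-normal prior. The paper may use a different family or prior.
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    per_sample = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return float(np.mean(per_sample))


def combined_loss(logits, targets, mu, log_var, beta=1.0):
    """First objective (classification) plus a weighted second objective (fusion KL term).

    `beta` is a hypothetical weighting hyperparameter, not a value from the paper.
    """
    return multilabel_bce(logits, targets) + beta * gaussian_kl(mu, log_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, n_labels, latent_dim = 4, 5, 8
    logits = rng.normal(size=(batch, n_labels))            # classifier outputs
    targets = rng.integers(0, 2, size=(batch, n_labels))   # multilabel ground truth
    mu = rng.normal(size=(batch, latent_dim))              # posterior mean of fused latent
    log_var = rng.normal(size=(batch, latent_dim))         # posterior log-variance
    print(combined_loss(logits, targets, mu, log_var, beta=0.5))
```

In this kind of setup the KL term acts as a regulariser on the fused representation, which is one way a second objective could lower variance; whether it matches the authors' formulation would need to be checked against the full text.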
