Paper Title

Aligning MAGMA by Few-Shot Learning and Finetuning

Authors

Jean-Charles Layoun, Alexis Roger, Irina Rish

Abstract

The goal of vision-language modeling is to allow models to tie language understanding to visual inputs. The aim of this paper is to evaluate the Visual Language Model (VLM) called Multimodal Augmentation of Generative Models through Adapter-based finetuning (MAGMA) and to align it with human values. MAGMA is a VLM capable of image captioning and visual question answering. We evaluate its alignment in three different scenarios. First, we assess MAGMA's out-of-the-box alignment using the checkpoint provided by Hugging Face. Then, we measure whether few-shot learning improves the results. Finally, we finetune the model on aligned examples and evaluate its behavior.
