Paper Title

Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

Authors

Sai Shashank Kalakonda, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla

Abstract

We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. We introduce a generic approach compatible with stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models. In addition, the approach enables multiple text descriptions to be utilized. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) benefits of utilizing multiple LLM-generated descriptions, (iii) suitability of the prompt function, and (iv) zero-shot generation capabilities of the proposed approach. Project page: https://actiongpt.github.io
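The core idea above — expanding a terse action phrase into one or more detailed, body-part-level descriptions via a carefully crafted LLM prompt — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt template, function names, and the `llm` callable are all assumptions, and in practice the LLM call would go to an actual model API.

```python
def build_prompt(action_phrase: str) -> str:
    """Wrap a terse action phrase (e.g. "jump") in a prompt asking
    for a fine-grained, step-by-step body-movement description.
    The template here is illustrative, not the paper's exact prompt."""
    return (
        "Describe in detail how a person performs the action "
        f"'{action_phrase}'. Explain the movement of the arms, legs, "
        "and torso, step by step."
    )


def expand_action(action_phrase: str, llm, num_descriptions: int = 4) -> list:
    """Query the LLM several times to obtain multiple detailed
    descriptions of the same action; the paper reports benefits from
    using several LLM-generated descriptions (e.g. by aggregating
    their text embeddings before the text-to-motion model)."""
    prompt = build_prompt(action_phrase)
    return [llm(prompt) for _ in range(num_descriptions)]


# Usage with a stub in place of a real LLM call:
stub_llm = lambda prompt: "The person bends their knees, swings their arms..."
descriptions = expand_action("jump", stub_llm, num_descriptions=4)
```

The expanded descriptions (rather than the raw phrase) are then embedded by the text encoder of the downstream text-to-motion model, which is what drives the improved text–motion alignment the abstract describes.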
