Paper Title


Non-Autoregressive Sign Language Production via Knowledge Distillation

Paper Authors

Eui Jun Hwang, Jung Ho Kim, Suk Min Cho, Jong C. Park

Paper Abstract


Sign Language Production (SLP) aims to translate expressions in spoken language into corresponding ones in sign language, such as skeleton-based sign poses or videos. Existing SLP models are either AutoRegressive (AR) or Non-Autoregressive (NAR). However, AR-SLP models suffer from regression to the mean and error propagation during decoding. NSLP-G, a NAR-based model, resolves these issues to some extent but engenders other problems. For example, it does not consider target sign lengths and suffers from false decoding initiation. We propose a novel NAR-SLP model via Knowledge Distillation (KD) to address these problems. First, we devise a length regulator to predict the end of the generated sign pose sequence. We then adopt KD, which distills spatial-linguistic features from a pre-trained pose encoder to alleviate false decoding initiation. Extensive experiments show that the proposed approach significantly outperforms existing SLP models in both Frechet Gesture Distance and Back-Translation evaluation.
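The abstract names two technical components: a length regulator that predicts where the generated pose sequence ends, and a feature-level knowledge distillation loss against a frozen, pre-trained pose encoder. Below is a minimal PyTorch sketch of both. The module names, the classify-over-lengths design, and the MSE distillation objective are illustrative assumptions for exposition, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LengthRegulator(nn.Module):
    """Predicts the target sign length from the encoded spoken-language
    representation, so the NAR decoder knows where the pose sequence
    should end. (Hypothetical design; the paper's architecture may differ.)"""
    def __init__(self, d_model: int, max_len: int = 300):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, max_len),  # logits over candidate lengths
        )

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        # enc_out: (batch, src_len, d_model); pool over source tokens
        pooled = enc_out.mean(dim=1)
        return self.proj(pooled)  # (batch, max_len) length logits


def kd_feature_loss(student_feats: torch.Tensor,
                    teacher_feats: torch.Tensor) -> torch.Tensor:
    """Feature-level KD: match the NAR decoder's hidden states (student)
    to spatial-linguistic features from the frozen pose encoder (teacher).
    Detaching the teacher keeps gradients from flowing into it."""
    return F.mse_loss(student_feats, teacher_feats.detach())
```

Under these assumptions, training would combine a pose regression loss, a cross-entropy term on the length logits, and the KD term, with the teacher pose encoder pre-trained on ground-truth sign poses and kept frozen.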
