Paper title
Global-local Enhancement Network for NMFs-aware Sign Language Recognition
Authors
Abstract
Sign language recognition (SLR) is a challenging problem that involves both complex manual features (hand gestures) and fine-grained non-manual features (NMFs) such as facial expressions and mouth shapes. Although manual features are dominant, non-manual features also play an important role in the expression of a sign word. Specifically, many sign words share the same hand gestures yet convey different meanings because of their non-manual features. This ambiguity makes sign words considerably harder to recognize. To tackle this issue, we propose a simple yet effective architecture called Global-local Enhancement Network (GLE-Net), which consists of two mutually promoting streams that target different crucial aspects of SLR. One stream captures the global contextual relationship, while the other captures discriminative fine-grained cues. Moreover, because no existing dataset explicitly focuses on this kind of feature, we introduce the first non-manual-features-aware isolated Chinese sign language dataset (NMFs-CSL), with a total vocabulary of 1,067 sign words used in daily life. Extensive experiments on the NMFs-CSL and SLR500 datasets demonstrate the effectiveness of our method.
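To make the two-stream idea concrete, the sketch below shows one way class scores from a global stream and a local stream could be combined at inference time. The abstract does not say how GLE-Net fuses its streams, so the weighted averaging of softmax probabilities used here is an assumption, and the names `fuse_streams`, `predict`, and `global_weight` are hypothetical. Only the class count of 1,067 comes from the abstract.

```python
import math

NUM_CLASSES = 1067  # vocabulary size of NMFs-CSL, as stated in the abstract


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(global_logits, local_logits, global_weight=0.5):
    """Hypothetical late fusion: weighted average of per-stream probabilities.

    This is an illustrative assumption, not the fusion rule of GLE-Net.
    """
    if len(global_logits) != len(local_logits):
        raise ValueError("both streams must score the same set of classes")
    p_global = softmax(global_logits)
    p_local = softmax(local_logits)
    return [global_weight * g + (1.0 - global_weight) * l
            for g, l in zip(p_global, p_local)]


def predict(global_logits, local_logits):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_streams(global_logits, local_logits)
    return max(range(len(fused)), key=fused.__getitem__)


# Two sign words (classes 3 and 7) share the same hand gesture, so the
# global stream scores them equally; the local stream, which sees the
# face region, slightly prefers class 7.
global_logits = [0.0] * NUM_CLASSES
global_logits[3] = 5.0
global_logits[7] = 5.0
local_logits = [0.0] * NUM_CLASSES
local_logits[7] = 2.0

predicted_class = predict(global_logits, local_logits)
```

In this toy case the global scores alone cannot separate the two classes, and the fine-grained cue from the local stream breaks the tie. This mirrors the ambiguity the abstract describes, under the fusion assumption stated above.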