Paper Title

How Much Does Prosody Help Turn-taking? Investigations using Voice Activity Projection Models

Authors

Erik Ekstedt, Gabriel Skantze

Abstract

Turn-taking is a fundamental aspect of human communication and can be described as the ability to take turns, project upcoming turn shifts, and supply backchannels at appropriate locations throughout a conversation. In this work, we investigate the role of prosody in turn-taking using the recently proposed Voice Activity Projection model, which incrementally models the upcoming speech activity of the interlocutors in a self-supervised manner, without relying on explicit annotation of turn-taking events, or the explicit modeling of prosodic features. Through manipulation of the speech signal, we investigate how these models implicitly utilize prosodic information. We show that these systems learn to utilize various prosodic aspects of speech both on aggregate quantitative metrics of long-form conversations and on single utterances specifically designed to depend on prosody.
