The xmuspeech system for multi-channel multi-party meeting transcription challenge

Paper title

The xmuspeech system for multi-channel multi-party meeting transcription challenge

Authors

Wang, Jie; Liu, Yuji; Wang, Binling; Zhi, Yiming; Li, Song; Xia, Shipeng; Zhang, Jiayang; Li, Lin; Hong, Qingyang; Tong, Feng

Abstract

This paper describes the system developed by the XMUSPEECH team for the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). For the speaker diarization task, we propose a multi-channel speaker diarization system that obtains speaker spatial information through Difference of Arrival (DOA) technology. The speaker-spatial embedding is built from an x-vector and an s-vector derived from Filter-and-Sum Beamforming (FSB), which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture named Discriminative Multi-stream Neural Network (DMSNet), which consists of an Attention Filter-and-Sum Block (AFSB) and a Conformer encoder. We explore DMSNet to address the overlapped speech problem on multi-channel audio. Compared with an LSTM-based overlapped speech detection (OSD) module, we achieve a decrease of 10.1% in Detection Error Rate (DetER). With the DMSNet-based OSD module, the diarization error rate (DER) of the clustering-based diarization system decreases significantly, from 13.44% to 7.63%. Our best fusion system achieves a DER of 7.09% on the evaluation set and 9.80% on the test set.
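To put the reported figures in context, the following is a minimal sketch of how a diarization error rate is commonly computed and how the quoted drop from 13.44% to 7.63% translates into a relative reduction. It assumes the usual definition of DER (missed speech, false alarm, and speaker confusion time divided by total reference speech time); the abstract does not state the scoring setup (collar, overlap handling), and the function names and the example durations are hypothetical.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER in percent: total error time divided by total reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_speech


def relative_reduction(before, after):
    """Relative reduction in percent when a rate drops from `before` to `after`."""
    return 100.0 * (before - after) / before


# Hypothetical durations in seconds, chosen only for illustration.
example_der = diarization_error_rate(
    missed=30.0, false_alarm=12.0, confusion=18.0, total_speech=600.0
)

# DER figures quoted in the abstract for the clustering-based system.
der_drop = relative_reduction(13.44, 7.63)

print(f"example DER: {example_der:.2f}%")
print(f"relative DER reduction: {der_drop:.1f}%")
```

Under these assumptions, the change from 13.44% to 7.63% is an absolute drop of 5.81 points, or roughly a 43% relative reduction.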
