ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

Paper Title

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

Authors

Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, Andrew Hines

Abstract

Estimation of perceptual quality in audio and speech is possible using a variety of methods. The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio, respectively) provides improvements upon previous versions, in terms of both design and usage. As an open source C++ library or binary with permissive licensing, ViSQOL can now be deployed beyond the research context into production usage. The feedback from internal production teams at Google has helped to improve this new release, and serves to show cases where it is most applicable, as well as to highlight limitations. The new model is benchmarked against real-world data for evaluation purposes. The trends and direction of future work are discussed.
