Neural Granular Sound Synthesis

Paper Title

Neural Granular Sound Synthesis

Authors

Adrien Bitton, Philippe Esling, Tatsuya Harada

Abstract

Granular sound synthesis is a popular audio generation technique based on rearranging sequences of small waveform windows. In order to control the synthesis, all grains in a given corpus are analyzed through a set of acoustic descriptors. This provides a representation reflecting some form of local similarities across the grains. However, the quality of this grain space is bound by that of the descriptors. Its traversal is not continuously invertible to signal and does not render any structured temporality. We demonstrate that generative neural networks can implement granular synthesis while alleviating most of its shortcomings. We efficiently replace its audio descriptor basis by a probabilistic latent space learned with a Variational Auto-Encoder. In this setting the learned grain space is invertible, meaning that we can continuously synthesize sound when traversing its dimensions. It also implies that original grains are not stored for synthesis. Another major advantage of our approach is to learn structured paths inside this latent space by training a higher-level temporal embedding over arranged grain sequences. The model can be applied to many types of libraries, including pitched notes or unpitched drums and environmental noises. We report experiments on the common granular synthesis processes as well as novel ones such as conditional sampling and morphing.
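For orientation, the classic descriptor-based pipeline that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration and not the authors' neural model: the function names, the grain size, the hop, the synthetic chirp corpus, and the choice of spectral centroid as the single acoustic descriptor are all assumptions made for the example.

```python
import numpy as np

def slice_grains(signal, grain_size, hop):
    """Cut a 1-D signal into overlapping Hann-windowed grains."""
    window = np.hanning(grain_size)
    starts = range(0, len(signal) - grain_size + 1, hop)
    return np.stack([signal[s:s + grain_size] * window for s in starts])

def spectral_centroid(grains, sample_rate):
    """Example descriptor: magnitude-weighted mean frequency of each grain, in Hz."""
    mags = np.abs(np.fft.rfft(grains, axis=1))
    freqs = np.fft.rfftfreq(grains.shape[1], d=1.0 / sample_rate)
    return (mags * freqs).sum(axis=1) / (mags.sum(axis=1) + 1e-12)

def overlap_add(grains, hop):
    """Resynthesize a waveform by summing grains placed every `hop` samples."""
    n_grains, grain_size = grains.shape
    out = np.zeros(hop * (n_grains - 1) + grain_size)
    for i, grain in enumerate(grains):
        out[i * hop:i * hop + grain_size] += grain
    return out

# Assumed toy corpus: one second of a rising chirp at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
corpus = np.sin(2 * np.pi * (200 + 1800 * t) * t)

# Analyze the grains, rearrange them by descriptor value, and resynthesize.
grains = slice_grains(corpus, grain_size=1024, hop=256)
order = np.argsort(spectral_centroid(grains, sample_rate))
resynth = overlap_add(grains[order], hop=256)
```

Here the stored grains are only reordered, so the sketch can never produce a sound that is absent from the corpus. That is the limitation the abstract addresses by replacing the descriptor space with a learned, invertible latent space.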
