Paper Title

Multi-instrument Music Synthesis with Spectrogram Diffusion

Authors

Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel

Abstract

An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments, or raw waveform models that can train on any music but with minimal control and slow generation. In this work, we focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in realtime. This enables training on a wide range of transcription datasets with a single model, which in turn offers note-level control of composition and instrumentation across a wide range of instruments. We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter. We compare training the decoder as an autoregressive model and as a Denoising Diffusion Probabilistic Model (DDPM) and find that the DDPM approach is superior both qualitatively and as measured by audio reconstruction and Fréchet distance metrics. Given the interactivity and generality of this approach, we find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.
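
To make the DDPM half of the comparison concrete, the following is a minimal sketch of the forward (noising) step that a denoising diffusion model is trained to invert. The abstract only states that the decoder is trained as a DDPM; the linear noise schedule, the 1000-step count, the function names, and the four-value toy "spectrogram frame" are all assumptions made for illustration and follow the generic DDPM formulation rather than the authors' configuration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not the paper's)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) and return it with the noise that was added."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal_scale * v + noise_scale * e for v, e in zip(x0, noise)]
    return x_t, noise


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
toy_frame = [0.5, -0.25, 0.0, 1.0]  # hypothetical stand-in for one spectrogram frame
x_t, noise = forward_noise(toy_frame, t=500, alpha_bars=alpha_bars, rng=rng)
```

In the usual DDPM setup, a network receives `x_t`, the step index `t`, and the conditioning (here that would be the encoded MIDI) and is trained to predict `noise`; sampling then runs the learned denoising steps in reverse from pure Gaussian noise to a spectrogram.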
