Title
VAE-KRnet and its applications to variational Bayes
Authors
Abstract
In this work, we propose a generative model, called VAE-KRnet, for density estimation or approximation, which combines the canonical variational autoencoder (VAE) with our recently developed flow-based generative model, called KRnet. The VAE is used as a dimension reduction technique to capture the latent space, and KRnet is used to model the distribution of the latent variable. Using a linear model between the data and the latent variable, we show that VAE-KRnet can be more effective and robust than the canonical VAE. VAE-KRnet can be used as a density model to approximate either a data distribution or an arbitrary probability density function (PDF) known up to a constant. VAE-KRnet is flexible in terms of dimensionality. When the number of dimensions is relatively small, KRnet can effectively approximate the distribution in terms of the original random variable. For high-dimensional cases, we may use VAE-KRnet to incorporate dimension reduction. One important application of VAE-KRnet is variational Bayes for approximating the posterior distribution. Variational Bayes approaches are usually based on minimizing the Kullback-Leibler (KL) divergence between the model and the posterior. For high-dimensional distributions, it is very challenging to construct an accurate density model due to the curse of dimensionality, and extra assumptions are often introduced for efficiency. For instance, the classical mean-field approach assumes mutual independence between dimensions, which often yields an underestimated variance due to oversimplification. To alleviate this issue, we include in the loss the maximization of the mutual information between the latent random variable and the original random variable, which helps retain more information from low-density regions and thus improves the estimation of the variance.
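To illustrate the change-of-variables mechanism that flow-based models such as KRnet rely on, here is a minimal NumPy sketch using a single RealNVP-style affine coupling layer as an illustrative stand-in (KRnet's actual architecture, based on the Knothe-Rosenblatt rearrangement, is more elaborate). The one-layer "networks" parameterized by `w` and `b` are hypothetical placeholders, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def affine_coupling_forward(z, w, b):
    """One affine coupling layer: z1 passes through unchanged, z2 is
    scaled and shifted by (toy) functions of z1. Returns the transformed
    sample and log|det J| of the map, which is just the sum of log-scales."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    s = np.tanh(z1 @ w + b)          # hypothetical scale "network"
    t = z1 @ w - b                   # hypothetical shift "network"
    y2 = z2 * np.exp(s) + t
    log_det = s.sum(axis=-1)
    return np.concatenate([z1, y2], axis=-1), log_det

def log_density(y, w, b):
    """Model log-density: invert the coupling, evaluate the standard-normal
    base density, and subtract log|det J| (change of variables)."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    s = np.tanh(y1 @ w + b)
    t = y1 @ w - b
    z2 = (y2 - t) * np.exp(-s)       # exact inverse of the forward map
    z = np.concatenate([y1, z2], axis=-1)
    log_base = -0.5 * (z ** 2).sum(axis=-1) \
               - 0.5 * z.shape[-1] * np.log(2 * np.pi)
    return log_base - s.sum(axis=-1)
```

Because the map is invertible with a cheap triangular Jacobian, the exact model density is available in closed form; training such a flow amounts to maximizing this log-density over data (or minimizing a KL divergence against a target PDF known up to a constant).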
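The mean-field variance underestimation mentioned above can be made concrete with a small NumPy example (not from the paper): when a fully factorized Gaussian is fitted to a correlated bivariate Gaussian by minimizing the reverse KL divergence, the optimum has a standard closed form in which each factor's variance equals the inverse of the corresponding diagonal entry of the precision matrix, which is strictly smaller than the true marginal variance whenever the dimensions are correlated.

```python
import numpy as np

# Target: zero-mean bivariate Gaussian with correlation rho.
rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# Reverse-KL mean-field fit to a Gaussian target (closed form):
# optimal factor variances are 1 / Lambda_ii, the inverse precision diagonal.
Lambda = np.linalg.inv(Sigma)
mean_field_var = 1.0 / np.diag(Lambda)   # = 1 - rho**2 for this Sigma
marginal_var = np.diag(Sigma)            # true marginal variances = 1

print(mean_field_var, marginal_var)      # mean-field variance is much smaller
```

With rho = 0.9 the mean-field variance is 1 - 0.81 = 0.19 versus a true marginal variance of 1, which is exactly the kind of overconfident posterior the mutual-information term in the VAE-KRnet loss is designed to mitigate.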