Paper Title

Bayesian Model Selection, the Marginal Likelihood, and Generalization

Paper Authors

Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson

Paper Abstract

How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. We then highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We also re-examine the connection between the marginal likelihood and PAC-Bayes bounds and use this connection to further elucidate the shortcomings of the marginal likelihood for model selection. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
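
For reference, the two central quantities in the abstract can be written as follows. This is a minimal sketch in standard notation; the split index m and the symbols below are illustrative and are not taken verbatim from the paper.

Marginal likelihood (Bayesian evidence) of a model M for data D, obtained by integrating the likelihood over the prior, which is what encodes Occam's razor:

\[ p(\mathcal{D} \mid \mathcal{M}) = \int p(\mathcal{D} \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{M})\, d\theta \]

Conditional marginal likelihood, which conditions on the first m observations and evaluates the evidence only for the remaining ones, behaving more like a measure of generalization:

\[ p(\mathcal{D}_{m+1:n} \mid \mathcal{D}_{1:m}, \mathcal{M}) = \frac{p(\mathcal{D}_{1:n} \mid \mathcal{M})}{p(\mathcal{D}_{1:m} \mid \mathcal{M})} \]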
