Floodgate: inference for model-free variable importance

Paper title

Floodgate: inference for model-free variable importance

Authors

Lu Zhang, Lucas Janson

Abstract

Many modern applications seek to understand the relationship between an outcome variable $Y$ and a covariate $X$ in the presence of a (possibly high-dimensional) confounding variable $Z$. Although much attention has been paid to testing whether $Y$ depends on $X$ given $Z$, in this paper we seek to go beyond testing by inferring the strength of that dependence. We first define our estimand, the minimum mean squared error (mMSE) gap, which quantifies the conditional relationship between $Y$ and $X$ in a way that is deterministic, model-free, interpretable, and sensitive to nonlinearities and interactions. We then propose a new inferential approach called floodgate that can leverage any working regression function chosen by the user (allowing, e.g., it to be fitted by a state-of-the-art machine learning algorithm or be derived from qualitative domain knowledge) to construct asymptotic confidence bounds, and we apply it to the mMSE gap. We additionally show that floodgate's accuracy (distance from confidence bound to estimand) is adaptive to the error of the working regression function. We then show we can apply the same floodgate principle to a different measure of variable importance when $Y$ is binary. Finally, we demonstrate floodgate's performance in a series of simulations and apply it to data from the UK Biobank to infer the strengths of dependence of platelet count on various groups of genetic mutations.
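To make the estimand concrete, the following is a minimal sketch that reads the mMSE gap as the drop in minimum mean squared error when predicting $Y$ from $(X, Z)$ rather than from $Z$ alone, i.e. $E[(Y - E[Y \mid Z])^2] - E[(Y - E[Y \mid X, Z])^2]$. It is not the floodgate procedure: it approximates the gap by Monte Carlo in a toy model whose conditional means are known exactly. The model $Y = 2X + Z + \varepsilon$, the function name `mmse_gap_monte_carlo`, and all parameter values are assumptions chosen for illustration and do not come from the paper.

```python
import random


def mmse_gap_monte_carlo(n=200_000, seed=0):
    """Approximate the mMSE gap in a hypothetical toy model.

    Assumed model (illustrative only): X, Z, eps are independent N(0, 1)
    and Y = 2*X + Z + eps, so that
        E[Y | X, Z] = 2*X + Z   and   E[Y | Z] = Z.
    """
    rng = random.Random(seed)
    sse_without_x = 0.0  # squared error of the best predictor using Z only
    sse_with_x = 0.0     # squared error of the best predictor using X and Z
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        y = 2.0 * x + z + rng.gauss(0.0, 1.0)
        sse_without_x += (y - z) ** 2
        sse_with_x += (y - (2.0 * x + z)) ** 2
    # Difference of the two minimum mean squared errors.
    return sse_without_x / n - sse_with_x / n


gap = mmse_gap_monte_carlo()
print(f"Monte Carlo mMSE gap: {gap:.3f}")
```

In this toy model the gap equals $E[(2X)^2] = 4$ in closed form, so the printed value should be close to 4. In practice the conditional means are unknown, which is the situation the working regression function and confidence bounds described in the abstract are meant to address.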
