Paper title

Floodgate: inference for model-free variable importance

Authors

Lu Zhang, Lucas Janson

Abstract

Many modern applications seek to understand the relationship between an outcome variable $Y$ and a covariate $X$ in the presence of a (possibly high-dimensional) confounding variable $Z$. Although much attention has been paid to testing \emph{whether} $Y$ depends on $X$ given $Z$, in this paper we seek to go beyond testing by inferring the \emph{strength} of that dependence. We first define our estimand, the minimum mean squared error (mMSE) gap, which quantifies the conditional relationship between $Y$ and $X$ in a way that is deterministic, model-free, interpretable, and sensitive to nonlinearities and interactions. We then propose a new inferential approach called \emph{floodgate} that can leverage any working regression function chosen by the user (e.g., one fitted by a state-of-the-art machine learning algorithm or derived from qualitative domain knowledge) to construct asymptotic confidence bounds, and we apply it to the mMSE gap. We additionally show that floodgate's accuracy (distance from confidence bound to estimand) is adaptive to the error of the working regression function. We then show we can apply the same floodgate principle to a different measure of variable importance when $Y$ is binary. Finally, we demonstrate floodgate's performance in a series of simulations and apply it to data from the UK Biobank to infer the strengths of dependence of platelet count on various groups of genetic mutations.
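
For orientation, the mMSE gap named in the abstract can be written out as the reduction in the smallest achievable mean squared error for predicting $Y$ when $X$ is added to $Z$. The display below is a sketch under that reading of the abstract; the symbol $\mathcal{I}^2$ and the toy model (with $\beta$, $g$, and $\varepsilon$) are notational assumptions introduced here for illustration and do not appear in the text above. The second equality follows from the law of total variance.

```latex
\[
\mathcal{I}^2
  \;=\; \mathbb{E}\!\left[\bigl(Y - \mathbb{E}[Y \mid Z]\bigr)^2\right]
  \;-\; \mathbb{E}\!\left[\bigl(Y - \mathbb{E}[Y \mid X, Z]\bigr)^2\right]
  \;=\; \mathbb{E}\!\left[\operatorname{Var}\!\bigl(\mathbb{E}[Y \mid X, Z] \,\big|\, Z\bigr)\right].
\]

% Illustrative toy model (an assumption, not from the abstract):
% Y = \beta X + g(Z) + \varepsilon, with X independent of Z, Var(X) = 1,
% and \varepsilon mean-zero noise independent of (X, Z).
\[
\mathbb{E}[Y \mid X, Z] = \beta X + g(Z)
\quad\Longrightarrow\quad
\mathcal{I}^2 = \mathbb{E}\!\left[\beta^2 \operatorname{Var}(X \mid Z)\right] = \beta^2 .
\]
```

In this toy case the gap is zero exactly when $\beta = 0$, that is, when $Y$ does not depend on $X$ given $Z$, and it grows with the strength of the dependence, which matches the abstract's description of the estimand as a measure of strength and not only of presence.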
