Paper title
SIGLE: a valid procedure for Selective Inference with the Generalized Linear Lasso
Authors
Abstract
This article investigates uncertainty quantification for the generalized linear lasso (GLL), a popular variable selection method in high-dimensional regression. In many fields of study, researchers use data-driven methods to select the subset of variables most likely to be associated with a response variable. However, such variable selection can introduce bias and increase the likelihood of false positives, leading to incorrect conclusions. In this paper, we propose a post-selection inference (PSI) framework that addresses these issues and allows valid statistical inference after variable selection with the GLL. We show that our method, SIGLE, provides accurate $p$-values and confidence intervals while maintaining high statistical power. We then focus on sparse logistic regression, a popular classifier in high-dimensional statistics, and show with extensive numerical simulations that SIGLE is more powerful than state-of-the-art PSI methods. SIGLE relies on a new method for sampling states from the distribution of the observations conditional on the selection event. This method is based on a simulated annealing strategy whose energy is given by the first-order conditions of the logistic lasso.
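The last two sentences describe the sampler only at a high level. The sketch below is a hypothetical illustration of the idea, not the authors' SIGLE implementation. It assumes the selection event is the observed active set $M$ with signs $s_M$ of the logistic lasso $\min_\theta -\ell(\theta) + \lambda\|\theta\|_1$, whose first-order conditions read $X^\top(y - \sigma(X\hat\theta)) = \lambda\hat s$ with $\hat s_M = s_M$ and $|\hat s_j| \le 1$ for $j \notin M$. The helper names `restricted_fit`, `kkt_energy` and `anneal`, the gradient-descent inner solver, the single-flip proposal and the cooling schedule $T_k = T_0/(1+k)$ are all assumptions. The energy sums the sign violations on $M$ and the violations of $|X_j^\top(y - \sigma)| \le \lambda$ off $M$, so it vanishes when a candidate response $y \in \{0,1\}^n$ satisfies these conditions, up to the accuracy of the inner solver.

```python
import numpy as np


def sigmoid(t):
    # Clip so that np.exp cannot overflow for large |t|.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -500.0, 500.0)))


def restricted_fit(X_M, y, lam, s_M, n_iter=200):
    """Hypothetical inner solver: gradient descent on
    sum(log(1 + exp(X_M @ th))) - y @ X_M @ th + lam * s_M @ th.

    A stationary point satisfies X_M.T @ (y - sigmoid(X_M @ th)) = lam * s_M,
    i.e. the lasso first-order condition restricted to the active set M.
    """
    step = 4.0 / np.linalg.norm(X_M, 2) ** 2  # 1/L with L = ||X_M||_2^2 / 4
    theta = np.zeros(X_M.shape[1])
    for _ in range(n_iter):
        grad = X_M.T @ (sigmoid(X_M @ theta) - y) + lam * s_M
        theta -= step * grad
    return theta


def kkt_energy(X, y, lam, M, s_M):
    """Hypothetical energy: total violation of the logistic-lasso KKT conditions
    for the observed active set M and signs s_M (zero inside the selection event)."""
    inactive = np.setdiff1d(np.arange(X.shape[1]), M)
    theta_M = restricted_fit(X[:, M], y, lam, s_M)
    resid = y - sigmoid(X[:, M] @ theta_M)
    sign_viol = np.maximum(0.0, -s_M * theta_M).sum()
    inactive_viol = np.maximum(0.0, np.abs(X[:, inactive].T @ resid) - lam).sum()
    return float(sign_viol + inactive_viol)


def anneal(X, y0, lam, M, s_M, n_steps=200, T0=1.0, seed=None):
    """Single-flip simulated annealing over y in {0,1}^n; returns the best state found."""
    rng = np.random.default_rng(seed)
    y = y0.astype(float).copy()
    e = kkt_energy(X, y, lam, M, s_M)
    best_y, best_e = y.copy(), e
    for k in range(n_steps):
        if best_e == 0.0:
            break  # reached a state inside the (assumed) selection event
        T = T0 / (1.0 + k)  # hypothetical cooling schedule
        y_new = y.copy()
        i = rng.integers(len(y))
        y_new[i] = 1.0 - y_new[i]  # flip one binary response
        e_new = kkt_energy(X, y_new, lam, M, s_M)
        # Metropolis rule: accept downhill moves, uphill ones with prob exp(-dE/T).
        if e_new <= e or rng.random() < np.exp(-(e_new - e) / T):
            y, e = y_new, e_new
            if e < best_e:
                best_y, best_e = y.copy(), e
    return best_y, best_e
```

As written, the annealing loop only searches for a zero-energy response vector, that is, a state inside the selection event. Turning such a search into draws from the conditional distribution of the observations, as the abstract describes, would require further steps that this sketch does not attempt.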