Title
Significance testing for canonical correlation analysis in high dimensions
Authors
Abstract
We consider the problem of testing for the presence of linear relationships between large sets of random variables based on a post-selection inference approach to canonical correlation analysis. The challenge is to adjust for the selection of subsets of variables having linear combinations with maximal sample correlation. To this end, we construct a stabilized one-step estimator of the Euclidean norm of the canonical correlations maximized over subsets of variables of pre-specified cardinality. This estimator is shown to be consistent for its target parameter and asymptotically normal provided the dimensions of the variables do not grow too quickly with sample size. We also develop a greedy search algorithm to accurately compute the estimator, leading to a computationally tractable omnibus test for the global null hypothesis that there are no linear relationships between any subsets of variables having the pre-specified cardinality. Further, we develop a confidence interval for the target parameter that takes the variable selection into account.
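To make the selection problem concrete, here is a minimal NumPy sketch of the naive plug-in quantity that the abstract says must be adjusted for. It assumes the target is the Euclidean norm of the canonical correlations between X_S and Y_T, maximized over index sets with |S| = s and |T| = t. It uses the standard identity that the sum of squared canonical correlations equals trace(Sxx^{-1} Sxy Syy^{-1} Syx). The names cc_norm and greedy_subset_search are hypothetical. The forward-selection rule is a generic greedy heuristic chosen for illustration and is not the authors' greedy search algorithm. The code also does not implement the stabilized one-step estimator, the test, or the confidence interval. The value it returns is inflated by the data-driven choice of S and T, which is exactly why a post-selection correction is needed.

```python
import itertools

import numpy as np


def cc_norm(X, Y):
    """Euclidean norm of the sample canonical correlations between the columns of X and Y.

    Uses sum_j rho_j^2 = trace(Sxx^{-1} Sxy Syy^{-1} Syx).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    A = np.linalg.solve(Sxx, Sxy)    # Sxx^{-1} Sxy
    B = np.linalg.solve(Syy, Sxy.T)  # Syy^{-1} Syx
    # Clip tiny negative round-off before taking the square root.
    return float(np.sqrt(max(np.trace(A @ B), 0.0)))


def greedy_subset_search(X, Y, s, t):
    """Hypothetical greedy forward search (illustration only, not the paper's algorithm).

    Starts from the single (X_j, Y_k) pair with the largest |correlation|, then
    repeatedly adds the one X- or Y-variable that most increases cc_norm until
    |S| = s and |T| = t. Returns (S, T, plug-in statistic).
    """
    p, q = X.shape[1], Y.shape[1]
    if not (1 <= s <= p and 1 <= t <= q):
        raise ValueError("need 1 <= s <= p and 1 <= t <= q")
    j0, k0 = max(itertools.product(range(p), range(q)),
                 key=lambda jk: cc_norm(X[:, [jk[0]]], Y[:, [jk[1]]]))
    S, T = [j0], [k0]
    while len(S) < s or len(T) < t:
        candidates = []
        if len(S) < s:
            candidates += [(S + [j], T) for j in range(p) if j not in S]
        if len(T) < t:
            candidates += [(S, T + [k]) for k in range(q) if k not in T]
        S, T = max(candidates, key=lambda st: cc_norm(X[:, st[0]], Y[:, st[1]]))
    return sorted(S), sorted(T), cc_norm(X[:, S], Y[:, T])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    Y = rng.standard_normal((200, 8))
    Y[:, 2] += X[:, 5]  # plant one linear relationship between X_5 and Y_2
    S, T, stat = greedy_subset_search(X, Y, s=2, t=2)
    print("S =", S, "T =", T, "plug-in statistic =", round(stat, 3))
```

Each canonical correlation lies in [0, 1], so the statistic is at most sqrt(min(s, t)). Even when X and Y are independent, maximizing over many subsets pushes it well above zero, which illustrates why the naive maximum cannot be compared with a standard null distribution.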