Paper title
Conditional canonical correlation estimation based on covariates with random forests
Authors
Abstract
Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender, or other clinical measures. In this case, applying CCA to the whole population is not optimal and methods to estimate conditional CCA, given the covariates, can be useful. We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between two sets of variables. The performance of the proposed method and the global significance test is evaluated through simulation studies that show it provides accurate canonical correlation estimations and well-controlled Type-1 error. We also show an application of the proposed method with EEG data.
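The splitting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact splitting rule, so the score used here (the absolute difference between the first canonical correlations of the two child nodes, weighted by the square root of the product of their sizes) is an assumed choice. The helper names `first_canonical_correlation` and `best_split` and the `min_node_size` parameter are hypothetical. The sketch searches a single covariate `z` for the threshold that best separates a low-correlation subgroup from a high-correlation one, and assumes each node has more observations than variables and full-column-rank data.

```python
import numpy as np


def first_canonical_correlation(X, Y):
    """Largest sample canonical correlation between the columns of X and Y.

    Assumes the centred matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the centred column spaces (reduced QR).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # guard against rounding slightly above 1


def best_split(z, X, Y, min_node_size=20):
    """Scan thresholds on one covariate z and return (threshold, score).

    The score is an ASSUMED heterogeneity measure:
    sqrt(n_left * n_right) * |rho_left - rho_right|.
    """
    best_threshold, best_score = None, -np.inf
    for t in np.unique(z)[:-1]:
        left = z <= t
        n_left = int(left.sum())
        n_right = len(z) - n_left
        if min(n_left, n_right) < min_node_size:
            continue
        rho_left = first_canonical_correlation(X[left], Y[left])
        rho_right = first_canonical_correlation(X[~left], Y[~left])
        score = np.sqrt(n_left * n_right) * abs(rho_left - rho_right)
        if score > best_score:
            best_threshold, best_score = float(t), float(score)
    return best_threshold, best_score


# Simulated example: X and Y are related only for subjects with z > 0.
rng = np.random.default_rng(0)
n = 1000
z = rng.uniform(-1.0, 1.0, size=n)
X = rng.normal(size=(n, 2))
Y = rng.normal(size=(n, 2))
linked = z > 0
Y[linked, 0] = X[linked, 0] + 0.3 * rng.normal(size=int(linked.sum()))

threshold, score = best_split(z, X, Y)
print(f"chosen threshold: {threshold:.3f}, score: {score:.1f}")
```

In this simulated setting the chosen threshold should fall near zero, where the relationship between the two sets switches on. A full forest would repeat such a search over several covariates and bootstrap samples, then estimate the conditional canonical correlation for a new subject from the training observations that share its terminal nodes.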