Paper title
Expectile Neural Networks for Genetic Data Analysis of Complex Diseases
Authors
Abstract
The genetic etiologies of common diseases are highly complex and heterogeneous. Classic statistical methods, such as linear regression, have successfully identified numerous genetic variants associated with complex diseases. Nonetheless, for most complex diseases, the identified variants account for only a small proportion of heritability, and discovering additional variants that contribute to complex diseases remains a challenge. Expectile regression is a generalization of linear regression and provides complete information on the conditional distribution of a phenotype of interest. While expectile regression has many desirable properties and holds great promise for genetic data analyses (e.g., investigating genetic variants predisposing to a high-risk population), it has rarely been used in genetic research. In this paper, we develop an expectile neural network (ENN) method for genetic data analyses of complex diseases. Similar to expectile regression, ENN provides a comprehensive view of the relationships between genetic variants and disease phenotypes and can be used to discover genetic variants predisposing to sub-populations (e.g., high-risk groups). We further integrate the idea of neural networks into ENN, making it capable of capturing non-linear and non-additive genetic effects (e.g., gene-gene interactions). Through simulations, we show that the proposed method outperforms an existing expectile regression method when the relationships between genetic variants and disease phenotypes are complex. We also apply the proposed method to genetic data from the Study of Addiction: Genetics and Environment (SAGE), investigating the relationships of candidate genes with smoking quantity.
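To make concrete the statement that expectile regression generalizes linear regression, the following minimal sketch uses the standard asymmetric squared loss, which the abstract itself does not spell out. The function names (`expectile_loss`, `sample_expectile`) and the simulated exponential data are illustrative assumptions, not the authors' implementation. With asymmetry level tau = 0.5 the loss is proportional to ordinary squared error, so the fitted constant is the sample mean; larger tau shifts the fit toward the upper tail, which is the sense in which expectiles can target high-risk groups.

```python
import numpy as np

def expectile_loss(y, pred, tau):
    """Asymmetric squared error: weight tau on positive residuals, 1 - tau otherwise."""
    resid = np.asarray(y, dtype=float) - pred
    weights = np.where(resid > 0, tau, 1.0 - tau)
    return float(np.mean(weights * resid ** 2))

def sample_expectile(y, tau, n_iter=100, tol=1e-10):
    """Minimise the expectile loss over a constant by iterated weighted means."""
    y = np.asarray(y, dtype=float)
    m = y.mean()
    for _ in range(n_iter):
        weights = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(weights * y) / np.sum(weights)
        if abs(m_new - m) < tol:
            return float(m_new)
        m = m_new
    return float(m)

# Simulated right-skewed phenotype (assumption: purely illustrative data).
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=1000)

e50 = sample_expectile(y, 0.5)  # coincides with the sample mean
e90 = sample_expectile(y, 0.9)  # pulled toward the upper tail
```

An ENN in the sense described above would, presumably, replace the constant `pred` with the output of a neural network taking genetic variants as input and minimize the same loss; that step is not shown here.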