Paper Title

A Bayes Factor Approach with Informative Prior for Rare Genetic Variant Analysis from Next Generation Sequencing Data

Authors

Jingxiong Xu, Wei Xu, Laurent Briollais

Abstract

The discovery of rare genetic variants through Next Generation Sequencing is a very challenging problem in human genetics. We propose a novel region-based statistical approach based on a Bayes Factor (BF) to assess the evidence of association between a set of rare variants (RVs) located in the same genomic region and a disease outcome in a case-control design. Marginal likelihoods are computed under the null and alternative hypotheses, assuming a binomial distribution for the RV count in the region and either a beta prior or a mixture of a Dirac mass and a beta prior for the probability of an RV. We derive the theoretical null distribution of the BF under our prior setting and show that a Bayesian control of the False Discovery Rate (BFDR) can be obtained for genome-wide inference. Informative priors are introduced using prior evidence of association from a Kolmogorov-Smirnov test statistic. We use our simulation program, sim1000G, to generate RV data similar to those of the 1000 Genomes Project. Our simulation studies show that the new BF statistic outperforms standard methods (SKAT, SKAT-O, Burden test) in case-control studies with moderate sample sizes and is equivalent to them in large-sample scenarios. Our real-data application to a lung cancer case-control study found enrichment for RVs in known and novel cancer genes. It also suggests that using the BF with an informative prior improves overall gene discovery compared with the BF with a non-informative prior.
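
To make the binomial-likelihood, beta-prior construction concrete, here is a minimal sketch of a Bayes factor for a single region. It is not the authors' implementation: it assumes the simplest textbook setup, in which the null hypothesis gives cases and controls one shared RV probability with a Beta(a, b) prior and the alternative gives each group its own independent Beta(a, b) prior. The function names, the default a = b = 1, and the counts in the usage note are all hypothetical, and the Dirac-beta mixture prior and the informative prior from the abstract are not modeled.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(k_case, n_case, k_ctrl, n_ctrl, a=1.0, b=1.0):
    """Log Bayes factor of H1 (group-specific RV probability) against H0 (shared).

    k_* is the RV count and n_* the number of binomial trials in each group.
    The binomial coefficients appear in both marginal likelihoods and cancel,
    so only Beta-function terms remain.
    """
    # H1: independent Beta(a, b) priors for cases and controls (assumed setup).
    log_m1 = (
        log_beta(a + k_case, b + n_case - k_case) - log_beta(a, b)
        + log_beta(a + k_ctrl, b + n_ctrl - k_ctrl) - log_beta(a, b)
    )
    # H0: one shared probability with a Beta(a, b) prior, data pooled.
    k, n = k_case + k_ctrl, n_case + n_ctrl
    log_m0 = log_beta(a + k, b + n - k) - log_beta(a, b)
    return log_m1 - log_m0
```

Under these assumptions, `log_bayes_factor(100, 1000, 1, 1000)` is positive (evidence for a difference between groups), while identical counts in both groups give a negative value because the alternative pays for its extra parameter.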
