Data aggregation can lead to biased inferences in Bayesian linear mixed models and Bayesian ANOVA: A simulation study

Title


Data aggregation can lead to biased inferences in Bayesian linear mixed models and Bayesian ANOVA: A simulation study

Authors

Schad, Daniel J.; Nicenboim, Bruno; Vasishth, Shravan

Abstract


Bayesian linear mixed-effects models and Bayesian ANOVA are increasingly being used in the cognitive sciences to perform null hypothesis tests, in which the null hypothesis that an effect is zero is compared with the alternative hypothesis that the effect exists and differs from zero. While software tools for Bayes factor null hypothesis tests are easily accessible, it is often unclear how to specify the data and the model correctly. In Bayesian approaches, many authors aggregate data at the by-subject level and estimate Bayes factors on the aggregated data. Here, we apply simulation-based calibration for model inference to several example experimental designs to demonstrate that, as in frequentist analysis, such null hypothesis tests on aggregated data can be problematic in Bayesian analysis. Specifically, when random slope variances differ (i.e., when the sphericity assumption is violated), Bayes factors are too conservative for contrasts where the variance is small and too liberal for contrasts where the variance is large. Running Bayesian ANOVA on aggregated data can likewise lead to biased Bayes factor results if the sphericity assumption is violated. Moreover, Bayes factors for by-subject aggregated data are biased (too liberal) when random item slope variance is present but ignored in the analysis. These problems can be circumvented or reduced by running Bayesian linear mixed-effects models on non-aggregated data, such as individual trials, and by explicitly modeling the full random effects structure. Reproducible code is available from https://osf.io/mjf47/.
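The item-slope problem can be illustrated with a small simulation. What follows is a minimal, stdlib-only Python sketch, not the authors' code (their reproducible code is on OSF). It simulates a two-condition design under the null hypothesis (true effect zero), aggregates the trials into by-subject mean differences, and computes a one-sample JZS Bayes factor (Rouder et al., 2009) on those differences, which ignores item variance. The following are all hypothetical choices made for illustration: the function names, the prior scale r = √2/2, 30 subjects, 20 items, a subject slope SD of 0.3, an item slope SD of 0.5, residual SD 1, and the BF10 > 3 criterion.

```python
import math
import random
import statistics


def jzs_bf10(t, n, r=math.sqrt(2) / 2, lo=-10.0, hi=15.0, step=0.05):
    """One-sample (paired) JZS Bayes factor BF10 for a t statistic based on
    n differences, with a Cauchy(0, r) prior on the standardized effect.
    The integral over g is approximated by the trapezoid rule on u = log(g)."""
    nu = n - 1
    null_lik = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(u):
        g = math.exp(u)
        # Inverse-gamma(1/2, r^2/2) prior on g, r/sqrt(2*pi) * g^-1.5 * exp(-r^2/(2g)),
        # multiplied by the Jacobian dg/du = g, which leaves g^-0.5.
        prior = r / math.sqrt(2 * math.pi) * g ** -0.5 * math.exp(-r * r / (2 * g))
        lik = (1 + n * g) ** -0.5 * (1 + t * t / ((1 + n * g) * nu)) ** (-(nu + 1) / 2)
        return lik * prior

    k_max = int(round((hi - lo) / step))
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + k * step) for k in range(1, k_max))
    return total * step / null_lik


def simulate_subject_diffs(rng, n_subj, n_items, sd_subj_slope, sd_item_slope, sigma=1.0):
    """Hypothetical trial-level data under H0: every subject sees every item in
    both conditions. Returns by-subject mean condition differences, i.e., the
    data after by-subject aggregation (item identity is discarded)."""
    item_int = [rng.gauss(0, 0.5) for _ in range(n_items)]
    item_slope = [rng.gauss(0, sd_item_slope) for _ in range(n_items)]
    diffs = []
    for _ in range(n_subj):
        subj_int = rng.gauss(0, 0.5)
        subj_slope = rng.gauss(0, sd_subj_slope)
        cond_means = []
        for x in (0.5, -0.5):  # sum-coded condition; the fixed effect is 0
            ys = [subj_int + item_int[i] + (subj_slope + item_slope[i]) * x + rng.gauss(0, sigma)
                  for i in range(n_items)]
            cond_means.append(statistics.fmean(ys))
        diffs.append(cond_means[0] - cond_means[1])
    return diffs


def paired_t(diffs):
    """t statistic of the mean difference against zero."""
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def false_alarm_rate(sd_item_slope, n_sims=500, n_subj=30, n_items=20, seed=1):
    """Share of null simulations in which BF10 > 3 on the aggregated data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        d = simulate_subject_diffs(rng, n_subj, n_items, 0.3, sd_item_slope)
        if jzs_bf10(paired_t(d), n_subj) > 3:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("BF10 > 3 rate, no item slope variance:   ", false_alarm_rate(0.0))
    print("BF10 > 3 rate, item slope SD 0.5 ignored:", false_alarm_rate(0.5))
```

Both calls use the same seed, so the two runs differ only in the item slope SD. When item slopes vary, every subject's mean difference contains the same average item slope, so the by-subject means are not independent. The t statistic, and with it BF10, then overstates the evidence, which should appear as a clearly higher rate of BF10 > 3 even though the true effect is zero.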
