p-Value as the Strength of Evidence Measured by Confidence Distribution

Paper title

p-Value as the Strength of Evidence Measured by Confidence Distribution

Authors

Sifan Liu, Regina Liu, Min-ge Xie

Abstract

The p-value is a fundamental concept in statistical inference and has been widely used for reporting the outcomes of hypothesis tests. In practice, however, the p-value is often misinterpreted, misused or miscommunicated. Part of the issue is that existing definitions of the p-value are often derived from constructions under specific settings, and a general definition that directly reflects the evidence for the null hypothesis is not yet available. In this article, we first propose a general and rigorous definition of the p-value that fulfills two performance-based characteristics. The performance-based definition subsumes all existing construction-based definitions of the p-value and justifies their interpretations. The paper further presents a specific approach, based on confidence distribution, to formulate and calculate p-values. This way of computing p-values has two main advantages. First, it is applicable to a wide range of hypothesis testing problems, including the standard one- and two-sided tests, tests with interval-type null hypotheses, intersection-union tests, multivariate tests and so on. Second, it leads naturally to a coherent interpretation of the p-value as evidence in support of the null hypothesis, as well as a meaningful measure of the degree of such support. In particular, it gives meaning to a large p-value: for example, a p-value of 0.8 indicates more support for the null hypothesis than a p-value of 0.5. Numerical examples are used to illustrate the wide applicability and computational feasibility of our approach. We show that our proposal is effective and can be applied broadly, without further consideration of the form or size of the null space. With existing testing methods, by contrast, such solutions have been unavailable or not easily obtained.
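To make the idea of reading a p-value off a confidence distribution concrete, here is a minimal sketch for the simplest textbook case: a normal mean with known standard deviation. It is an illustration under stated assumptions, not the construction given in the paper itself. It assumes the standard confidence distribution H(θ) = Φ(√n (θ − x̄) / σ); the function names, and the choice to take the one-sided p-value as H(θ₀) and the two-sided p-value as 2·min{H(θ₀), 1 − H(θ₀)}, are hypothetical choices made here for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def confidence_distribution(theta, sample_mean, sigma, n):
    """Assumed confidence distribution for a normal mean with known sigma:
    H(theta) = Phi(sqrt(n) * (theta - sample_mean) / sigma)."""
    return normal_cdf(math.sqrt(n) * (theta - sample_mean) / sigma)


def p_value_one_sided(theta0, sample_mean, sigma, n):
    """Illustrative p-value for H0: theta <= theta0, taken as the
    confidence-distribution mass on the null set (-inf, theta0]."""
    return confidence_distribution(theta0, sample_mean, sigma, n)


def p_value_two_sided(theta0, sample_mean, sigma, n):
    """Illustrative p-value for H0: theta == theta0, taken as twice the
    smaller tail of the confidence distribution at theta0."""
    h = confidence_distribution(theta0, sample_mean, sigma, n)
    return 2.0 * min(h, 1.0 - h)


if __name__ == "__main__":
    # Hypothetical data summary: one observation with mean 1.96 and sigma 1.
    print(p_value_one_sided(0.0, sample_mean=1.96, sigma=1.0, n=1))
    print(p_value_two_sided(0.0, sample_mean=1.96, sigma=1.0, n=1))
```

With these hypothetical inputs the two calls return roughly 0.025 and 0.05, which agree with the classical one- and two-sided z-test p-values. In this simple setting, then, the confidence-distribution reading reproduces the familiar numbers while also attaching a degree of support to the null hypothesis.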
