Paper title
Tests and estimation strategies associated to some loss functions
Paper authors
Paper abstract
We consider the problem of estimating the joint distribution of $n$ independent random variables. Our approach is based on a family of candidate probabilities, which we shall call a model, chosen either to contain the true distribution of the data or at least to provide a good approximation of it with respect to some loss function. The aim of the present paper is to describe a general estimation strategy that adapts to both the specific features of the model and the choice of the loss function, with a view to designing an estimator with good estimation properties. The losses we have in mind are based on the total variation, Hellinger, Wasserstein and $\mathbb{L}_p$-distances, to name a few. We show that the risk of the resulting estimator with respect to the loss function can be bounded by the sum of an approximation term, accounting for the loss between the true distribution and the model, and a complexity term, corresponding to the bound we would get if this distribution did belong to the model. Our results hold under mild assumptions on the true distribution of the data and are based on exponential deviation inequalities that are non-asymptotic and involve explicit constants. When the model reduces to two distinct probabilities, we show how our estimation strategy leads to a robust test whose errors of the first and second kinds depend only on the losses between the true distribution and the two tested probabilities.
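As a reading aid, the shape of the risk bound described in the abstract can be sketched as follows. The notation is assumed for illustration and does not come from the abstract: $P^\star$ denotes the true joint distribution, $\mathscr{M}$ the model, $\hat P$ the estimator, $\ell$ the chosen loss, $C \ge 1$ an unspecified constant, and $D_n(\mathscr{M})$ the complexity term. The exact constants and the precise form of each term are not given here and may differ in the paper.

$$
\mathbb{E}\big[\ell(P^\star, \hat P)\big] \;\le\; C \inf_{P \in \mathscr{M}} \ell(P^\star, P) \;+\; D_n(\mathscr{M}).
$$

Under this schematic reading, when $P^\star \in \mathscr{M}$ and $\ell(P, P) = 0$ the infimum vanishes, so only the complexity term remains, in line with the abstract's description of that term as the bound obtained when the true distribution does belong to the model.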