Paper title

Robust density estimation with the $\mathbb{L}_{1}$-loss. Applications to the estimation of a density on the line satisfying a shape constraint

Paper authors

Baraud, Y., Halconruy, H., Maillard, G.

Abstract

We solve the problem of estimating the distribution of presumed i.i.d. observations for the total variation loss. Our approach is based on density models and is versatile enough to cope with many different ones, including some density models for which the Maximum Likelihood Estimator (MLE for short) does not exist. We mainly illustrate the properties of our estimator on models of densities on the line that satisfy a shape constraint. We show that it possesses some similar optimality properties, with regard to some global rates of convergence, as the MLE does when it exists. It also enjoys some adaptation properties with respect to some specific target densities in the model for which our estimator is proven to converge at parametric rate. More important is the fact that our estimator is robust, not only with respect to model misspecification, but also to contamination, the presence of outliers among the dataset and the equidistribution assumption. This means that the estimator performs almost as well as if the data were i.i.d. with density $p$ in a situation where these data are only independent and most of their marginals are close enough in total variation to a distribution with density $p$. We also show that our estimator converges to the average density of the data, when this density belongs to the model, even when none of the marginal densities belongs to it. Our main result on the risk of the estimator takes the form of an exponential deviation inequality which is non-asymptotic and involves explicit numerical constants. We deduce from it several global rates of convergence, including some bounds for the minimax $\mathbb{L}_{1}$-risks over the sets of concave and log-concave densities. These bounds derive from some specific results on the approximation of densities which are monotone, convex, concave and log-concave. Such results may be of independent interest.
