Paper Title
Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation
Paper Authors
Paper Abstract
Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent have been proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm that is not only computationally efficient, with linear convergence, but also statistically optimal, whether the noise is Gaussian or heavy-tailed. Convergence theory is established for a general framework, and specific applications to the absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in typical non-smooth optimization, requiring gradually decaying stepsizes. However, phase one delivers only a statistically sub-optimal estimator, matching rates already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function, so that a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise on the non-smooth robust losses in a region close, but not too close, to the truth. Lastly, RsGrad is applicable to low-rank tensor estimation under heavy-tailed noise, where a statistically optimal rate is attainable with the same dual-phase convergence phenomenon, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical findings and showcase the superiority of RsGrad over prior methods.
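To make the dual-phase idea concrete, below is a minimal, hypothetical sketch of a Riemannian sub-gradient iteration for low-rank matrix estimation with the Huber loss: a Euclidean sub-gradient is projected onto the tangent space of the fixed-rank manifold, a step is taken with geometrically decaying stepsizes in phase one and a constant stepsize in phase two, and the iterate is retracted by a truncated SVD. All function names, the initialization, and the schedule constants are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def huber_subgrad(resid, delta):
    # Sub-gradient (score) of the Huber loss applied entrywise:
    # psi(r) = r for |r| <= delta, delta * sign(r) otherwise.
    return np.clip(resid, -delta, delta)

def rank_r_retract(M, r):
    # Retraction onto the rank-r matrix manifold via truncated SVD.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def tangent_project(G, M, r):
    # Project a Euclidean (sub-)gradient onto the tangent space of the
    # rank-r manifold at M: P_U G + G P_V - P_U G P_V.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    PU, PV = U @ U.T, V @ V.T
    return PU @ G + G @ PV - PU @ G @ PV

def rsgrad_huber(Y, r, delta=1.0, T1=50, T2=100, eta0=1.0, q=0.9, eta_const=0.1):
    # Hypothetical two-phase schedule: geometrically decaying stepsizes in
    # phase one, a constant stepsize in phase two (mirroring the dual-phase
    # convergence described in the abstract; all constants are illustrative).
    M = rank_r_retract(Y, r)  # naive spectral initialization (placeholder)
    for t in range(T1 + T2):
        G = -huber_subgrad(Y - M, delta)          # Euclidean sub-gradient of the loss
        G = tangent_project(G, M, r)              # Riemannian sub-gradient
        eta = eta0 * q**t if t < T1 else eta_const
        M = rank_r_retract(M - eta * G, r)        # step + retraction
    return M
```

The absolute and quantile losses mentioned in the abstract would slot in by replacing `huber_subgrad` with the corresponding entrywise sub-gradient; the tensor case would use a low-rank tensor retraction instead of the truncated SVD.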