Paper Title
Robust estimation via generalized quasi-gradients
Paper Authors
Paper Abstract
We explore why many recently proposed robust estimation problems are efficiently solvable, even though the underlying optimization problems are non-convex. We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients". Whenever these quasi-gradients exist, a large family of low-regret algorithms is guaranteed to approximate the global minimum; this includes the commonly-used filtering algorithm. For robust mean estimation of distributions under bounded covariance, we show that any first-order stationary point of the associated optimization problem is an approximate global minimum if and only if the corruption level $ε < 1/3$. Consequently, any optimization algorithm that approaches a stationary point yields an efficient robust estimator with breakdown point $1/3$. With careful initialization and step size, we improve this to $1/2$, which is optimal. For other tasks, including linear regression and joint mean and covariance estimation, the loss landscape is more rugged: there are stationary points arbitrarily far from the global minimum. Nevertheless, we show that generalized quasi-gradients exist and construct efficient algorithms. These algorithms are simpler than previous ones in the literature, and for linear regression we improve the estimation error from $O(\sqrt{ε})$ to the optimal rate of $O(ε)$ for small $ε$, assuming certified hypercontractivity. For mean estimation with near-identity covariance, we show that a simple gradient descent algorithm achieves breakdown point $1/3$ and iteration complexity $\tilde{O}(d/ε^2)$.
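For readers who want a concrete picture of the filtering algorithm the abstract refers to, below is a minimal Python sketch of robust mean estimation under bounded covariance. It is an illustrative sketch only, not the exact procedure from the paper: the function name `filter_robust_mean`, the multiplicative down-weighting rule, and the stopping threshold `cov_threshold` are assumptions made for this example.

```python
import numpy as np

def filter_robust_mean(X, max_iter=100, cov_threshold=2.0):
    """Illustrative filtering sketch (not the paper's exact algorithm):
    repeatedly down-weight points that contribute most to the top
    eigenvalue of the weighted covariance, then return the weighted mean.

    X : (n, d) array of samples, a small fraction of which may be corrupted.
    cov_threshold : assumed stopping level for the top eigenvalue.
    """
    n, d = X.shape
    w = np.ones(n) / n                              # start from uniform weights
    for _ in range(max_iter):
        mu = w @ X                                  # weighted mean (w sums to 1)
        centered = X - mu
        cov = (centered * w[:, None]).T @ centered  # weighted covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        lam, v = eigvals[-1], eigvecs[:, -1]        # top eigenpair
        if lam <= cov_threshold:                    # weighted covariance already small
            break
        tau = (centered @ v) ** 2                   # outlier scores along the top direction
        w = w * (1.0 - tau / tau.max())             # soft filter: zero out the worst point
        w = w / w.sum()
    return w @ X

# Example usage: 5% of the points are shifted far from the true mean (zero).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
X[:50] += 10.0
print(filter_robust_mean(X))                        # close to the zero vector
```

The multiplicative down-weighting step is one instance of the low-regret updates mentioned in the abstract; hard removal of the highest-scoring points is another common variant.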