Concentration of measure bounds for matrix-variate data with missing values

Paper title

Concentration of measure bounds for matrix-variate data with missing values

Author

Zhou, Shuheng

Abstract

We consider the following data perturbation model, where the covariates incur multiplicative errors. For two $n \times m$ random matrices $U, X$, we denote by $U \circ X$ the Hadamard or Schur product, which is defined as $(U \circ X)_{ij} = (U_{ij}) \cdot (X_{ij})$. In this paper, we study the subgaussian matrix variate model, where we observe the matrix variate data $X$ through a random mask $U$: $$ {\mathcal X} = U \circ X \; \; \; \text{ where} \; \; \;X = B^{1/2} {\mathbb{Z}} A^{1/2}, $$ where ${\mathbb{Z}}$ is a random matrix with independent subgaussian entries, and $U$ is a mask matrix with either zero or positive entries, where ${\mathbb E} U_{ij} \in [0, 1]$ and all entries are mutually independent. Subsampling of rows or columns, and random sampling of entries of $X$, are special cases of this model. Under the assumption of independence between $U$ and $X$, we introduce componentwise unbiased estimators for the covariance matrices $A$ and $B$, and prove concentration of measure bounds in the sense of guaranteeing that the restricted eigenvalue ($\textsf{RE}$) conditions hold on the unbiased estimator for $B$ when columns of the data matrix $X$ are sampled at different rates. We further develop multiple regression methods for estimating the inverse of $B$ and show statistical rates of convergence. Our results provide insight into sparse recovery of relationships among entities (samples, locations, items) when features (variables, time points, user ratings) are present in the observed data matrix ${\mathcal X}$ at heterogeneous rates. Our proof techniques can certainly be extended to other scenarios. We provide simulation evidence illuminating the theoretical predictions.
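To make the observation model concrete, the following is a minimal NumPy sketch of one special case, not the paper's implementation. It assumes Gaussian entries for $\mathbb{Z}$, Bernoulli masks with known column sampling rates `p`, and a column covariance $A$ with unit diagonal; the helper names (`ar1`, `simulate_masked`, `corrected_gram`) and all parameter values are hypothetical. Under these assumptions, $\mathbb{E}({\mathcal X}{\mathcal X}^T)_{ik}$ equals $B_{ik} \sum_j p_j^2$ for $i \neq k$ and $B_{ii} \sum_j p_j$ for $i = k$, so dividing entrywise by these factors gives one componentwise unbiased estimate of $B$. The estimator studied in the paper may be defined differently.

```python
import numpy as np

def ar1(dim, rho):
    """AR(1) correlation matrix with entries rho^|i-j| (unit diagonal)."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_masked(B, A, p, rng):
    """Draw one observation U ∘ X with row covariance B and column covariance A.

    Cholesky factors stand in for the symmetric square roots B^{1/2}, A^{1/2};
    they give the same second moments E[X_ij X_kl] = B_ik * A_jl.
    """
    n, m = B.shape[0], A.shape[0]
    Lb = np.linalg.cholesky(B)
    La = np.linalg.cholesky(A)
    Z = rng.standard_normal((n, m))          # assumption: Gaussian entries
    X = Lb @ Z @ La.T
    # Assumption: Bernoulli mask, column j observed with probability p[j].
    U = (rng.random((n, m)) < p).astype(float)
    return U * X                              # Hadamard product

def corrected_gram(X_obs, p):
    """Entrywise mask correction of X_obs X_obs^T (assumes diag(A) = 1)."""
    n = X_obs.shape[0]
    M = np.full((n, n), np.sum(p ** 2))       # off-diagonal: E[U_ij U_kj] = p_j^2
    np.fill_diagonal(M, np.sum(p))            # diagonal: E[U_ij^2] = p_j
    return (X_obs @ X_obs.T) / M

rng = np.random.default_rng(0)
n, m = 4, 2000
B = ar1(n, 0.5)                               # hypothetical row covariance
A = ar1(m, 0.3)                               # hypothetical column covariance
p = np.linspace(0.5, 0.9, m)                  # heterogeneous column rates

X_obs = simulate_masked(B, A, p, rng)
B_hat = corrected_gram(X_obs, p)
B_naive = X_obs @ X_obs.T / m                 # ignores the mask, hence biased

print("max error, corrected:", np.abs(B_hat - B).max())
print("max error, naive:    ", np.abs(B_naive - B).max())
```

The naive Gram matrix shrinks every entry toward zero because masked entries contribute zeros, while the corrected version rescales diagonal and off-diagonal entries separately to undo that shrinkage in expectation.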
