Paper title
Wavelet based multivariate signal denoising using Mahalanobis distance and EDF statistics
Paper authors
Paper abstract
A multivariate signal denoising method is proposed that employs a novel multivariate goodness-of-fit (GoF) test applied at multiple data scales obtained from the discrete wavelet transform (DWT). In the proposed multivariate GoF test, we first use the squared Mahalanobis distance (MD) measure to transform input multivariate data residing in the M-dimensional space $\mathcal{R}^M$ to the one-dimensional space of positive real numbers $\mathcal{R}_+$, i.e., $\mathcal{R}^M \rightarrow \mathcal{R}_+$, where $M > 1$. Owing to the properties of the MD measure, the transformed data in $\mathcal{R}_+$ follow a distinct distribution. This enables us to apply a GoF test using a statistic based on the empirical distribution function (EDF) to the resulting data in order to define a test for multivariate normality. We further propose to apply this test locally at multiple input data scales obtained from the DWT, resulting in a multivariate signal denoising framework. Within the proposed method, the reference cumulative distribution function (CDF) is defined as a quadratic transformation of a multivariate Gaussian random process. Consequently, the proposed method checks whether a set of DWT coefficients belongs to the multivariate reference distribution; coefficients that belong to the reference distribution are discarded. The effectiveness of the proposed method is demonstrated through extensive simulations on both synthetic and real-world datasets.
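The core test described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the noise is zero-mean Gaussian with a known covariance `noise_cov`, the reference distribution of the squared MD is therefore chi-square with M degrees of freedom, and the EDF statistic is Kolmogorov-Smirnov (the abstract does not name the statistic used). The function names `squared_mahalanobis` and `looks_like_noise` are hypothetical, and the DWT step is omitted; `coeffs` stands in for a local window of wavelet coefficients with one row per sample and one column per channel.

```python
import numpy as np
from scipy import stats


def squared_mahalanobis(x, cov):
    """Map each row of x (shape N x M) to its squared Mahalanobis
    distance from the origin, i.e. R^M -> R_+."""
    cov_inv = np.linalg.inv(cov)
    # Row-wise quadratic form x_i^T cov^{-1} x_i.
    return np.einsum("ij,jk,ik->i", x, cov_inv, x)


def looks_like_noise(coeffs, noise_cov, alpha=0.05):
    """Return True if the window of coefficients is consistent with
    zero-mean Gaussian noise of covariance noise_cov.

    Assumption: the EDF statistic is Kolmogorov-Smirnov and the
    reference CDF is chi-square with M degrees of freedom.
    """
    d2 = squared_mahalanobis(coeffs, noise_cov)
    reference = stats.chi2(df=coeffs.shape[1])
    result = stats.kstest(d2, reference.cdf)
    # Failing to reject the null means the window matches the noise model.
    return bool(result.pvalue > alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cov = np.array([[1.0, 0.3], [0.3, 2.0]])
    noise = rng.multivariate_normal(np.zeros(2), cov, size=400)
    print("pure noise window:", looks_like_noise(noise, cov))
    print("shifted window:   ", looks_like_noise(noise + 5.0, cov))
```

In a denoising loop under these assumptions, windows for which `looks_like_noise` returns True would be set to zero and the remaining coefficients kept before inverting the transform.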