Paper Title

Subgroup Discovery in Unstructured Data

Authors

Ali Arab, Dev Arora, Jialin Lu, Martin Ester

Abstract

Subgroup discovery is a descriptive and exploratory data mining technique for identifying subgroups in a population that exhibit interesting behavior with respect to a variable of interest. Subgroup discovery has numerous applications in knowledge discovery and hypothesis generation, yet it remains inapplicable to unstructured, high-dimensional data such as images. This is because subgroup discovery algorithms rely on descriptive rules defined over (attribute, value) pairs; in unstructured data, however, an attribute is not well defined. Even where the notion of an attribute intuitively exists, such as a pixel in an image, the high dimensionality of the data makes these attributes too uninformative to be used in a rule. In this paper, we introduce the subgroup-aware variational autoencoder, a novel variational autoencoder that learns a representation of unstructured data that leads to higher-quality subgroups. Our experimental results demonstrate that the method learns high-quality subgroups while supporting the interpretability of the concepts.
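To make the notion of an (attribute, value) rule and a subgroup's quality concrete, here is a minimal sketch of single-condition subgroup discovery on a small tabular dataset. It is not the paper's method. The toy data, the function names `wracc` and `best_subgroups`, and the choice of weighted relative accuracy (WRAcc, a quality measure commonly used in subgroup discovery) are assumptions made for illustration only.

```python
# Illustrative sketch only: hypothetical toy data and helper names;
# WRAcc is an assumed quality measure, not necessarily the one used in the paper.

def wracc(rows, attr, value, target):
    """Weighted relative accuracy of the rule `attr == value` for a binary target."""
    n = len(rows)
    covered = [r for r in rows if r[attr] == value]
    if not covered:
        return 0.0
    p_sub = sum(r[target] for r in covered) / len(covered)  # target rate inside the subgroup
    p_all = sum(r[target] for r in rows) / n                 # target rate in the whole population
    return (len(covered) / n) * (p_sub - p_all)              # coverage times rate difference

def best_subgroups(rows, target, k=3):
    """Score every single (attribute, value) condition exhaustively and return the top k."""
    candidates = []
    for attr in rows[0]:
        if attr == target:
            continue
        for value in sorted({r[attr] for r in rows}):
            candidates.append(((attr, value), wracc(rows, attr, value, target)))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:k]

# Hypothetical structured data where attributes are meaningful, unlike raw pixels.
data = [
    {"smoker": "yes", "age": "old", "disease": 1},
    {"smoker": "yes", "age": "young", "disease": 1},
    {"smoker": "yes", "age": "old", "disease": 1},
    {"smoker": "no", "age": "old", "disease": 0},
    {"smoker": "no", "age": "young", "disease": 0},
    {"smoker": "no", "age": "young", "disease": 1},
    {"smoker": "no", "age": "old", "disease": 0},
    {"smoker": "no", "age": "young", "disease": 0},
]

print(best_subgroups(data, "disease", k=1))  # → [(('smoker', 'yes'), 0.1875)]
```

On this toy table the rule `smoker == "yes"` scores highest because it covers 3 of 8 rows with a target rate of 1.0 against a base rate of 0.5, giving 3/8 × 0.5 = 0.1875. Applied to raw images, the same search would have to use individual pixels as attributes, which is the setting the abstract argues such rules are not informative enough for.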