Paper Title

Discovering Frequent Gradual Itemsets with Imprecise Data

Authors

Michaël Chirmeni Boujike, Jerry Lonlac, Norbert Tsopze, Engelbert Mephu Nguifo

Abstract

Gradual patterns, which model complex co-variations of attributes of the form "the more/less X, the more/less Y", play a crucial role in many real-world applications where large amounts of numerical data must be managed, such as biological data. Recently, these patterns have caught the attention of the data mining community, and several methods have been defined to automatically extract and manage them from different data models. However, these methods often face the problem of managing the quantity of mined patterns: in many practical applications, computing all these patterns can prove intractable for the user-defined frequency threshold, and the lack of focus leads to huge collections of patterns. Another problem with traditional approaches is that gradualness is defined simply as an increase or a decrease: a gradual variation is considered as soon as the values of the attribute on two objects differ. As a result, traditional algorithms can present large numbers of patterns to the user even though their gradualness is only a noise effect in the data. To address this issue, this paper proposes introducing gradualness thresholds from which an increase or a decrease is considered. In contrast to approaches in the literature, the proposed approach takes into account the distribution of attribute values as well as the user's preferences on the gradualness threshold, and makes it possible to extract gradual patterns on certain databases where existing approaches fail because the search space is too large. Moreover, an experimental evaluation on real databases shows that the proposed algorithm is scalable and efficient, and that it eliminates numerous patterns that do not satisfy specific gradualness requirements, so that only a small set of patterns is shown to the user.
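
The idea of a gradualness threshold can be illustrated with a small Python sketch. This is not the paper's algorithm: the function names (`variation`, `respecting_pairs`), the toy data, the threshold values, and the pair-counting measure are all assumptions chosen for illustration. The sketch only shows how requiring a minimum difference between two objects can discard variations that a threshold of zero would count.

```python
from itertools import permutations


def variation(a, b, threshold=0.0):
    """Classify the change from value a to value b.

    Returns '+' if b exceeds a by more than `threshold`, '-' if b is
    below a by more than `threshold`, and None otherwise. With a
    threshold of 0, any difference counts as a variation.
    """
    diff = b - a
    if diff > threshold:
        return "+"
    if diff < -threshold:
        return "-"
    return None


def respecting_pairs(rows, pattern, thresholds):
    """Count ordered pairs of objects that respect a gradual pattern.

    `pattern` is a list of (attribute, direction) items and `thresholds`
    maps attribute names to hypothetical gradualness thresholds
    (attributes that are missing default to 0).
    """
    count = 0
    for first, second in permutations(rows, 2):
        if all(
            variation(first[attr], second[attr], thresholds.get(attr, 0.0)) == direction
            for attr, direction in pattern
        ):
            count += 1
    return count


# Toy data, invented for this example.
rows = [
    {"age": 20, "salary": 1000},
    {"age": 21, "salary": 1005},
    {"age": 35, "salary": 2500},
]
# "The more age, the more salary"
pattern = [("age", "+"), ("salary", "+")]

classic = respecting_pairs(rows, pattern, {})
filtered = respecting_pairs(rows, pattern, {"age": 2, "salary": 100})
print(classic, filtered)
```

With no thresholds, the small difference between the first two objects counts as a joint increase; with the assumed thresholds it is treated as noise and that pair is no longer counted.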
