Socially Fair k-Means Clustering

Title

Socially Fair k-Means Clustering

Authors

Ghadiri, Mehrdad; Samadi, Samira; Vempala, Santosh

Abstract

We show that the popular k-means clustering algorithm (Lloyd's heuristic), used for a variety of scientific data, can result in outcomes that are unfavorable to subgroups of data (e.g., demographic groups). Such biased clusterings can have deleterious implications for human-centric applications such as resource allocation. We present a fair k-means objective and algorithm to choose cluster centers that provide equitable costs for different groups. The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for k-means, inheriting its simplicity, efficiency, and stability. In comparison with standard Lloyd's, we find that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.
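The abstract describes Fair-Lloyd only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes exactly two non-empty groups labeled 0 and 1, takes the fair objective to be the larger of the two groups' average k-means costs, keeps Lloyd's nearest-center assignment step unchanged, and replaces the usual mean update with a center placed on the segment between each cluster's group-0 mean and group-1 mean. A weight `lam` is found by bisection so that the two average costs balance, which for a fixed assignment minimizes the larger of them. All function names (`fair_centers`, `fair_lloyd`, `group_costs`, and so on) are hypothetical.

```python
import random

# Hypothetical sketch: two groups labeled 0/1; not the authors' reference code.

def sq_dist(p, q):
    """Squared Euclidean distance between two points stored as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def assign(points, centers):
    """Lloyd-style assignment: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def group_costs(points, groups, centers, labels):
    """Average k-means cost of group 0 and group 1 under a fixed assignment."""
    totals, counts = [0.0, 0.0], [0, 0]
    for p, g, j in zip(points, groups, labels):
        totals[g] += sq_dist(p, centers[j])
        counts[g] += 1
    return [totals[h] / counts[h] for h in (0, 1)]

def fair_centers(points, groups, labels, k, old_centers, iters=60):
    """Center update: put each center on the segment between its cluster's
    group-0 and group-1 means, bisecting on lam to balance the two average
    group costs (assumes both groups are non-empty)."""
    sizes = [groups.count(h) for h in (0, 1)]
    means, weights = [], []
    for j in range(k):
        members = [[p for p, g, l in zip(points, groups, labels) if l == j and g == h]
                   for h in (0, 1)]
        means.append([mean(m) if m else None for m in members])
        weights.append([len(members[h]) / sizes[h] for h in (0, 1)])

    def centers_for(lam):
        # Minimizer of lam * cost_0 + (1 - lam) * cost_1 for the fixed assignment.
        out = []
        for j in range(k):
            m0, m1 = means[j]
            if m0 is None and m1 is None:
                out.append(old_centers[j])            # empty cluster: keep old center
            elif m0 is None or m1 is None:
                out.append(m1 if m0 is None else m0)  # one group only: use its mean
            else:
                a, b = lam * weights[j][0], (1 - lam) * weights[j][1]
                t = b / (a + b)  # t = 0 gives the group-0 mean, t = 1 the group-1 mean
                out.append(tuple((1 - t) * x + t * y for x, y in zip(m0, m1)))
        return out

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        c0, c1 = group_costs(points, groups, centers_for(mid), labels)
        if c0 > c1:
            lo = mid  # group 0 is worse off: give its cost more weight
        else:
            hi = mid
    return centers_for((lo + hi) / 2)

def fair_lloyd(points, groups, k, rounds=20, seed=0):
    """Alternate nearest-center assignment with the fair center update."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(rounds):
        labels = assign(points, centers)
        centers = fair_centers(points, groups, labels, k, centers)
    return centers, assign(points, centers)
```

A design note on this sketch: each cluster's ordinary mean also lies on the segment between its two group means, so for the same assignment this update never yields a larger worst-group cost than the standard mean update does. When neither endpoint of the bisection is optimal, the two average group costs come out (numerically) equal.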
