Oblivious Data for Fairness with Kernels

Paper Title

Oblivious Data for Fairness with Kernels

Authors

Steffen Grünewälder, Azadeh Khaleghi

Abstract

We investigate the problem of algorithmic fairness in the case where sensitive and non-sensitive features are available and one aims to generate new 'oblivious' features that closely approximate the non-sensitive features and are only minimally dependent on the sensitive ones. We study this question in the context of kernel methods. We analyze a relaxed version of the Maximum Mean Discrepancy criterion which does not guarantee full independence but makes the optimization problem tractable. We derive a closed-form solution for this relaxed optimization problem and complement the result with a study of the dependencies between the newly generated features and the sensitive ones. Our key ingredient for generating such oblivious features is a Hilbert-space-valued conditional expectation, which needs to be estimated from data. We propose a plug-in approach and demonstrate how the estimation errors can be controlled. While our techniques help reduce the bias, we would like to point out that no post-processing of any dataset could possibly serve as an alternative to well-designed experiments.
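
For readers unfamiliar with the Maximum Mean Discrepancy (MMD) mentioned in the abstract, the following is a minimal sketch of the standard biased empirical estimate of squared MMD between two samples. It is not the relaxed criterion or the closed-form solution from the paper, which the abstract does not spell out. The Gaussian (RBF) kernel, the bandwidth value, the function names, and the synthetic "groups" are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between rows of a and rows of b (assumed kernel choice)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )


# Synthetic illustration: features drawn for two hypothetical sensitive groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(200, 2))
group_b = rng.normal(1.5, 1.0, size=(200, 2))  # shifted: features depend on the group
group_c = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution as group_a

mmd_shifted = mmd_squared(group_a, group_b)
mmd_same = mmd_squared(group_a, group_c)
print(f"shifted groups: {mmd_shifted:.4f}, matching groups: {mmd_same:.4f}")
```

A larger value indicates that the feature distribution differs more between the two groups; features that are close to independent of the sensitive attribute would be expected to give a value near zero.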
