Paper title
SWAG: A Wrapper Method for Sparse Learning
Paper authors
Paper abstract
Most machine learning methods and algorithms give high priority to prediction performance, which may not always match the priorities of their users. In many cases, practitioners and researchers in fields ranging from engineering to genetics require interpretable and replicable results, especially in settings where, for example, not all attributes are available to them. Consequently, there is a need to make the outputs of machine learning algorithms more interpretable and to deliver a library of "equivalent" learners (in terms of prediction performance) from which users can select based on attribute availability, in order to test and/or use these learners for predictive or diagnostic purposes. To address these needs, we propose to study a procedure that combines screening and wrapper approaches and that, based on a user-specified learning method, greedily explores the attribute space to find a library of sparse learners, with consequently low data collection and storage costs. This new method (i) delivers a low-dimensional network of attributes that can be easily interpreted and (ii) increases the potential replicability of results, based on the diversity of attribute combinations that define strong learners with equivalent predictive power. We call this algorithm the "Sparse Wrapper AlGorithm" (SWAG).
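The abstract gives no algorithmic details, so the following is only a minimal sketch of how a screening step followed by a greedy wrapper search might be organized. It is not the authors' implementation. The function names (`keep_best`, `swag_sketch`), the parameters (`max_dim`, `keep_frac`, `max_models`), and the toy error function are all hypothetical and chosen purely for illustration. The user-specified learner is represented by a `score` callable that returns an error estimate (lower is better) for a given attribute subset.

```python
import math
import random


def keep_best(scored, keep_frac):
    """Keep the lowest-error fraction of models (at least one)."""
    n_keep = max(1, math.ceil(keep_frac * len(scored)))
    # Sort by error first, then by subset, so ties are broken deterministically.
    ranked = sorted(scored.items(), key=lambda kv: (kv[1], kv[0]))
    return dict(ranked[:n_keep])


def swag_sketch(n_attrs, score, max_dim=3, keep_frac=0.2, max_models=50, seed=0):
    """Illustrative screening + greedy wrapper search (assumed design).

    score(subset) -> float is a user-supplied error estimate for a learner
    trained on the attributes in `subset` (a sorted tuple of indices).
    Returns {dimension: {subset: error}} for the retained models.
    """
    rng = random.Random(seed)
    library = {}

    # Screening step: evaluate every attribute on its own.
    singles = {(j,): score((j,)) for j in range(n_attrs)}
    library[1] = keep_best(singles, keep_frac)

    for dim in range(2, max_dim + 1):
        prev = library[dim - 1]
        # Attributes that appear in the retained models of the previous dimension.
        pool = sorted({j for subset in prev for j in subset})
        # Wrapper step: grow each retained model by one attribute from the pool.
        candidates = sorted(
            {tuple(sorted(set(subset) | {j}))
             for subset in prev for j in pool if j not in subset}
        )
        if not candidates:
            break
        # Cap the number of models evaluated per dimension.
        if len(candidates) > max_models:
            candidates = rng.sample(candidates, max_models)
        scored = {c: score(c) for c in candidates}
        library[dim] = keep_best(scored, keep_frac)

    return library


def toy_error(subset):
    """Hypothetical error: attributes 0 and 1 are informative, size is penalized."""
    return 1.0 - 0.4 * (0 in subset) - 0.4 * (1 in subset) + 0.01 * len(subset)


if __name__ == "__main__":
    for dim, models in swag_sketch(6, toy_error, max_dim=2, keep_frac=0.34).items():
        print(dim, models)
```

In practice, `score` would wrap whatever learner the user specifies, for example by returning a cross-validated error for a model fitted on the selected attributes. The returned dictionary plays the role of the library of sparse learners: several small attribute subsets with comparable error, from which a user could choose according to attribute availability.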