论文标题
在篮子和物品的加法和删除下有效地维护下一个篮子建议
Efficiently Maintaining Next Basket Recommendations under Additions and Deletions of Baskets and Items
论文作者
论文摘要
推荐系统在帮助人们找到信息并在当今日益数字化的社会中做出决策发挥着重要作用。但是,此类机器学习应用程序的广泛采用也引起了数据隐私的关注。这些问题是欧洲最近的“通用数据保护法规”(GDPR)解决的,该法规要求公司在用户执行“被遗忘的权利”时根据要求删除个人用户数据。许多研究人员认为,这种删除义务不仅适用于主要数据存储(例如关系数据库)中存储的数据,而且还需要更新机器学习模型的更新,其培训集包括用于删除的个人数据。我们在称为下一个篮子推荐(NBR)的顺序推荐任务的背景下探索这个方向,该任务是根据用户的购买历史记录推荐一组项目。我们设计有效的算法,用于逐步更新最新的下一个篮子推荐模型,以响应用户篮子和项目的添加和删除。此外,我们讨论了在火花结构化流系统中我们方法的有效,数据并行实现。我们在各种现实世界数据集上评估了我们的实现,在其中我们研究了更新技术对几个排名指标的影响,并衡量执行模型更新的时间。我们的结果表明,我们的方法在增量情况下提供了相对于附加用户篮的恒定更新时间效率,并且在删除现有篮子的减少情况下,线性效率。借助适中的计算资源,无论增量情况下的历史大小如何,我们都能以延迟约为0.2毫秒的延迟更新模型,而在减少情况下则不到一毫秒。
Recommender systems play an important role in helping people find information and make decisions in today's increasingly digitalized societies. However, the wide adoption of such machine learning applications also causes concerns in terms of data privacy. These concerns are addressed by the recent "General Data Protection Regulation" (GDPR) in Europe, which requires companies to delete personal user data upon request when users enforce their "right to be forgotten". Many researchers argue that this deletion obligation does not only apply to the data stored in primary data stores such as relational databases but also requires an update of machine learning models whose training set included the personal data to delete. We explore this direction in the context of a sequential recommendation task called Next Basket Recommendation (NBR), where the goal is to recommend a set of items based on a user's purchase history. We design efficient algorithms for incrementally and decrementally updating a state-of-the-art next basket recommendation model in response to additions and deletions of user baskets and items. Furthermore, we discuss an efficient, data-parallel implementation of our method in the Spark Structured Streaming system. We evaluate our implementation on a variety of real-world datasets, where we investigate the impact of our update techniques on several ranking metrics and measure the time to perform model updates. Our results show that our method provides constant update time efficiency with respect to an additional user basket in the incremental case, and linear efficiency in the decremental case where we delete existing baskets. With modest computational resources, we are able to update models with a latency of around 0.2~milliseconds regardless of the history size in the incremental case, and less than one millisecond in the decremental case.