Title
Fine-Grained Session Recommendations in E-commerce using Deep Reinforcement Learning
Authors
Abstract
Sustaining users' interest and keeping them engaged on the platform is very important for the success of an e-commerce business. A session encompasses the activities of a user between logging into the platform and either logging out or making a purchase. User activities in a session can be classified into two groups: known intent and unknown intent. Known-intent activity pertains to sessions in which the user's intent to browse or purchase a specific product can be easily captured. In unknown-intent activity, by contrast, the user's intent is not known. For example, consider a user who enters a session to casually browse products on the platform, similar to window shopping in an offline setting. While recommending similar products is essential in the former setting, accurately understanding the intent and recommending interesting products is essential in the latter in order to retain the user. In this work, we focus primarily on the unknown-intent setting, where our objective is to recommend a sequence of products to a user in a session to sustain their interest, keep them engaged, and possibly drive them towards a purchase. We formulate this problem as a Markov Decision Process (MDP), a popular mathematical framework for sequential decision making, and solve it using Deep Reinforcement Learning (DRL) techniques. However, training a next-product recommender in the RL paradigm is difficult because of the large variance in users' browse/purchase behavior. Therefore, we break the problem down into predicting various product attributes, where patterns and trends can be identified and exploited to build accurate models. We show that the DRL agent performs better than a greedy strategy.
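The abstract contrasts a sequential (MDP) policy with a greedy strategy without spelling out why planning over a session can help. The following is a minimal toy sketch of that idea, not the authors' model. It assumes a single product attribute with three values, invented click probabilities `CLICK`, a discount factor `GAMMA`, and that a non-click ends the session; all of these names and numbers are hypothetical. Tabular value iteration stands in for the deep RL agent, since the toy state space is small enough to solve exactly.

```python
# Toy attribute-level session MDP (hypothetical numbers, for illustration only).
# State: attribute value of the last item the user engaged with.
# Action: attribute value of the next item to recommend.
# CLICK[s][a]: assumed probability the user engages with an item of attribute a
# after last engaging with attribute s. A click earns reward 1 and moves the
# session to state a; a non-click ends the session (terminal, value 0).
CLICK = [
    [0.3, 0.5, 0.6],
    [0.7, 0.5, 0.2],
    [0.1, 0.1, 0.2],
]
GAMMA = 0.9  # assumed discount factor
N = len(CLICK)


def q_value(s, a, v):
    """Expected discounted engagement of recommending attribute a in state s."""
    return CLICK[s][a] * (1.0 + GAMMA * v[a])


def value_iteration(tol=1e-10):
    """Exact tabular planning; a stand-in for the DRL agent in this sketch."""
    v = [0.0] * N
    while True:
        new_v = [max(q_value(s, a, v) for a in range(N)) for s in range(N)]
        converged = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    # Pick the action with the highest long-run value in each state.
    policy = [max(range(N), key=lambda a: q_value(s, a, v)) for s in range(N)]
    return policy, v


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation of a fixed recommendation policy."""
    v = [0.0] * N
    while True:
        new_v = [q_value(s, policy[s], v) for s in range(N)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


# Greedy strategy: maximize only the immediate click probability.
greedy = [max(range(N), key=lambda a: CLICK[s][a]) for s in range(N)]
greedy_values = evaluate(greedy)
rl_policy, rl_values = value_iteration()

print("greedy :", greedy, [round(x, 3) for x in greedy_values])
print("planned:", rl_policy, [round(x, 3) for x in rl_values])
```

In this toy, the greedy rule recommends attribute 2 from state 0 because it has the highest immediate click probability, but attribute 2 leads to a state where further engagement is unlikely. The planned policy accepts a lower immediate click rate in exchange for a longer expected session. The numbers are invented to illustrate the mechanism and say nothing about the paper's reported results.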