Paper Title

Improving Query Safety at Pinterest

Authors

Abhijit Mahabal, Yinrui Li, Rajat Raina, Daniel Sun, Revati Mahajan, Jure Leskovec

Abstract

Query recommendations in search engines are a double-edged sword, with undeniable benefits but also the potential for harm. Identifying unsafe queries is necessary to protect users from inappropriate query suggestions. However, identifying them is non-trivial because of the linguistic diversity resulting from large vocabularies, social-group-specific slang, and typos, and because the inappropriateness of a term depends on the context. Here we formulate the problem as query-set expansion, where we are given a small and potentially biased seed set and the aim is to identify a diverse set of semantically related queries. We present PinSets, a system for query-set expansion, which applies a simple yet powerful mechanism to search user sessions, expanding a tiny seed set into thousands of related queries at nearly perfect precision, deep into the tail, along with explanations that are easy to interpret. PinSets owes its high-quality expansions to a hybrid of textual and behavioral techniques (i.e., treating queries both as compositional and as black boxes). Experiments show that, for the domain of drug-related queries, PinSets expands 20 seed queries into 15,670 positive training examples at over 99% precision. The generated expansions have diverse vocabulary and correctly handle words with ambiguous safety. PinSets decreased unsafe query suggestions at Pinterest by 90%.
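The abstract does not spell out the expansion mechanism, so the following is only a toy sketch of the general idea of session-based query-set expansion, not the PinSets algorithm. It assumes a session is simply a list of query strings, and it keeps a non-seed query when enough of the sessions containing it also contain a seed. The function name `expand_seed_set`, the thresholds `min_seed_sessions` and `min_seed_ratio`, and the sample data are all hypothetical.

```python
from collections import Counter


def expand_seed_set(sessions, seeds, min_seed_sessions=2, min_seed_ratio=0.5):
    """Toy co-occurrence expansion (an illustration, not the PinSets method).

    Returns {query: ratio}, where ratio is the fraction of the query's
    sessions that also contain at least one seed query.
    """
    seeds = set(seeds)
    total = Counter()      # number of sessions containing each non-seed query
    with_seed = Counter()  # ... that also contain at least one seed

    for session in sessions:
        queries = set(session)  # count each query once per session
        has_seed = bool(queries & seeds)
        for query in queries - seeds:
            total[query] += 1
            if has_seed:
                with_seed[query] += 1

    expanded = {}
    for query, n in with_seed.items():
        ratio = n / total[query]
        # Require both enough supporting sessions and a high enough share.
        if n >= min_seed_sessions and ratio >= min_seed_ratio:
            expanded[query] = ratio
    return expanded


# Hypothetical sessions, made up for illustration only.
sessions = [
    ["seed a", "related x", "related y"],
    ["seed b", "related x"],
    ["related x", "cake recipes"],
    ["cake recipes", "birthday ideas"],
    ["seed a", "cake recipes"],
]
result = expand_seed_set(sessions, ["seed a", "seed b"])
print(result)
```

In this sketch the returned ratio doubles as a crude explanation of why a query was included. A hybrid system like the one the abstract describes would also use textual signals, which this example leaves out.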
