Multilingual Disinformation Detection for Digital Advertising

Paper Title

Multilingual Disinformation Detection for Digital Advertising

Authors

Trstanova, Zofia; Manouzi, Nadir El; Chen, Maryline; da Cunha, Andre L. V.; Ivanov, Sergei

Abstract

In today's world, online disinformation and propaganda are more widespread than ever. Independent publishers are funded mostly via digital advertising, which is unfortunately also the case for those publishing disinformation content. The question of how to remove such publishers from advertising inventory has long been ignored, despite the negative impact on the open internet. In this work, we take the first step towards quickly detecting and red-flagging websites that potentially manipulate the public with disinformation. We build a machine learning model based on multilingual text embeddings that first determines whether a page mentions a topic of interest and then estimates the likelihood that the content is malicious, producing a shortlist of publishers to be reviewed by human experts. Our system empowers internal teams to proactively, rather than defensively, blacklist unsafe content, thus protecting the reputation of the advertising provider.
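The abstract describes a two-stage pipeline: a topic gate, followed by a maliciousness score, whose output is a ranked shortlist for human review. The following is a minimal sketch of that flow under stated assumptions, not the authors' implementation. The `CONCEPTS` table is a hypothetical toy stand-in for a pretrained multilingual embedding model (translations map to the same dimension, mimicking a shared cross-lingual space), and the logistic weights, bias, thresholds, and publisher names are illustrative values only.

```python
import math
from dataclasses import dataclass

# Hypothetical toy stand-in for a multilingual sentence encoder: words that are
# translations of each other share a dimension, mimicking a cross-lingual space.
# A real system would use a pretrained multilingual embedding model instead.
CONCEPTS = {
    "vaccine": 0, "vaccin": 0, "impfstoff": 0,   # topic of interest
    "hoax": 1, "canular": 1, "schwindel": 1,     # malicious-content signal
    "recipe": 2, "recette": 2, "rezept": 2,      # off-topic content
}
DIM = 3


def embed(text: str) -> list[float]:
    """Bag-of-concepts vector, L2-normalized (zero vector if no known words)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        if token in CONCEPTS:
            vec[CONCEPTS[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0  # treat empty/unknown text as unrelated to any topic
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Page:
    publisher: str
    text: str


def shortlist(pages, topic_centroid, weights, bias,
              topic_threshold=0.5, risk_threshold=0.5):
    """Return (publisher, risk) pairs for human review, highest risk first."""
    flagged = []
    for page in pages:
        x = embed(page.text)
        # Stage 1: skip pages that do not mention the topic of interest.
        if cosine(x, topic_centroid) < topic_threshold:
            continue
        # Stage 2: logistic score, a placeholder for the paper's classifier.
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        if p >= risk_threshold:
            flagged.append((page.publisher, p))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Usage with hypothetical pages in French, English, and German.
pages = [
    Page("site-a.example", "le vaccin est un canular"),
    Page("site-b.example", "vaccine safety data"),
    Page("site-c.example", "rezept für kuchen"),
]
centroid = embed("vaccine")
weights = [0.0, 4.0, 0.0]  # hypothetical: only the "hoax" concept raises risk
bias = -2.0
print(shortlist(pages, centroid, weights, bias))
```

Only `site-a.example` is shortlisted: the French page passes the topic gate because "vaccin" maps to the same concept as "vaccine", and its "canular" (hoax) signal pushes the score above 0.5. The on-topic English page scores low, and the off-topic German recipe page never reaches stage 2. Gating on topic first keeps the expensive or noisy maliciousness step focused on relevant pages, and ranking the output lets human experts review the riskiest publishers first.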
