TextHacker: Learning based Hybrid Local Search Algorithm for Text Hard-label Adversarial Attack

Paper Title

TextHacker: Learning based Hybrid Local Search Algorithm for Text Hard-label Adversarial Attack

Authors

Zhen Yu, Xiaosen Wang, Wanxiang Che, Kun He

Abstract

Existing textual adversarial attacks usually rely on gradients or prediction confidence to generate adversarial examples, which makes them hard to deploy in real-world applications. To this end, we consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can access only the predicted label. In particular, we find that the importance of different words can be learned from the changes in the predicted label caused by word substitutions on the adversarial examples. Based on this observation, we propose a novel adversarial attack, termed Text Hard-label attacker (TextHacker). TextHacker first randomly perturbs many words to craft an adversarial example. It then adopts a hybrid local search algorithm, guided by word importance estimated from the attack history, to minimize the adversarial perturbation. Extensive evaluations on text classification and textual entailment show that TextHacker significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
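
The two stages described in the abstract (random perturbation to obtain an adversarial example, then a search that shrinks the perturbation using only predicted labels) can be illustrated with a deliberately simplified sketch. This is not the paper's algorithm: it swaps the hybrid local search for a plain greedy "restore one word at a time" search, and the importance scores are a crude counter rather than the paper's estimator. The names `hard_label_attack`, `toy_predict`, and `candidates`, along with the toy classifier and substitute lists, are all hypothetical.

```python
import random

def hard_label_attack(words, candidates, predict, true_label, rng,
                      max_init_tries=50, search_steps=200):
    """Simplified hard-label attack sketch (not the paper's exact method).

    words:      list of tokens in the original text
    candidates: dict mapping a position to its list of substitute words
    predict:    function taking a token list and returning ONLY a label
    """
    # Stage 1: random initialization. Substitute many words at random until
    # the predicted label flips. The substitution rate rises to 1.0 on the
    # final try, so the last attempt substitutes every word with candidates.
    adv = None
    for t in range(max_init_tries):
        rate = 0.5 + 0.5 * (t + 1) / max_init_tries
        trial = [
            rng.choice(candidates[i])
            if candidates.get(i) and rng.random() < rate else w
            for i, w in enumerate(words)
        ]
        if predict(trial) != true_label:
            adv = trial
            break
    if adv is None:
        return None  # no adversarial starting point found

    # Stage 2: greedy perturbation reduction. Label changes observed in
    # earlier queries are recorded as per-position importance scores.
    importance = [0.0] * len(words)
    for _ in range(search_steps):
        changed = [i for i in range(len(words)) if adv[i] != words[i]]
        if not changed:
            break
        # Try restoring the position currently believed least important;
        # ties are broken at random.
        i = min(changed, key=lambda j: (importance[j], rng.random()))
        trial = list(adv)
        trial[i] = words[i]
        if predict(trial) != true_label:
            adv = trial           # still adversarial: keep the smaller edit
            importance[i] -= 1.0
        else:
            importance[i] += 1.0  # restoring flipped the label back
    return adv

# Toy usage with a hypothetical keyword classifier (label 1 iff "good" appears).
def toy_predict(tokens):
    return 1 if "good" in tokens else 0

words = ["the", "movie", "was", "good", "overall"]
candidates = {1: ["film"], 3: ["fine", "okay"], 4: ["generally"]}
adv = hard_label_attack(words, candidates, toy_predict, 1, random.Random(0))
print(adv)
```

In this toy setting the search can only succeed by keeping a substitute for "good", so restoring that position always flips the label back and its importance score grows, which pushes the search toward restoring the other positions first. A real attack would also need synonym sources and a query budget, neither of which is modeled here.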
