Paper Title

Contextual Embeddings: When Are They Worth It?

Authors

Simran Arora, Avner May, Jian Zhang, Christopher Ré

Abstract

We study the settings for which deep contextual embeddings (e.g., BERT) give large improvements in performance relative to classic pretrained embeddings (e.g., GloVe), and an even simpler baseline---random word embeddings---focusing on the impact of the training set size and the linguistic properties of the task. Surprisingly, we find that both of these simpler baselines can match contextual embeddings on industry-scale data, and often perform within 5 to 10% accuracy (absolute) on benchmark tasks. Furthermore, we identify properties of data for which contextual embeddings give particularly large gains: language containing complex structure, ambiguous word usage, and words unseen in training.
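For a rough illustration of the simplest baseline the abstract refers to, the sketch below shows how fixed random word embeddings can stand in for pretrained vectors when feeding a downstream model. It is not taken from the paper; the vocabulary, embedding dimension, and helper names are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of the "random word embeddings" baseline mentioned in the
# abstract. The vocabulary, dimension, and helper names are illustrative,
# not the paper's experimental setup.
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "<unk>": 4}
dim = 50  # illustrative embedding dimension

# Each word gets a fixed random vector; unlike GloVe or BERT, these vectors
# carry no pretrained knowledge, so any signal must come from the
# task-specific model trained on top of them.
rng = np.random.default_rng(seed=0)
random_embeddings = rng.normal(scale=0.1, size=(len(vocab), dim))

def embed(tokens, table, vocab):
    """Look up one fixed vector per token, falling back to <unk>."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    return table[ids]  # shape: (len(tokens), dim)

vectors = embed("the movie was great".split(), random_embeddings, vocab)
print(vectors.shape)  # (4, 50)
```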
