Paper title
What is my math transformer doing? -- Three results on interpretability and generalization
Paper authors
Paper abstract
This paper investigates the failure cases and out-of-distribution behavior of transformers trained on matrix inversion and eigenvalue decomposition. I show that incorrect model predictions still retain deep mathematical properties of the solution (e.g. correct eigenvalues, unit norm of eigenvectors), and that almost all model failures can be attributed to, and predicted from, properties of the problem or solution. This demonstrates that, when in doubt, math transformers do not hallucinate absurd solutions (as was sometimes proposed) but remain ``roughly right''. I also show that the careful choice of a training dataset can accelerate training, while allowing the model to generalize out of its training distribution, invalidating the idea that transformers ``merely interpolate'' from memorized examples.
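To make the abstract's central claim concrete, here is a minimal numpy sketch of how one might test whether a predicted eigendecomposition of a symmetric matrix "retains deep mathematical properties" (correct eigenvalues, unit-norm eigenvectors) even when the full reconstruction is wrong. This is not code from the paper; the function name, tolerance, and test matrix are illustrative assumptions.

```python
import numpy as np

def diagnose_prediction(H, Q_pred, lam_pred, tol=1e-2):
    """Check a (possibly incorrect) predicted eigendecomposition of a
    symmetric matrix H against the properties discussed in the abstract.

    Q_pred   : predicted eigenvector matrix (eigenvectors as columns)
    lam_pred : predicted eigenvalues
    All names and the tolerance are hypothetical, not from the paper.
    """
    # Property 1: the predicted eigenvalues match the true spectrum.
    lam_true = np.linalg.eigvalsh(H)               # ascending order
    eigvals_ok = np.allclose(np.sort(lam_pred), lam_true, atol=tol)

    # Property 2: each predicted eigenvector has unit norm.
    norms_ok = np.allclose(np.linalg.norm(Q_pred, axis=0), 1.0, atol=tol)

    # Full correctness: does Q diag(lam) Q^T reconstruct H?
    H_rec = Q_pred @ np.diag(lam_pred) @ Q_pred.T
    exact_ok = np.allclose(H_rec, H, atol=tol)

    return {"eigenvalues": eigvals_ok, "unit_norm": norms_ok, "exact": exact_ok}

# Usage: a correct decomposition passes all three checks; a "roughly right"
# prediction would pass the first two while failing the third.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = (A + A.T) / 2                       # random symmetric test matrix
lam, Q = np.linalg.eigh(H)
print(diagnose_prediction(H, Q, lam))   # {'eigenvalues': True, 'unit_norm': True, 'exact': True}
```

A failure mode of the kind the abstract describes would show up here as `exact: False` with `eigenvalues` and `unit_norm` still `True`, which is what distinguishes a "roughly right" prediction from a hallucinated one.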