Paper Title
Functional Indirection Neural Estimator for Better Out-of-distribution Generalization
Paper Authors
Paper Abstract
The capacity to achieve out-of-distribution (OOD) generalization is a hallmark of human intelligence and yet remains out of reach for machines. This remarkable capability has been attributed to our abilities to make conceptual abstraction and analogy, and to a mechanism known as indirection, which binds two representations and uses one representation to refer to the other. Inspired by these mechanisms, we hypothesize that OOD generalization may be achieved by performing analogy-making and indirection in the functional space instead of the data space as in current methods. To realize this, we design FINE (Functional Indirection Neural Estimator), a neural framework that learns to compose functions that map data input to output on-the-fly. FINE consists of a backbone network and a trainable semantic memory of basis weight matrices. Upon seeing a new input-output data pair, FINE dynamically constructs the backbone weights by mixing the basis weights. The mixing coefficients are computed indirectly, by querying a separate corresponding semantic memory with the data pair. We demonstrate empirically that FINE can strongly improve out-of-distribution generalization on IQ tasks that involve geometric transformations. In particular, we train FINE and competing models on IQ tasks using images from the MNIST, Omniglot and CIFAR100 datasets, and test on tasks with unseen image classes from the same or different datasets and with unseen transformation rules. FINE not only achieves the best performance on all tasks but is also able to adapt to small-scale data scenarios.
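To make the weight-mixing and indirection mechanism described in the abstract concrete, here is a minimal PyTorch-style sketch of one such layer. It is an illustrative reading of the abstract, not the authors' implementation: the class name `FINELayer`, the key memory, the query encoder `query_net`, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FINELayer(nn.Module):
    """A single layer in the spirit of FINE: its weight matrix is not a fixed
    parameter but is composed on-the-fly as a mixture of learned basis weight
    matrices. The mixing coefficients are obtained indirectly, by querying a
    separate key memory with a representation of the current (input, output)
    data pair. All names and dimensions are illustrative assumptions."""

    def __init__(self, in_dim: int, out_dim: int, num_bases: int,
                 pair_dim: int, key_dim: int):
        super().__init__()
        # Trainable semantic memory of basis weight matrices.
        self.bases = nn.Parameter(0.02 * torch.randn(num_bases, out_dim, in_dim))
        # Separate memory of keys, one per basis, enabling indirection.
        self.keys = nn.Parameter(0.02 * torch.randn(num_bases, key_dim))
        # Maps an encoded (input, output) exemplar to a query vector (assumed form).
        self.query_net = nn.Linear(pair_dim, key_dim)

    def forward(self, x: torch.Tensor, pair: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) layer input; pair: (batch, pair_dim) encoded exemplar.
        query = self.query_net(pair)                          # (batch, key_dim)
        scores = query @ self.keys.t()                        # (batch, num_bases)
        coeffs = F.softmax(scores, dim=-1)                    # mixing coefficients
        # Dynamically constructed backbone weight: a convex mix of the bases.
        W = torch.einsum('bk,koi->boi', coeffs, self.bases)   # (batch, out_dim, in_dim)
        return torch.einsum('boi,bi->bo', W, x)               # (batch, out_dim)
```

In a full model one would presumably stack several such layers to form the backbone and train the bases, keys, and query encoder end-to-end; at test time a novel input-output exemplar then selects a new mixture of weights without any gradient updates.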