Paper Title
ResNet After All? Neural ODEs and Their Numerical Solution
Paper Authors
Paper Abstract
A key appeal of the recently proposed Neural Ordinary Differential Equation (ODE) framework is that it seems to provide a continuous-time extension of discrete residual neural networks. As we show herein, though, trained Neural ODE models actually depend on the specific numerical method used during training. If the trained model is supposed to be a flow generated from an ODE, it should be possible to choose another numerical solver with equal or smaller numerical error without loss of performance. We observe that if training relies on a solver with overly coarse discretization, then testing with another solver of equal or smaller numerical error results in a sharp drop in accuracy. In such cases, the combination of vector field and numerical method cannot be interpreted as a flow generated from an ODE, which arguably poses a fatal breakdown of the Neural ODE concept. We observe, however, that there exists a critical step size beyond which the training yields a valid ODE vector field. We propose a method that monitors the behavior of the ODE solver during training to adapt its step size, aiming to ensure a valid ODE without unnecessarily increasing computational cost. We verify this adaptation algorithm on a common benchmark dataset as well as a synthetic dataset.
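The consistency test motivating the abstract — re-solving with a solver of equal or smaller numerical error and checking that the output is preserved — can be sketched as follows. This is a minimal illustration, not the paper's actual adaptation algorithm: a toy linear vector field stands in for a trained model, and fixed-step Euler at step h is compared against Euler at h/2 (a solver with smaller numerical error). The function names `euler_integrate` and `solver_consistency_gap` are hypothetical.

```python
import numpy as np

def euler_integrate(f, y0, t_end, step_size):
    """Fixed-step explicit Euler integration of dy/dt = f(y) from 0 to t_end."""
    y = np.asarray(y0, dtype=float)
    n_steps = int(round(t_end / step_size))
    for _ in range(n_steps):
        y = y + step_size * f(y)
    return y

def solver_consistency_gap(f, y0, t_end, step_size):
    """Distance between the solver output at step h and a solver with
    smaller numerical error (here, Euler at h/2). A large gap signals that
    the vector field + solver pair does not behave like a valid ODE flow
    at this discretization."""
    coarse = euler_integrate(f, y0, t_end, step_size)
    fine = euler_integrate(f, y0, t_end, step_size / 2)
    return np.linalg.norm(coarse - fine)

# Toy stand-in for a trained vector field: a linear contraction dy/dt = -2y.
f = lambda y: -2.0 * y
gap_coarse = solver_consistency_gap(f, [1.0, -1.0], t_end=1.0, step_size=0.5)
gap_fine = solver_consistency_gap(f, [1.0, -1.0], t_end=1.0, step_size=0.05)
```

For a genuine ODE, refining the step size shrinks this gap; a gap that stays large as the reference solver is refined is the failure mode the abstract describes, and monitoring such a quantity during training is one plausible way to drive the step-size adaptation.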