Paper Title
A Non-linear Function-on-Function Model for Regression with Time Series Data
Authors
Abstract
Over the last few decades, building regression models for non-scalar variables, including time series, text, images, and video, has attracted increasing interest from researchers in the data analytics community. In this paper, we focus on a multivariate time series regression problem. Specifically, we aim to learn mathematical mappings from multiple chronologically measured numerical variables within a certain time interval S to multiple numerical variables of interest over a time interval T. Prior approaches, including the multivariate regression model, the Seq2Seq model, and the functional linear models, suffer from several limitations. The first two types of models can only handle regularly observed time series. In addition, conventional multivariate regression models tend to be biased and inefficient, as they cannot encode the temporal dependencies among observations from the same time series. Sequential learning models explicitly use the same set of parameters over time, which negatively affects accuracy. The function-on-function linear model in functional data analysis (a branch of statistics) is insufficient to capture complex correlations among the considered time series and is prone to underfitting. In this paper, we propose a general functional mapping that includes the function-on-function linear model as a special case. We then propose a non-linear function-on-function model that uses a fully connected neural network to learn the mapping from data, which addresses the aforementioned limitations of existing approaches. For the proposed model, we describe the corresponding numerical implementation procedures in detail. The effectiveness of the proposed model is demonstrated through applications to two real-world problems.
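As background for the function-on-function linear model named in the abstract, the following is a minimal sketch of its textbook form, y(t) = beta0(t) + ∫_S beta(s, t) x(s) ds, evaluated numerically on an irregularly sampled input. It is not the paper's implementation: the function names, the grids, and the coefficient surface beta(s, t) = s + t are all hypothetical choices made for illustration, and the trapezoidal rule is just one possible quadrature.

```python
import numpy as np

def trapezoid_weights(s):
    """Trapezoidal-rule quadrature weights for a (possibly irregular) grid s."""
    s = np.asarray(s, dtype=float)
    w = np.zeros_like(s)
    ds = np.diff(s)
    w[:-1] += ds / 2.0  # each interval contributes half its width to its left node
    w[1:] += ds / 2.0   # ... and half to its right node
    return w

def fof_linear_predict(x_obs, s_grid, t_grid, beta0, beta):
    """Approximate y(t) = beta0(t) + integral over S of beta(s, t) * x(s) ds.

    x_obs  : observations of x at the points in s_grid
    beta0  : callable intercept function of t (hypothetical, for illustration)
    beta   : callable coefficient surface of (s, t) (hypothetical, for illustration)
    """
    s_grid = np.asarray(s_grid, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    w = trapezoid_weights(s_grid)               # shape (len(s),)
    B = beta(s_grid[:, None], t_grid[None, :])  # shape (len(s), len(t))
    return beta0(t_grid) + (w * x_obs) @ B      # shape (len(t),)

# Illustrative inputs: x(s) = 1 sampled at irregular points of S = [0, 1].
s_grid = np.array([0.0, 0.1, 0.35, 0.5, 0.8, 1.0])
t_grid = np.linspace(0.0, 1.0, 4)
x_obs = np.ones_like(s_grid)

y_hat = fof_linear_predict(
    x_obs, s_grid, t_grid,
    beta0=lambda t: np.zeros_like(t),
    beta=lambda s, t: s + t,
)
```

With these assumed inputs the integrand is linear in s, so the trapezoidal rule reproduces the exact integral, 0.5 + t, at each point of t_grid. A non-linear function-on-function model of the kind the abstract describes would replace the fixed linear integral operator with a mapping learned by a neural network; the details of that construction are in the paper body and are not sketched here.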