Paper Title

Learning Disentangled Representations of Video with Missing Data

Paper Authors

Armand Comas-Massagué, Chi Zhang, Zlatan Feric, Octavia Camps, Rose Yu

Paper Abstract

Missing data poses significant challenges for learning representations of video sequences. We present the Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable and disentangles the hidden video representation into static and dynamic appearance, pose, and missingness factors for each object. DIVE imputes each object's trajectory where data is missing. On a Moving MNIST dataset with various missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, demonstrating the practical value of our method in a more realistic setting. Our code and data can be found at https://github.com/Rose-STL-Lab/DIVE.
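To make the factorization described in the abstract concrete, here is a minimal illustrative sketch (not the authors' implementation): a per-object latent vector is split into static appearance, dynamic pose, and missingness parts, and the pose trajectory is filled in where frames are marked missing. The function names and dimensions are hypothetical, and simple linear interpolation stands in for DIVE's learned imputation.

```python
import numpy as np

def split_latent(z, d_app, d_pose):
    """Split a per-object latent into (appearance, pose, missingness) parts.

    Hypothetical helper: DIVE disentangles the hidden representation into
    these factors; here we just slice a flat vector for illustration.
    """
    return z[:d_app], z[d_app:d_app + d_pose], z[d_app + d_pose:]

def impute_trajectory(poses, missing):
    """Fill missing pose entries by linear interpolation over time.

    `poses` is a (T, D) array of an object's pose per frame; `missing` is a
    boolean mask of length T marking frames with no observation. Linear
    interpolation is a stand-in for the model's learned imputation.
    """
    poses = poses.astype(float).copy()
    t = np.arange(len(poses))
    known = ~missing
    for dim in range(poses.shape[1]):
        poses[missing, dim] = np.interp(t[missing], t[known], poses[known, dim])
    return poses

# Toy 2-D pose trajectory for one object, with frames 2 and 3 missing
# (their stored values are arbitrary and get overwritten by imputation).
poses = np.array([[0., 0.], [1., 1.], [9., 9.], [9., 9.], [4., 4.]])
missing = np.array([False, False, True, True, False])
print(impute_trajectory(poses, missing))
# The missing frames are filled along the line from (1, 1) to (4, 4).
```

In the actual model the pose factor evolves under a learned dynamics prior rather than straight-line interpolation, but the masking-and-filling structure is the same idea.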
