Efficient Urdu Caption Generation using Attention based LSTM

Paper Title

Efficient Urdu Caption Generation using Attention based LSTM

Authors

Inaam Ilahi, Hafiz Muhammad Abdullah Zia, Muhammad Ahtazaz Ahsan, Rauf Tabassam, Armaghan Ahmed

Abstract

Recent advances in deep learning have created many opportunities to solve real-world problems that remained unsolved for more than a decade. Automatic caption generation is a major research field, and the research community has done a lot of work on it for widely used languages such as English. Urdu is the national language of Pakistan and is also widely spoken and understood across the Pakistan-India subcontinent, yet no work has been done on caption generation for the Urdu language. Our research aims to fill this gap by developing an attention-based deep learning model using sequence modeling techniques specialized for Urdu. We prepared an Urdu dataset by translating a subset of the "Flickr8k" dataset containing 700 'man' images. We evaluate our proposed technique on this dataset and show that it achieves a BLEU score of 0.83 in Urdu. We improve on the previous state of the art by using better CNN architectures and optimization techniques. Furthermore, we discuss how the generated captions can be made grammatically correct.
